Throughout Europe, quarto was, together with folio, the dominant document format in the seventeenth century. In the ESTC there is an unusual peak during the civil war era caused by the Thomason Tracts [LINK: https://www.bl.uk/collection-guides/thomason-tracts **]. The bookseller George Thomason gathered so many civil war pamphlets, and the ESTC cataloguing rules include their different variants as separate entries, that together they produce a noticeable statistical peak. This needs to be noted, but it does not change the overall trend [FOOTNOTE: In our Helsinki Computational History Research Group we are now working on algorithms for edition-level harmonization of the data, with the objective of being able to analyse documents based on first editions or particular editions. This will also be very useful for the text mining of different full-text resources (such as ECCO when combined with ESTC). Our hypothesis is that it might make a crucial difference whether conceptual change based on vector space models is analysed relying on first editions alone or also including further editions and reprints in a large full-text source such as ECCO.]. Quarto was, as said earlier, the common format for pamphlets and other shorter pieces. When we look at the HPBD (Fig. 1), we see that quarto's share is fairly constant throughout the early modern period. In the ESTC, however, the share declines from the second half of the seventeenth century onwards. This is because other formats increase even more quickly: in the ESTC the quarto format does not decline in absolute numbers but, like all other book formats, its absolute numbers rise in the eighteenth century.
It is also interesting to notice that there seems to be a correlation between the language of a document and its format. Comparing books published in London in English, Latin and other languages (Supplementary Fig. 3) suggests that duodecimo was the preferred format especially for books printed in languages other than English and Latin, whereas octavo was used proportionally more in Latin books than in others. The small share of folio documents in Latin is especially interesting. The quarto share of Latin books in London is also noteworthy.
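The difference between a format's share and its absolute numbers can be made concrete with a small sketch. The Python snippet below is an illustration only and not the processing pipeline used for this article: the function name `format_counts_and_shares` and the toy records are assumptions invented for the example, and the numbers are not taken from the ESTC or any other catalogue.

```python
from collections import Counter, defaultdict


def format_counts_and_shares(records):
    """Group (year, format) records by decade.

    Returns {decade: {format: (count, share_of_decade_total)}}.
    """
    per_decade = defaultdict(Counter)
    for year, book_format in records:
        decade = (year // 10) * 10
        per_decade[decade][book_format] += 1

    summary = {}
    for decade in sorted(per_decade):
        counts = per_decade[decade]
        total = sum(counts.values())
        summary[decade] = {fmt: (n, n / total) for fmt, n in counts.items()}
    return summary


# Invented toy records (hypothetical, not taken from any catalogue).
toy_records = (
    [(1651, "quarto")] * 6 + [(1655, "octavo")] * 4
    + [(1752, "quarto")] * 9 + [(1758, "octavo")] * 21
)

summary = format_counts_and_shares(toy_records)
for decade, formats in summary.items():
    count, share = formats["quarto"]
    print(decade, "quarto count:", count, "share:", round(share, 2))
```

In this toy data the quarto count rises from 6 to 9 between the two decades while its share falls from 0.6 to 0.3, simply because the other format grows faster.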
Vernacularization in Europe, 1500-1800
Vernacularization refers to a historical transformation in local language relations. Multilingual systems in which one language (in Europe often Latin) was reserved for learned communication, whereas local vernacular languages were used in everyday communication, started to erode, and local languages gained increased prominence. They were made into vehicles for discussing politics, science and culture. This process happened at different speeds in different parts of Europe. Judged from today's teleological perspective, vernaculars such as English and French gained prominence already in the 1600s, whereas for the German and Swedish languages this development happened in the eighteenth and nineteenth centuries. For many smaller languages in Europe, such as Finnish or Czech, this development happened in conjunction with nation building in the latter half of the nineteenth century. Ultimately, vernacularization is an open-ended process. For many potentially vernacularizable languages the transformation never took place, and a similar process could still take place in the future, as language relations are in constant flux. The dominance of English today in many parts of Europe in a sense marks a reversed transformation. Linguists and historians have paid attention to vernacularization as a process from various perspectives [CITE: \cite{Ferguson_1959}\cite{Fishman_1967} **], but this article takes a novel approach by investigating metadata collections that contain thousands of titles and related bibliographic information and thus provide a previously unexplored source for tracing how the process of vernacularization materialized in concrete publications.
While language relations differ considerably across Europe, there is one measure that paints a picture of vernacularization as a general trend in European publishing: the share of publications in Latin. All four of our metadata collections show an indisputable declining trend in the share of publications in Latin in the period 1500-1800, but there are noticeable differences in the timing and proportions of the transformation, which are partly explained by historical trajectories mirrored in the data but also by the composition of the data itself. The HPBD (Fig. 2) provides the geographically broadest overview of the decline of Latin as a language for printed materials in Europe, but as a data set it includes the most gaps and uncertainties. In the HPBD the decline of Latin in the eighteenth century is the most rapid, and it happens later than in the ESTC and SNB (Fig. 2). This may be a result of the composition of the database, as many of its metadata collections focus predominantly on the eighteenth century. The earlier decline of Latin in Britain corresponds with our previous knowledge of the early establishment of English as the main language of high-level communication. Well-known symbols of the use of English, such as Shakespeare and the Royal Society [CITE: \cite{STARK_2011}, 9–46;  \cite{dear1985};  \cite{Livesey2009} **], anticipate this, but once the comparison based on national bibliographies can be brought to a more reliable level, we can provide a statistically accurate picture of it. The available data suggest that the decline of Latin in Britain was more drastic than has previously been anticipated.
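How the timing of the decline can be compared across collections may be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis behind Fig. 2: the function names, the language code `"lat"`, the 0.5 threshold and both toy collections are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def latin_share_by_decade(records, latin_code="lat"):
    """Return {decade: share of records whose language equals latin_code}."""
    totals = defaultdict(int)
    latin = defaultdict(int)
    for year, language in records:
        decade = year - year % 10
        totals[decade] += 1
        if language == latin_code:
            latin[decade] += 1
    return {decade: latin[decade] / totals[decade] for decade in sorted(totals)}


def first_decade_below(shares, threshold):
    """Return the earliest decade whose share is below threshold, or None."""
    for decade in sorted(shares):
        if shares[decade] < threshold:
            return decade
    return None


# Invented toy collections (hypothetical; not ESTC, SNB, FNB or HPBD data).
collection_a = (
    [(1605, "lat")] * 3 + [(1605, "eng")] * 7
    + [(1705, "lat")] * 1 + [(1705, "eng")] * 9
)
collection_b = (
    [(1605, "lat")] * 8 + [(1605, "swe")] * 2
    + [(1705, "lat")] * 6 + [(1705, "swe")] * 4
    + [(1795, "lat")] * 2 + [(1795, "swe")] * 8
)

for name, records in [("A", collection_a), ("B", collection_b)]:
    shares = latin_share_by_decade(records)
    print(name, shares, first_decade_below(shares, 0.5))
```

A single threshold is a crude summary of timing. It is used here only to show that two collections with the same overall downward trend can cross a given level in very different decades.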
The SNB and FNB allow us to zoom in on the Swedish case and to compare the different properties of the bibliographies. While the SNB portrays the general trend for the Swedish realm, it is also clear that Stockholm as a publication center dominates the picture (Fig. 3). The FNB, which consists mostly of publications from Turku (Åbo), one of the four university towns in the realm, shows that the distinct publication profiles of university towns are sometimes hidden under the national average. In Turku, too, we find a clear decline in the share of Latin publications, but the decline came definitely later, even though the Academy in Turku has been described as one of the most utility-oriented universities in the Swedish realm and thus also the most prone to use Swedish [CITE: \cite{lindberg1993} **]. One special feature of the FNB has to do with the different roles of Swedish and Finnish as languages. While Swedish became a stronger candidate for academic publications, Finnish emerged as a written language especially in shorter religious and economic texts. Vernacularization was in this case not a process between two languages, but three.
Keeping in mind the uncertainties relating to the HPBD, an inspection of university towns suggests that this is a wider trend. University towns, capitals, and commercial centers had different linguistic publication profiles, and vernacularization as a process happened in different phases. An analysis of languages used in publications from Cambridge, Oxford, Leiden, and Göttingen (Supplementary Fig. 2) shows how Latin lingered on, but in these cases too, as in Turku, the local languages gained a much more prominent position by the end of the eighteenth century. Compared to the leading publishing centers in Europe, Paris and London, the development is very late. Interestingly, the metadata collections tell us about national trends, such as an early decline of Latin or competing vernaculars, but when viewed in comparison we can also see patterns that cross national boundaries, such as the different types of publishing milieus in commercial towns, university towns or capital cities. All of Europe had a cultural debt to sources from Antiquity, but this debt materialized differently in places that were almost self-sufficient culturally, like Paris and London, and in the university towns that embodied learning by attaching themselves to Latin traditions (CITE: for Latin-vernacular diglossia, see \cite{lindberg2006} **).
Since both vernacularization and the rise of octavo seem to be inherently related to a modernization of public discourse, reading and writing, a final question is whether the change in the popularity of formats in the sixteenth and particularly the eighteenth centuries is related to the shifts in language in the same period. It seems that there is no simple answer to this. Quite naturally, in all of the studied metadata collections, the vernacular languages obtain a growing share of the books published in the octavo format (Supplementary Fig. 4). A closer look at cities with different publication profiles shows that the matter was more complicated. In the ESTC, the share of octavo books is for most cities higher among Latin books than among English books (Supplementary Fig. 3). In the SNB, both Latin and Swedish books tend to gravitate towards smaller formats at the end of the eighteenth century, but the HPBD's records for German cities point to the octavo format being used more often in German-language books than in Latin books (Supplementary Fig. 3). While there is no clear correlation between language and format, the analysis of format nonetheless helps qualify earlier research. Henrik Horstbøll has shown that the octavo format was particularly popular in Denmark for small histories that stood for leisurely reading (CITE: Horstbøll2009. ‘In octavo' **), but a much bigger sample makes it clear that the octavo format became more popular in other genres as well, including books published in Latin in university towns. Additional data and content analysis will in the future allow a closer look at how genre, language and format relate to one another. It is reasonable to assume that smaller formats and different languages relate to new genres and new types of reading, but the relationship between the two is not straightforward.

Discussion

This article has sought to demonstrate that something as seemingly trivial as document sizes and the language of titles can have a crucial role when considering the emergence of the public sphere in early modern Europe. The relationship between reading habits and broadly circulated written documents in the Enlightenment period can be looked at differently when we know the relevance of the octavo-sized book and the rise of local written languages in Europe during the eighteenth century. To get at these processes in a reliable way, we have used the tools of bibliographic data science.
Our work is part of the emerging trend towards the utilization of large digital data resources in publishing history [FOOTNOTE: For instance, the Culturomics project [\cite{Michel_2010}] analyzed broad historical trends in English language and culture in the period 1800-2000 based on a corpus collected from the full-text content of over five million digitized books. On difficulties in interpreting the data, see also the article commentary \cite{Morse_Gagne_2011} **]. Many of the problems relating to scalable data processing and interpretation in such projects are similar to the ones we have encountered in the context of bibliographic metadata collections. We have investigated four different types of bibliographic metadata collections (FNB, SNB, ESTC, and HPBD), providing a transparent, detailed, and replicable account of data harmonization and analysis so that the details of the data processing can be independently investigated and verified. Because they are similar as datasets, national bibliographies are not only about mapping national traditions of publishing; they can also be studied comparatively and ultimately be integrated across borders to overcome a national perspective in analyzing the past.
The power of a large-scale approach is that broad patterns in knowledge production are often overwhelmingly clear, despite occasional inaccuracies and collection biases in individual data sets. Even the HPBD can be used to assess some general trends in publishing history, although it does not compete in reliability, coverage or level of harmonization with the other bibliographic metadata collections used here. This is exemplified by our key observations on vernacularization and the rise of the octavo, which are supported by similar trends across multiple independently maintained bibliographic metadata collections. For a more detailed comparison across European cities, further harmonization and augmentation of the collections are needed (FOOTNOTE: For instance, Buringh 2009 relies on an earlier version of the HPBD to assess very general trends in printing, but the clean-up methods used there make it impossible to go further into detail.**). Integration of collections demands further work on reliably detecting duplicates, different editions and translations across catalogues. Our systematic approach provides a starting point, guidelines, and a set of practically tested algorithms for more extensive analysis and integration.
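To give a sense of what detecting duplicates across catalogues involves, the following is a deliberately simple sketch and not one of the algorithms referred to above. It assumes a hypothetical record layout (dictionaries with `catalogue`, `id`, `title`, `year` and `place` keys), and the matching rule, the function names and the toy records are all invented for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_title(title, max_words=6):
    """Lower-case a title, drop combining accents and punctuation,
    and keep only the first max_words words.

    Characters without a Unicode decomposition (such as "ø") are not
    transliterated; the word pattern treats them as separators.
    """
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    words = re.findall(r"[a-z0-9]+", stripped.lower())
    return " ".join(words[:max_words])


def candidate_duplicates(records):
    """Group records that share normalized title, year and place.

    Only groups spanning more than one catalogue are returned.
    """
    groups = defaultdict(list)
    for record in records:
        key = (
            normalize_title(record["title"]),
            record["year"],
            record["place"].lower(),
        )
        groups[key].append(record)
    return [
        group for group in groups.values()
        if len({record["catalogue"] for record in group}) > 1
    ]


# Invented toy records (hypothetical catalogues "X" and "Y").
toy_records = [
    {"catalogue": "X", "id": "x1", "title": "A Treatise on Trade, and Money.",
     "year": 1701, "place": "London"},
    {"catalogue": "Y", "id": "y7", "title": "A treatise on trade and money",
     "year": 1701, "place": "LONDON"},
    {"catalogue": "Y", "id": "y8", "title": "A treatise on trade and money",
     "year": 1712, "place": "London"},
]

groups = candidate_duplicates(toy_records)
for group in groups:
    print([record["id"] for record in group])
```

Exact matching on such a key only surfaces candidates. It cannot by itself separate a reprint from a new edition, and it will miss translations entirely, which is why further work on this step is needed.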
In future work we aim to take a closer look at the development of public communication, as we have extracted and harmonized the publisher information from imprints in the ESTC and FNB and started analysing this information [CITE: \cite{tolonen2016} **]. Our vision includes the study of the newspaper as an early modern phenomenon: how newspapers developed materially over the years, how reporting became professionalized, and how this is reflected in the material development of different types of documents (CITE: \cite{marjanen2017} ** ). We consider these aspects, which previously might have seemed mere material developments within the printing industry, to offer crucial insights also into the emergence of the public communication that transformed Europe in the eighteenth century. We hope that this short article has given a concrete idea of how this larger objective can be accomplished.