There also seems to be a correlation between the language of a document and its format. Comparing books published in London in English, Latin and other languages (Supplementary Fig. 3) suggests that duodecimo was the preferred format for books printed in languages other than English and Latin, whereas octavo was used proportionally more in Latin books than in others. The small share of folio documents in Latin is especially interesting, and the quarto share of Latin books in London is also noteworthy.
Vernacularization in Europe, 1500-1800
Vernacularization refers to a historical transformation in local language relations. Multilingual systems in which one language (in Europe often Latin) was reserved for learned communication, whereas local vernacular languages were used in everyday communication, started to erode, and local languages gained increased prominence. They were made into vehicles for discussing politics, science and culture. This process happened at different speeds in different parts of Europe. Judging from today's teleological perspective, vernaculars such as English and French gained prominence already in the 1600s, whereas for German and Swedish this development happened in the eighteenth and nineteenth centuries. For many smaller European languages, such as Finnish or Czech, it happened in conjunction with nation building in the latter half of the nineteenth century. Ultimately, vernacularization is an open-ended process. For many potentially vernacularizable languages the transformation never took place, and a similar process could still take place in the future, as language relations are in constant flux. The dominance of English today in many parts of Europe in a sense marks a reversed transformation. Linguists and historians have paid attention to vernacularization as a process from various perspectives [CITE: \cite{Ferguson_1959}; \cite{Fishman_1967} **], but this article takes a novel approach by investigating library catalogues that contain thousands of titles and related bibliographic information and thus provide a previously unexplored source for tracing how the process of vernacularization materialized in concrete publications.
While language relations differ considerably across Europe, there is one measure that paints a picture of vernacularization as a general trend in European publishing: the share of publications in Latin. All four of our catalogues show an indisputable decline in the share of publications in Latin in the period 1500-1800, but there are noticeable differences in the timing and proportions of the transformation, which are partly explained by historical trajectories mirrored in the data but also by the composition of the data itself. The HPBD (Fig. 2) provides the geographically broadest overview of the decline of Latin as a language for printed materials in Europe, but as a data set it contains the most gaps and uncertainties. Nevertheless, in the HPBD the decline of Latin in the eighteenth century is most rapid, and it happens later than in the ESTC and SNB (Fig. 2). This may be a result of the composition of the database, in which many catalogues are predominantly focused on the eighteenth century. The earlier decline of Latin in Britain corresponds with our previous knowledge of the early establishment of English as the main language of high-level communication. Well-known symbols of the use of English, such as Shakespeare and the Royal Society [CITE: \cite{STARK_2011}, 9–46;  \cite{dear1985};  \cite{Livesey2009}, pp. #-#. **], anticipate this, but once the comparison based on national bibliographies has been brought to a more reliable level, we will be able to provide a statistically accurate picture of it. The available data suggest that the decline of Latin in Britain was more drastic than has previously been anticipated.
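The measure discussed above, the share of Latin titles per period, can be sketched as follows. This is a minimal illustration only: it assumes that each catalogue entry has already been reduced to a pair of publication year and language code, and the function name, the `lat` code and the toy records are hypothetical. It does not reproduce the harmonization pipeline used for the catalogues.

```python
from collections import defaultdict


def latin_share_by_decade(records, latin_code="lat"):
    """Return {decade: share of titles in Latin} for (year, language) pairs.

    `records` and `latin_code` are illustrative assumptions, not fields
    taken from any of the catalogues discussed in the text.
    """
    totals = defaultdict(int)   # all titles per decade
    latin = defaultdict(int)    # Latin titles per decade
    for year, language in records:
        decade = (year // 10) * 10
        totals[decade] += 1
        if language == latin_code:
            latin[decade] += 1
    # Decades with no titles never appear in `totals`, so no division by zero.
    return {decade: latin[decade] / totals[decade] for decade in sorted(totals)}


# Toy records invented for illustration; not drawn from any catalogue.
toy_records = [
    (1501, "lat"), (1505, "lat"), (1507, "lat"), (1509, "eng"),
    (1701, "eng"), (1703, "eng"), (1708, "lat"), (1709, "swe"),
]
shares = latin_share_by_decade(toy_records)
```

Because the share is a ratio within each decade, it can be compared across catalogues of very different sizes, although gaps in coverage still affect how reliable each decade's value is.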
The SNB and FNB allow us to zoom in on the Swedish case and compare the different properties of the bibliographies. While the SNB portrays the general trend for the Swedish realm, it is also clear that Stockholm as a publication center dominates the picture (Fig. 3). The FNB, which consists mostly of publications from Turku (Åbo), one of the four university towns in the realm, shows that the distinct publication profiles of university towns are sometimes hidden under the national average. In Turku, too, we find a concrete decline in the share of Latin publications, but the decline came clearly later, even though the Academy in Turku has been described as one of the most utility-oriented universities in the Swedish realm and thus also the most prone to use Swedish [CITE: \cite{lindberg1993} **]. One special feature of the FNB has to do with the different roles of Swedish and Finnish as languages. While Swedish became a stronger candidate for academic publications, Finnish emerged as a written language especially in shorter religious and economic texts. In this case vernacularization was a process not between two languages, but three.
Keeping in mind the uncertainties relating to the HPBD, an inspection of university towns suggests that this is a wider trend. University towns, capitals, and commercial centers had different linguistic publication profiles, and vernacularization as a process happened in different phases. An analysis of the languages used in publications from Cambridge, Oxford, Leiden, and Göttingen (Supplementary Fig. 3) shows how Latin lingered on, but in these cases, as in Turku, the local languages did gain a much more prominent position by the end of the eighteenth century. Compared to the leading publishing centers in Europe, Paris and London, the development came very late. Interestingly, the catalogues tell us about national trends, such as an early decline of Latin or competing vernaculars, but when viewed in comparison they also reveal patterns that cross national boundaries, such as different types of publishing milieus in commercial towns, university towns or capital cities. All of Europe had a cultural debt to sources from Antiquity, but this debt materialized differently in places that were almost self-sufficient culturally, like Paris and London, and in the university towns that embodied learning by attaching themselves to Latin traditions (CITE: for Latin-vernacular diglossia, see \cite{lindberg2006} **).
Since both vernacularization and the rise of octavo seem to be inherently related to a modernization of public discourse, reading and writing, a final question is whether the change in the popularity of formats in the sixteenth and particularly eighteenth centuries is related to the shifts in language in the same period. There seems to be no simple answer to this. Quite naturally, in all of the studied catalogues, the vernacular languages obtain a growing share of the books published in the octavo format (Supplementary Fig. 5). A closer look at cities with different publication profiles shows that the matter was more complicated. In the ESTC the share of octavo books is, for most cities, higher among Latin books than among English books (Supplementary Fig. 3). In the SNB, both Latin and Swedish books tend to gravitate towards smaller formats at the end of the eighteenth century (Supplementary Fig. 3), but the HPBD's records for German cities point to the octavo format being used more often in German-language books than in Latin books (Supplementary Fig. 3). While there is no clear correlation between language and format, the analysis of format nonetheless helps to qualify earlier research. Henrik Horstbøll has shown that the octavo format was particularly popular in Denmark for small histories that stood for leisurely reading (CITE: Horstbøll2009. ‘In octavo' **), but a much bigger sample makes it clear that the octavo format became more popular in other genres as well, including books published in Latin in university towns. Additional data and content analysis will in the future allow us to look more closely at how genre, language and format relate to one another. It is reasonable to assume that smaller formats and different languages relate to new genres and new types of reading, but the relationship between format and language is not straightforward.

Discussion

A statistical approach to bibliographic metadata is an emerging research area. In addition to providing novel approaches that can support qualitative research, bibliographic data science can add significant value to Linked Open Data and other infrastructures by providing new techniques to monitor and improve data quality. Whereas library catalogues are traditionally used for information storage and retrieval, we have demonstrated that systematic large-scale harmonization of the raw entries both within and across library catalogues can fill an important gap in their research use, and help to realize their research potential in publishing history.
Because they are similar as datasets, national bibliographies are not only a means of mapping national traditions of publishing; they can also be studied comparatively and ultimately be integrated across borders, so that they help overcome a national perspective in analyzing the past. In this article we have expanded our previous pilot studies on the Finnish and Swedish bibliographies and the ESTC towards large-scale integration of national bibliographies in the CERL Heritage of the Printed Book Database. Our harmonization and integration efforts are not complete, but they clearly demonstrate how such integration can help scholarship to reach a more precise view of print culture beyond the confines of national bibliographies.