Data harmonization is only the starting point for our analysis, albeit an important one. The harmonized data sets can be linked to existing LOD and other data infrastructures, and further subjected to statistical analysis, as in our case studies in the next section.
Our analysis of the FNB demonstrates the advantages of the open availability of library catalogues. The raw MARC entries of the FNB have been openly released by the National Library of Finland. We have harmonized, augmented, and enriched these data with the open data analytical ecosystem, and we hereby release the final harmonized data set used in this study so that it can be further verified, investigated, and enriched by academics and the general public alike. Open availability also allows us to demonstrate the advantages of a reproducible data analysis workflow, which provides a transparent account of every step from the raw data to the final results.
This process has generated a substantial body of custom algorithms and concepts that support reproducible analysis of library catalogues [REFS - bibliographica R package]. These methods complement traditional software interfaces, which have been designed for browsing and automated retrieval rather than for scalable statistical research. Data harmonization and quantitative analysis are intricately related objectives: the actual data analysis often reveals previously unnoticed shortcomings in the data. Hence, bibliographic data science is an inherently iterative process. An improved understanding of the data and of historical trends can lead to improvements in the data harmonization procedures, and to new, independent ways of validating the data and the observed patterns.
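As a hedged illustration of the kind of harmonization step involved, the Python sketch below maps a few raw gathering notations to canonical format names. These are notations of the kind that may appear in the dimensions subfield of MARC field 300, for example "8vo", "8:o", "4°" or "in-8". The accepted notations, the regular expression and the function `harmonize_format` are hypothetical examples chosen for illustration. This is not the bibliographica implementation, which is written in R and covers far more cases.

```python
import re
from typing import Optional

# Hypothetical lookup from leaves per sheet to a canonical format name.
LEAVES_TO_FORMAT = {2: "folio", 4: "quarto", 8: "octavo", 12: "duodecimo", 16: "sextodecimo"}

# Spelled-out format names are accepted as they are (after lower-casing).
NAMED_FORMATS = set(LEAVES_TO_FORMAT.values())

# Assumed notation variants: optional "in-" prefix, a number of leaves,
# an optional suffix such as "vo", "to", "mo", "fo", ":o", "°" or "o",
# and an optional trailing full stop.
GATHERING_RE = re.compile(r"(?:in-?)?(\d{1,2})(?:vo|to|mo|fo|:o|°|o)?\.?")


def harmonize_format(raw: str) -> Optional[str]:
    """Map a raw gathering notation to a canonical format, or None if unrecognized."""
    text = raw.strip().lower().replace(" ", "")
    if text in NAMED_FORMATS:
        return text
    match = GATHERING_RE.fullmatch(text)
    if match is None:
        return None
    # Numbers outside the lookup (e.g. 24) are left unresolved rather than guessed.
    return LEAVES_TO_FORMAT.get(int(match.group(1)))


for raw in ["8vo", "8:o", "4°", "12mo", "in-8", " Folio ", "24mo"]:
    print(repr(raw), "->", harmonize_format(raw))
```

Returning `None` for unmapped notations, rather than forcing a guess, keeps the unresolved cases visible. They can then be inspected and added in a later iteration of the harmonization, in line with the iterative process described above.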

Language and format of early modern publications

The hand-press period is particularly fruitful for quantitative research because printing technology changed remarkably little from 1450 to approximately the 1830s. It has been famously claimed that Gutenberg himself would have been able to operate a printing press in late eighteenth-century London, since it was so similar to the one found in mid-fifteenth-century Mainz. As revolutionary as the movable type printing press was for early modern culture and the economy in general, printing technology saw no game-changing innovations for roughly 400 years after Gutenberg's time. This is fortunate for our aim of understanding the development of early modern publishing [FOOTNOTE: \cite{McKitterick2005}. On the significance of the movable type printing press, see \cite{eisenstein1980printing}. See also \cite{cipolla1972} and \cite{Pettegree2008}. On the economic impact of the printing press on early modern cities, see \cite{Dittmar_2011}. See also \cite{coldiron2015printers} and \cite{Coldiron_2004}]. In our research on the metadata of different library catalogues, we have come to realise that the relative stability of printing technology opens up new avenues for cross-European research. For example, we can estimate the long-term development of book formats across Europe in some detail. This in turn is significant for understanding the relevance of printing for the establishment of the public sphere. For this article, we have therefore developed two cross-catalogue case studies, analysing the rise of the octavo format and the process of vernacularization in the early modern period. These cases also test the catalogues, which differ in their levels of data harmonization and in their historical representativeness.
Both of these research cases represent large-scale, Europe-wide transformations that took place predominantly during the hand-press era. Yet inspecting them across several catalogues, zooming in and out while bearing in mind the different publication profiles of European cities, reveals intriguing variety. The cases also make it possible to discuss how the methods used, varying levels of data harmonization and gaps in the data affect the analyses. This paves the way for new research and for guidelines for future data integration.
The rise of octavo in the Enlightenment period
The general trend in the catalogues that we have studied is that the octavo format supersedes other formats during the eighteenth century.[FOOTNOTE: Henrik Horstbøll has previously studied the relevance of the octavo format for Danish publishing in detail, based on analogue methods and smaller samples. Our work confirms his findings and extends their scope by studying a much larger, cross-European data set. See \cite{Horstbøll1999}; \cite{Horstböll2009} and \cite{Horstböll2010}.] We can measure this in two ways. One is a simple count of the titles published in each format. The other is the paper consumption of each format, in which case we focus on the print area of the documents rather than on their number. We find print area a useful measure, and in this article we have chosen to examine in particular the paper consumed by printed documents. This also allows us to compare our findings with our earlier studies of paper consumption. When we examine the publishing trends of book formats in the HPBD, we notice that at a general European level the rise of the octavo format is particularly strong during the eighteenth century (fig. 1). This is confirmed by the ESTC (fig. 1) and the SNB (fig. ). In both, octavo is not only the fastest-growing format but also holds the largest share by the end of the eighteenth century. If we look at the octavo share in particular places in the HPBD, a striking feature is the octavo share in the German cities of Frankfurt, Leipzig, Halle and Berlin (Supplementary Fig. 1). The manner in which folio drops and octavo rises on German soil during the eighteenth century suggests that the octavo format was the rising star of the Enlightenment.
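The difference between the two measures can be illustrated with a minimal Python sketch. It assumes that each record has a publication year, a harmonized format and a page count. It approximates paper consumption as the number of printed sheets: the page count divided by the pages per sheet of the format (4 for folio, 8 for quarto, 16 for octavo, 24 for duodecimo), assuming one copy and a common sheet size. This is an illustrative simplification, not necessarily the estimator used in our studies. The field names, the functions `format_shares`, `title_count` and `sheets`, and the toy records are all hypothetical.

```python
from collections import defaultdict

# Assumed pages per printed sheet: a sheet folded into n leaves yields 2n pages.
PAGES_PER_SHEET = {"folio": 4, "quarto": 8, "octavo": 16, "duodecimo": 24}


def title_count(record):
    """Weight for title counts: every title counts once, regardless of size."""
    return 1.0


def sheets(record):
    """Approximate paper consumption in sheets (one copy, common sheet size assumed)."""
    return record["pages"] / PAGES_PER_SHEET[record["format"]]


def format_shares(records, weight):
    """Return {decade: {format: share}}, where each record contributes weight(record)."""
    totals = defaultdict(lambda: defaultdict(float))
    for record in records:
        decade = record["year"] // 10 * 10
        totals[decade][record["format"]] += weight(record)
    shares = {}
    for decade, by_format in totals.items():
        decade_total = sum(by_format.values())
        if decade_total > 0:  # skip decades with no measurable weight
            shares[decade] = {fmt: value / decade_total for fmt, value in by_format.items()}
    return shares


# Hypothetical toy records, not drawn from any catalogue.
records = [
    {"year": 1752, "format": "folio", "pages": 400},
    {"year": 1755, "format": "octavo", "pages": 160},
    {"year": 1758, "format": "octavo", "pages": 320},
]

by_titles = format_shares(records, title_count)
by_paper = format_shares(records, sheets)
print(round(by_titles[1750]["octavo"], 2))  # 0.67
print(round(by_paper[1750]["octavo"], 2))   # 0.23
```

In this toy example, octavo accounts for two thirds of the titles but under a quarter of the paper. The choice of measure can therefore change which format appears to dominate a given decade.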
Within these general Europe-wide trends there are, of course, local differences. For example, in Turku (Supplementary Fig. 1), and in Finland, which was part of Sweden at the time, the rise of octavo comes much later than in Sweden in general. This was because most of the documents printed in Finland were official documents, pamphlets and theses. Looking at the fractions of formats in Turku, another way of putting this is that printing in Turku only takes off in the later eighteenth century. In Stockholm, by contrast, the hand-press printing industry seems to have reached a different level of maturity earlier (fig. ??). The simplest explanation for the success of the octavo format is that it was particularly suited to smaller books that could be carried around and read practically anywhere. Quarto (and folio) formats, by contrast, were more commonly used for governmental and academic documents, pamphlets and larger books alike, especially in the earlier centuries [FOOTNOTE: On the relationship between books and pamphlets, see \cite{raymond2003}]. We have previously analysed the relevance of the rise of octavo for book printing in the case of "history" publishing [CITE: \cite{Lahti_2015}]. Of course, larger formats still carried a certain prestige in the eighteenth century. This remained true even as reading partly moved out of stately mansion libraries and became more egalitarian, and as the price of a book became a decisive factor in the dissemination of ideas [CITE: \cite{Allan_2013}; \cite{Allan2008}; \cite{Allan2008b}; \cite{Towsey_2010}]. When considering quarto and octavo publications, it is telling that David Hume (1711-1776) wanted his History of England to be printed as a six-volume, fine-paper quarto set in the late 1760s (as it had appeared earlier). However, the editions actually published after 1767 until Hume's death, as well as the posthumous 1778 edition, were octavo editions in eight volumes.
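One hedged way to make "much later" comparable across places is to find, for each place, the earliest decade in which octavo holds a strictly larger share than every other format. The Python sketch below assumes that per-decade format shares have already been computed (by titles or by paper). The functions `octavo_leads` and `first_octavo_decade`, the place labels and the share values are hypothetical and do not come from the HPBD or any other catalogue.

```python
from typing import Dict, Optional

Shares = Dict[str, float]  # format -> share of titles or paper in one decade


def octavo_leads(shares: Shares) -> bool:
    """True if octavo's share is strictly larger than every other format's share."""
    octavo = shares.get("octavo", 0.0)
    return octavo > 0 and all(value < octavo for fmt, value in shares.items() if fmt != "octavo")


def first_octavo_decade(series: Dict[int, Shares]) -> Optional[int]:
    """Earliest decade in which octavo leads, or None if it never does."""
    for decade in sorted(series):
        if octavo_leads(series[decade]):
            return decade
    return None


# Hypothetical shares for two places; NOT values from any catalogue.
series_by_place = {
    "place_a": {1720: {"quarto": 0.6, "octavo": 0.3, "folio": 0.1},
                1740: {"quarto": 0.4, "octavo": 0.5, "folio": 0.1}},
    "place_b": {1720: {"quarto": 0.7, "octavo": 0.2, "folio": 0.1},
                1740: {"quarto": 0.6, "octavo": 0.3, "folio": 0.1},
                1780: {"quarto": 0.3, "octavo": 0.6, "folio": 0.1}},
}

for place, series in series_by_place.items():
    print(place, first_octavo_decade(series))
```

Requiring a strict lead means that ties do not count as octavo dominance. A threshold such as "octavo above 50%" would be an equally defensible alternative criterion.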
Octavo editions may have lacked the exclusivity and finesse of the heavier, wide-margined tomes that connoisseurs might have preferred for aesthetic reasons. Yet it was above all the cheaper and smaller formats, octavo and duodecimo, that changed the nature and relevance of printing in the later part of the eighteenth century.