Our data harmonization efforts follow similar principles and largely identical algorithms across all catalogues. In this work, we focus on a few selected fields, namely publication year and place, language, and physical dimensions. We have removed spelling errors, disambiguated and standardized terms, augmented missing values, and developed custom algorithms, such as conversions from the raw MARC notation to numerical page count estimates [REFS]; many of these tools are implemented in the bibliographica R package. We have also added derivative fields, such as print area, which quantifies the overall number of sheets across the distinct documents of a given period, and thus the overall breadth of printing activity. In addition, we have used external data sources on authors, publishers, and places to enrich and verify the bibliographic information. An overview of the harmonized data sets and full algorithmic details of our analysis are available via the Helsinki Computational History Group website [LINK: https://comhis.github.io/2019_CCQ].
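To illustrate the kind of conversion involved, the following is a minimal sketch (in Python, for brevity; it is not the bibliographica implementation) of turning a raw MARC physical-description (extent) string into a numerical page count estimate. The parsing rules and the per-volume fallback constant are simplifying assumptions for illustration only; the actual harmonization algorithms are documented on the project website.

```python
import re

# Hypothetical fallback for multi-volume works with no page statement;
# this constant is an assumption, not taken from the source.
DEFAULT_PAGES_PER_VOLUME = 100

def estimate_pages(marc_extent: str) -> int:
    """Rough page-count estimate from a MARC extent statement,
    e.g. "[8], 232 p." or "2 v." (illustrative rules only)."""
    total = 0
    # Numbered page statements, e.g. "232 p."
    for m in re.finditer(r"(\d+)\s*p\b", marc_extent):
        total += int(m.group(1))
    # Unnumbered pages given in brackets, e.g. "[8]"
    for m in re.finditer(r"\[(\d+)\]", marc_extent):
        total += int(m.group(1))
    # Multi-volume works with no explicit page count, e.g. "2 v."
    vol = re.search(r"(\d+)\s*v\b", marc_extent)
    if vol and total == 0:
        total = int(vol.group(1)) * DEFAULT_PAGES_PER_VOLUME
    return total

print(estimate_pages("[8], 232 p."))  # 240
print(estimate_pages("2 v."))         # 200
```

A real converter must additionally handle roman numerals, plate and leaf counts, and the many irregular notations found in historical catalogue records, which is what motivates the custom algorithms described above.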
Automation and scalability are critical, as catalogue sizes in this study reach up to 6 million [CHECK] entries in the HPBD.