Our current harmonization strategies are based on manually implemented rules for data processing. Future developments could take increasing advantage of adaptive machine learning techniques that can reduce the need for human input and improve the overall scalability of automated data harmonization. When combined with proper quality control, such data analytical ecosystems have potential for wider implementation in related studies in the digital humanities. Moreover, ecology and related fields that have well-established methods for spatio-temporal data analysis provide a variety of statistical techniques for the analysis of such data collections. Open availability of the raw data as well as the analysis methods is central to efficient, collaborative, and cumulative research use of bibliographic collections in modern society.

Conclusion

We have conceptualized a new approach and technologies to expand the research potential of bibliographic cataloguing and classification, calling this approach bibliographic data science. Whereas national bibliographies can provide comprehensive quantitative insights into the overall historical dynamics of the evolving publishing landscape across time and geography, we have encountered specific and largely overlooked challenges in using bibliographic metadata collections for historical research. Drawing valid conclusions critically depends on efficient and reliable harmonization and augmentation of the raw entries, and biases, gaps, and inaccuracies in data collection may substantially hinder productive research use of the bibliographies. Here, we have overcome some of these challenges with specifically tailored open data analytical ecosystems that facilitate robust statistical research use of bibliographic metadata collections. This approach has potential for wider implementation in related studies and other bibliographies, and provides guidelines for more extensive integration of national metadata collections, thus helping to get at transnational historical processes and to move towards a more precise view of print culture beyond the confines of national bibliographies.

Supplementary Material

All analysis source code for data cleaning and harmonization, and the reproducible Rmarkdown documents for generating the figures and tables in this document, are available through the Helsinki Computational History Group (COMHIS) website at https://comhis.github.io. The specific versions used in this work have been included as supplementary material. We have also included the harmonized version of Fennica, the Finnish national bibliography, whose original MARC data entries are openly available from the National Library of Finland. The harmonized version has been prepared for this manuscript, is openly available, and can be freely used. We are committed to maintaining and further improving the data harmonization, and future versions of this data release will also be available via the indicated research website.

Acknowledgements

This work was supported by the Academy of Finland under Grant 293316. We are grateful to the National Library of Finland, the National Library of Sweden, the British Library, and CERL for providing the bibliographies for use in this research, and to the members of the Helsinki Computational History Group for supporting this work.

References

@Manual{R2018,