Conclusion
We have conceptualized a new approach and technologies to expand the research potential of bibliographic cataloguing and classification, calling this approach bibliographic data science. Whereas national bibliographies can provide comprehensive quantitative insights into the overall historical dynamics of the publishing landscape across time and geography, we have encountered specific and largely overlooked challenges in using bibliographic metadata collections for historical research. Drawing valid conclusions critically depends on efficient and reliable harmonization and augmentation of the raw entries, and biases, gaps, and inaccuracies in data collection may substantially hinder productive research use of the bibliographies. Here, we have overcome some of these challenges with specifically tailored open data analytical ecosystems that facilitate robust statistical research use of bibliographic metadata collections. This approach has potential for wider implementation in related studies and other bibliographies, and provides guidelines for more extensive integration of national metadata collections, thus helping to get at transnational historical processes and moving towards a more precise view of print culture beyond the confines of national bibliographies.
Supplementary Material
All analysis source code for data cleaning and harmonization, and the reproducible Rmarkdown documents for generating the figures and tables in this document, are available through the Helsinki Computational History Group (COMHIS) website at https://comhis.github.io. The specific versions used in this work have been included as supplementary material. We have also included the harmonized version of Fennica, the Finnish national bibliography, whose original MARC data entries are openly available from the National Library of Finland. The harmonized version has been prepared for this manuscript, is openly available, and can be freely used. We are committed to maintaining and further improving the data harmonization, and future versions of this data release will also be available via the indicated research website.
Acknowledgements
This work was supported by the Academy of Finland under Grant 293316. We are grateful to the National Library of Finland, the National Library of Sweden, the British Library, and CERL for providing the bibliographies for use in this research, and to the members of the Helsinki Computational History Group for supporting this work.
Cover letter
Dear Editor,
We kindly ask you to consider the attached manuscript for publication in the special issue on "The Role and Function of National Bibliographies for Research in Different Academic Disciplines" in Cataloging & Classification Quarterly. The work is original; it has not been published elsewhere or submitted simultaneously for publication elsewhere.
We present an analysis of the overall publishing landscape in the period 1500-1800 based on comprehensive harmonization and joint analysis of four large bibliographic catalogs. This has allowed us to assess publishing activity beyond what is accessible through the use of national catalogs alone. In addition to the historical analysis of knowledge production trends, we are releasing the openly licensed source code for catalog harmonization, and a notably improved version of the Finnish national bibliography, Fennica. This code and data release demonstrates the potential of our approach for research use of library catalogs, and the essential role that data harmonization and integration play in this process.
The work directly addresses multiple aspects that are relevant to the CCQ journal in general, and to the special issue on national bibliographies in particular. It demonstrates how comprehensive data harmonization is essential for accurate and useful data retrieval and for the overall usability of the metadata, and how the available classification and subject analyses, geographical information, and other data can be utilized, augmented, enriched, and validated based on auxiliary information sources, such as digital maps. Integration of national bibliographies, special collections, and archives is relevant for international aspects of digital cataloging. As such, the work highlights specific bottlenecks and shortcomings in the available cataloging and classification information, and can therefore provide relevant information for education, training, and management of cataloging. Finally, we demonstrate how bibliographic catalog records can be used as a digital research resource, rather than a mere information retrieval tool.