Biases, inaccuracies, and gaps in data collection or quality may severely hinder productive use of bibliographies as a research resource.
In our study of publication patterns in Sweden and Finland during the period 1640-1910, based on the Swedish National Bibliography and the Finnish National Bibliography, we have encountered specific and largely overlooked challenges in using bibliographic catalogs for historical research.
Here, we demonstrate how such challenges can be overcome by specifically tailored data analytical ecosystems that provide scalable tools for data processing and analysis. Furthermore, we show how external sources of metadata, for instance, on authors, publishers, or geographical places, can be used to enrich and verify bibliographic information.
This type of data analytical ecosystem has potential for wider implementation in related studies and other bibliographies. In particular, our systematic approach provides a starting point and guidelines for more extensive integration of national catalogs. National bibliographies are essentially about mapping the national canon of publishing, but integrating data across borders should be managed in a way that takes into account specific local circumstances while also helping to overcome the national view in analyzing the past. Such integration can help scholarship to reach a more precise view of print culture beyond the confines of national bibliographies.
Open availability of the raw data as well as the analysis methods is central for efficient, collaborative, and transparent research use of bibliographic collections in modern society. Although traditional data management policies do not support open sharing of these digital resources, the time is ripe for change. Open availability of bibliographic data collections and supporting data sources can foster innovative and nontraditional research use of the catalogs, as demonstrated in this article. In this rapidly changing field, a shift toward collaborative development of research methods can advance the transition from data management to collaborative quality control and research.
Our work demonstrates that comprehensive data harmonization is essential for accurate and useful data retrieval and for the overall usability of the catalog information. It also shows how the available classification and subject analyses, geographical information, and other data can be utilized, augmented, enriched, and validated with auxiliary information sources, such as digital maps. Integration of national bibliographies, special collections, and archives is relevant for international aspects of digital cataloging.
As such, the work highlights specific bottlenecks and shortcomings in the available cataloging and classification information, and can therefore inform education, training, and management of cataloging. Finally, we demonstrate how bibliographic catalog records can be used as a digital research resource rather than a mere information retrieval tool.
We present an analysis of the overall publishing landscape in the period 1500-1800. Comprehensive harmonization and joint analysis of four large bibliographic catalogs have allowed us to assess publishing activity beyond what is accessible through national catalogs alone. In addition to the historical analysis of knowledge production trends, we are releasing the openly licensed source code for catalog harmonization and a notably improved version of the Finnish national bibliography, Fennica. This code and data release demonstrates the potential of our approach for research use of library catalogs, and the essential role that data harmonization and integration play in this process.
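To make the idea of harmonization and enrichment concrete, the following is a minimal sketch in Python and not the released harmonization code. It assumes hypothetical record fields (`place`, `year`), a small hand-made synonym table, and a toy gazetteer with approximate coordinates standing in for an external geographical source. It shows how raw catalog strings might be mapped to canonical values and then checked against auxiliary data.

```python
import re
from typing import Optional

# Hypothetical synonym table: spelling variants -> canonical place name.
# A real table would be far larger and curated by hand.
PLACE_SYNONYMS = {
    "holmiae": "Stockholm",
    "stockholm": "Stockholm",
    "aboae": "Turku",
    "åbo": "Turku",
    "turku": "Turku",
}

# Hypothetical external gazetteer: canonical place -> approximate (lat, lon).
GAZETTEER = {
    "Stockholm": (59.33, 18.07),
    "Turku": (60.45, 22.27),
}


def harmonize_place(raw: str) -> Optional[str]:
    """Strip punctuation and case, then look up the canonical place name."""
    key = re.sub(r"[^\w\s]", "", raw).strip().lower()
    return PLACE_SYNONYMS.get(key)


def harmonize_year(raw: str) -> Optional[int]:
    """Extract the first four-digit year between 1400 and 1999, if any."""
    match = re.search(r"\b(1[4-9]\d{2})\b", raw)
    return int(match.group(1)) if match else None


def enrich(record: dict) -> dict:
    """Return a copy of the record with harmonized and enriched fields."""
    place = harmonize_place(record.get("place", ""))
    year = harmonize_year(record.get("year", ""))
    return {
        **record,
        "place_harmonized": place,
        "year_harmonized": year,
        # None here flags a place that could not be verified externally.
        "coordinates": GAZETTEER.get(place),
    }
```

Records whose `place_harmonized` or `coordinates` field is `None` after this step can be listed for manual review, which is one simple way to expose gaps in the catalog data.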

Bibliographic data science

Challenges in the research use of digital bibliographies

Bias in data collection processes or data quality may hinder productive use of bibliographies as a research resource.

Source code can be made available, but the bibliographies themselves are not openly available. This places considerable limits on efficient and collaborative research use, and on the accumulation of knowledge about the research use of these digital resources.