Furthermore, we use external data sources, for instance on geographical places, to complement, enrich, and verify the information available in the original library catalogues. We continuously monitor data processing quality through automated unit tests, cross-linking, manual curation, and matching with external databases. In doing so, we have incorporated best practices and tools from data science.
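As an illustration of how automated checks and matching against an external source can work together, the following is a minimal sketch rather than our actual pipeline. The synonym table, the function names (`normalize_key`, `harmonize_place`, `match_rate`), and the example strings are all hypothetical; in practice, the lookup table would be derived from an external gazetteer rather than written by hand.

```python
import re
import unicodedata

# Hypothetical synonym table (assumption): maps normalized spellings of a
# publication place to a single harmonized name.
PLACE_SYNONYMS = {
    "helsingfors": "Helsinki",
    "helsinki": "Helsinki",
    "abo": "Turku",
    "turku": "Turku",
}


def normalize_key(raw):
    """Lowercase the string and strip diacritics, punctuation, and extra spaces."""
    decomposed = unicodedata.normalize("NFKD", raw)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    letters_only = re.sub(r"[^a-z ]", "", without_marks.lower())
    return " ".join(letters_only.split())


def harmonize_place(raw):
    """Return the harmonized place name, or None if the entry is unmatched."""
    return PLACE_SYNONYMS.get(normalize_key(raw))


def match_rate(raw_places):
    """Fraction of raw entries that could be matched to a harmonized name."""
    if not raw_places:
        return 0.0
    matched = sum(1 for place in raw_places if harmonize_place(place) is not None)
    return matched / len(raw_places)


def test_harmonization():
    """Unit tests of the kind that can be rerun after every change to the pipeline."""
    assert harmonize_place("Åbo.") == "Turku"
    assert harmonize_place("[Helsingfors]") == "Helsinki"
    assert harmonize_place("Unknownville") is None


if __name__ == "__main__":
    test_harmonization()
    sample = ["Åbo.", "[Helsingfors]", "Unknownville", "Turku"]
    print(f"Matched {match_rate(sample):.0%} of sample entries")
```

Tracking a summary such as the match rate over time gives a simple signal of whether a change in the processing code has improved or degraded coverage, while the unmatched entries can be passed on to manual curation.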
Our analysis of the FNB demonstrates the research potential of openly available bibliographic data resources. We have enriched and augmented the raw MARC entries that have been openly released by the National Library of Finland. Open availability of the source data allows us to implement reproducible data analysis workflows, which provide a transparent account of every step of the analysis, from raw data to the final summaries. In addition, the open licensing of the original data allows us to share our enriched version [HERE WE NEED TO CHECK THAT WE ALSO HAVE PERMISSION TO USE ALL THE EXTERNAL DATA SOURCES USED FOR ENRICHMENT..!] openly so that it can be further verified, investigated, and enriched by other investigators. Although we do not have permission to provide access to the original raw data entries for the other catalogues, we are releasing the full source code of our algorithms. With this, we aim to contribute to the growing body of tools that are specifically tailored for use in this field. Moreover, we hope that the increasing availability of open analysis methods can pave the way towards the gradual opening of bibliographic data collections. This could follow related successes in other fields, such as the human genome sequencing project and subsequent research programs, which rely critically on centrally maintained and openly licensed data resources, as well as on thousands of algorithmic tools that the research community has independently built to draw information and insights from these data collections [REFS].