as well as manual curation. We have incorporated best practices from data science, taking advantage of tidy data formats [REFS], standard database structures and query tools, and statistical programming [REFS]. A number of R packages and Python libraries have been essential in this work [ADD MOST IMPORTANT ONES?]. For a full list, see the software page of the project.
We have used external sources of metadata, for instance on authors, publishers, and geographical places, to further enrich and verify the information available in the bibliographies.
We also quantify the coverage of the data.
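As an illustration of what a coverage summary of this kind could look like, the following minimal Python sketch computes, for each field, the share of records that carry a non-empty value. The function name, the field names, and the toy records are hypothetical and are not taken from the FNB or from the project code; the actual analysis may be implemented differently.

```python
from typing import Dict, Iterable, List, Optional

# A record maps field names to string values; None or "" means "missing".
Record = Dict[str, Optional[str]]


def field_coverage(records: List[Record], fields: Iterable[str]) -> Dict[str, float]:
    """Return, for each field, the fraction of records with a non-empty value."""
    total = len(records)
    coverage: Dict[str, float] = {}
    for field in fields:
        if total == 0:
            # No records at all: report zero coverage rather than dividing by zero.
            coverage[field] = 0.0
            continue
        # Count records where the field exists and is not None, empty, or whitespace.
        filled = sum(1 for r in records if (r.get(field) or "").strip())
        coverage[field] = filled / total
    return coverage


# Hypothetical toy records; the field names are illustrative only.
records = [
    {"title": "A", "author": "X", "publication_place": "Turku"},
    {"title": "B", "author": None, "publication_place": ""},
    {"title": "C", "author": "Y", "publication_place": None},
    {"title": "D", "author": "Z"},
]

coverage = field_coverage(records, ["title", "author", "publication_place"])
for name, share in coverage.items():
    print(f"{name}: {share:.0%}")
```

Treating absent keys, empty strings, and whitespace-only values alike as missing is a design choice of this sketch; a real catalogue may need field-specific rules for what counts as available information.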
Our analysis of the FNB demonstrates the research potential of openly available bibliographic data resources. We have substantially enriched and augmented the raw MARC entries that have been openly released by the National Library of Finland. Open availability of the source data allows us to implement reproducible data analysis workflows, which provide a transparent account of every step in the analysis, from raw data to the final summaries. In addition, the open licensing of the original data allows us to share our enriched version [HERE WE NEED TO CHECK THAT WE ALSO HAVE PERMISSION TO USE ALL THE EXTERNAL DATA SOURCES USED FOR THE ENRICHMENT..!] openly, so that it can be further verified, investigated, and enriched by other investigators. Although we do not have permission to provide access to the original raw data entries for the other catalogues, we are releasing the full source code of our algorithms. With this, we aim to contribute to the growing body of tools that are specifically tailored for use in this field. Moreover, we hope that the increasing availability of open analysis methods can pave the way towards the gradual opening of bibliographic data collections. This can follow related successes in other fields, such as the human genome sequencing project and subsequent research programs, which critically rely on centrally maintained and openly licensed data resources, as well as on thousands of algorithmic tools that the research community has independently built to draw information and insights from these data collections [REFS].