Bibliographies used in this study
Drawing on the Swedish National Bibliography and the Finnish National Bibliography, and focusing on publication patterns in Sweden and Finland during the period 1640–1910, we have encountered specific and largely overlooked challenges in using bibliographic catalogues for historical research.
- Fennica: Finnish National Bibliography (FNB)
- Kungliga: Swedish National Bibliography (SNB)
- English Short-Title Catalog (ESTC)
- Heritage of the Printed Book Database (HPB)
Analysis of the FNB allows us to exemplify the potential of openly available bibliographic data resources in terms of data enrichment and reuse.
By releasing the source code of our algorithms, we aim to contribute to the growing body of algorithms specifically tailored for this field. Moreover, we hope that openly available analysis methods will gradually pave the way towards opening valuable bibliographic data resources. Related successes in other fields point in this direction: the Human Genome Project and the research programs that followed it rely critically on openly licensed, centrally maintained data resources, and on thousands of algorithmic tools, built independently by members of the research community, that increase the value of these data collections.
Scalable data harmonization, enrichment and validation
- Data access
- Parsing
- Cleaning
- Harmonization
- Enrichment
Obtaining valid conclusions depends on efficient and reliable harmonization and augmentation of the raw entries.
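As an illustration of the harmonization and validation steps, the following converts free-text publication dates into a single year. It is a minimal sketch, not our actual pipeline: the function name `harmonize_year`, the example strings, and the rule of keeping the first four-digit year are assumptions made for illustration. The accepted range mirrors the 1640–1910 period studied here.

```python
import re
from typing import Optional

# Study window, taken from the 1640-1910 period discussed above.
MIN_YEAR, MAX_YEAR = 1640, 1910

# A four-digit number not embedded in a longer digit run; catalogue entries
# often wrap the year in brackets, question marks, or "ca.".
YEAR_PATTERN = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def harmonize_year(raw: Optional[str]) -> Optional[int]:
    """Return the first four-digit year in a raw imprint-date string
    (hypothetical rule), or None if none is found or it is out of range."""
    if raw is None:
        return None
    match = YEAR_PATTERN.search(raw)
    if match is None:
        return None  # e.g. "s.a." (no date given)
    year = int(match.group(1))
    # Validation step: reject values outside the period of interest.
    if not (MIN_YEAR <= year <= MAX_YEAR):
        return None
    return year


if __name__ == "__main__":
    for raw in ["1752", "[1752?]", "ca. 1700", "1750-1755", "s.a.", "1999"]:
        print(repr(raw), "->", harmonize_year(raw))
```

Returning `None` rather than guessing keeps uncertain values visible, so they can be counted, inspected, or handled by a later, more specific rule (for example, keeping both ends of a range such as "1750-1755").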
Furthermore, we show how external sources of metadata, for instance, on authors, publishers, or geographical places, can be used to enrich and verify bibliographic information. This type of ecosystem has potential for wider implementation in related studies and other bibliographies.
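Enrichment and verification against an external source can be sketched as a lookup of place-name variants in a gazetteer. The gazetteer contents, the normalization rule, and the names `GAZETTEER` and `enrich_place` are hypothetical; a real implementation would draw on an external geographic metadata source and a far larger set of variants, including Latin imprint forms such as Holmiae (Stockholm) and Aboae (Turku).

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional

# Hypothetical excerpt of an external gazetteer:
# normalized variant -> (harmonized place name, modern country).
GAZETTEER = {
    "aboae": ("Turku", "Finland"),
    "abo": ("Turku", "Finland"),
    "turku": ("Turku", "Finland"),
    "holmiae": ("Stockholm", "Sweden"),
    "stockholm": ("Stockholm", "Sweden"),
    "helsingfors": ("Helsinki", "Finland"),
    "helsinki": ("Helsinki", "Finland"),
}


@dataclass
class PlaceEnrichment:
    raw: str
    name: Optional[str]
    country: Optional[str]
    needs_review: bool


def normalize_place(raw: str) -> str:
    """Strip diacritics (e.g. 'Åbo' -> 'Abo'), surrounding punctuation, and case."""
    decomposed = unicodedata.normalize("NFKD", raw)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.strip(" .,:;[]()").lower()


def enrich_place(raw: str) -> PlaceEnrichment:
    """Look up a raw imprint place in the gazetteer and flag unmatched values."""
    hit = GAZETTEER.get(normalize_place(raw))
    if hit is None:
        # Unmatched values are flagged for manual validation instead of guessed.
        return PlaceEnrichment(raw, None, None, True)
    name, country = hit
    return PlaceEnrichment(raw, name, country, False)


if __name__ == "__main__":
    for raw in ["Holmiae,", "[Åbo]", "Stockholm :", "Lugduni Batavorum"]:
        print(enrich_place(raw))
```

Unmatched names are flagged rather than dropped, so the lookup doubles as a validation check: the flagged list shows where the external source or the normalization rule needs extending.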
Research use is part of validation
Discussion: potential ML/AI
Something that would be very useful: explanations of how the various estimates were made. These would be worth writing up almost separately at first, so that they could also be reused elsewhere. After that, they can be merged into the text and perhaps shortened. I mean, for instance, a description of how missing format information was filled in. Shouldn't these be included somehow?
Towards a unified view: catalogue integration
This paper demonstrates how such challenges can be overcome by specifically tailored data-analytical ecosystems that provide scalable tools for data processing and analysis.
Recognition of duplicates
Open bibliographic data science
data organization, data and code sharing, interfaces, software modules, analytical ecosystems