Our current harmonization strategies are based on manually implemented rules for data processing. Future developments could take increasing advantage of adaptive machine learning techniques that learn such rules and their exceptions from training examples, thereby reducing the need for human input and improving the overall scalability of automated data harmonization. When combined with proper quality control, this type of data analytical ecosystem has potential for wider implementation in related studies and other bibliographies, as many of the data analytical problems that we have encountered are common in digital humanities. Moreover, ecology and related fields, which have well-established methods for spatio-temporal data analysis, provide a variety of statistical techniques for the analysis of such data collections.
We have used external sources of metadata, for instance, on authors, publishers, and places, to enrich and verify the information.
This paper demonstrates how such challenges can be overcome by specifically tailored data analytical ecosystems that provide scalable tools for data analysis. Open data availability would greatly advance critical, collaborative, and cumulative efforts to design and utilize targeted data analysis algorithms. Successful examples exist in other data-intensive fields, such as computational biology, where open availability of commonly generated data resources and algorithms is an established norm. Open availability of both the raw data and the analysis methods is central to efficient, collaborative, and transparent research use of bibliographic collections in modern society. We seek to advance open research by releasing a notably improved version of the Finnish National Bibliography (FNB). Our open science approach facilitates collaborative development of methods, and we continually take advantage of, and contribute to, the growing body of open source algorithms in the relevant research fields.
Although our current work is based on the analysis of national bibliographies, it helps to challenge the nationalistic view of individual catalogues and paves the way towards large-scale data integration. A number of key challenges remain in enhancing data quality. Nevertheless, we have demonstrated that significant historical trends, such as the rate of change in language use or book sizes, are often overwhelmingly clear and can be seen across multiple independently collected catalogues. Integrative analysis can thus help to verify the information and provide complementary views on the universally observed historical trends. Our systematic approach provides a starting point, guidelines, and a set of practically tested algorithms for more extensive analysis and integration.

Conclusion [300 words provisionally agreed; we are now roughly at that length]

We have conceptualized a new approach and technologies to expand the research potential of bibliographic cataloging and classification, which we call bibliographic data science, and we have discussed the future implications of these methods. The approach covers key research aspects of the content, management, use, and usability of bibliographic records, with further implications for the underlying principles, functions, and techniques of descriptive cataloging. Our work brings together traditional and contemporary elements of research, and links theory and scholarly research with practical application. National bibliographies can provide comprehensive quantitative insights into the overall historical dynamics of the evolving publishing landscape across time and geography. Biases in data collection or quality may, however, considerably hinder productive research use of the bibliographies. Drawing valid conclusions depends critically on efficient and reliable harmonization and augmentation of the raw entries. In our study, which is based on the Swedish National Bibliography and the Finnish National Bibliography and focuses on publication patterns in Sweden and Finland during the period 1640-1910, we have encountered specific and largely overlooked challenges in using bibliographic catalogues for historical research. We have demonstrated how such challenges can be overcome by specifically tailored open source workflows for data processing and analysis. Furthermore, we have shown how external sources of metadata, for instance on authors, publishers, or geographical places, can be used to enrich and verify bibliographic information. This work has potential for wider implementation in related studies and other bibliographies, and provides guidelines for more extensive integration of national catalogues, thus helping to overcome the national view in analysing the past and to reach a more precise view of print culture beyond the confines of national bibliographies.
The Nordic aspect is no longer relevant for this article; we have simply moved towards cross-catalogue analysis, in which these catalogues are also included. They are not, however, conceptually relevant here in any way.
The same applies to Fennica and Kungliga. The original starting point was that this piece would be a reflection on how the work with them was done. In the end, however, we arrived at a more ambitious starting point, so the article now covers much more, and in a much more meaningful way. The text must nevertheless be updated accordingly.
Transfer into practice!