Data harmonization is only the starting point for our analysis, albeit an important one. The harmonized data sets are further subjected to statistical analysis, as exemplified in our case studies in the next section. This process includes the development of new, targeted algorithms and concepts that support efficient and reproducible summarization of library catalogues. Over time, we have accumulated and documented a substantial body of algorithms that facilitate these analyses [REFS - bibliographica R package] and complement traditional software interfaces, which have been designed for browsing and automated retrieval rather than for scalable statistical research, our second key target alongside data quality. Indeed, data harmonization and higher-level analysis are intricately related objectives: the actual data analysis often reveals previously unnoticed shortcomings in the data.
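To give a flavour of such harmonization steps, the sketch below shows a minimal year-cleaning routine in R. The function name, the plausibility thresholds, and the example strings are illustrative assumptions for this sketch and do not reproduce the bibliographica API.

# Minimal sketch of one harmonization step: extracting a publication
# year from free-text MARC date fields. The function name, thresholds,
# and examples are illustrative, not part of the bibliographica API.
polish_publication_year <- function(x) {
  digits <- gsub("[^0-9]", "", x)                 # "[1789?]" -> "1789"
  year <- suppressWarnings(as.integer(substr(digits, 1, 4)))
  # Reject implausible values outside an assumed print-era range
  year[is.na(year) | year < 1400 | year > 2000] <- NA_integer_
  year
}

polish_publication_year(c("[1789?]", "MDCCLXXXIX", "1801-1805", "n.d."))
#> [1] 1789   NA 1801   NA

Entries that resist such rule-based cleaning, such as the Roman numeral above, are exactly the cases that the documented algorithm collection aims to accumulate solutions for over time.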
Our analysis of the FNB demonstrates the research potential of openly available data resources. In particular, we have enriched and augmented the raw MARC entries that the National Library of Finland has openly released, and we are hereby releasing the harmonized data set so that other researchers can further verify, investigate, and enrich it. The open availability has allowed us to implement reproducible data analysis workflows, which provide a transparent account of every step from the raw data to the final summaries.
Bibliographic data science is also an iterative process, in which an improved understanding of the data and of historical patterns can lead to improved harmonization procedures, whose outputs can in turn be validated against external information sources. Hence, reproducible and automated data processing allows systematic, iterative correction of observed errors and rigorous assessment of alternative methodological choices.
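The sketch below illustrates, schematically, how one round of such an iterative correction loop might be automated in R: harmonized publication years are compared against a hypothetical external reference set, and disagreements are flagged for the next harmonization round. All object and column names here are assumptions made for illustration.

# Schematic iterative correction loop: harmonize, validate against an
# external reference, and flag discrepancies for inspection in the
# next round. All names are hypothetical placeholders.
validate_years <- function(harmonized, reference) {
  # Join harmonized records to the external reference by record ID
  merged <- merge(harmonized, reference, by = "id",
                  suffixes = c("_fnb", "_ref"))
  # Keep records where the harmonized year disagrees with the reference
  merged[!is.na(merged$year_fnb) & !is.na(merged$year_ref) &
         merged$year_fnb != merged$year_ref, ]
}

harmonized <- data.frame(id = 1:3, year = c(1789, 1801, NA))
reference  <- data.frame(id = 1:3, year = c(1789, 1805, 1820))
validate_years(harmonized, reference)
#>   id year_fnb year_ref
#> 2  2     1801     1805

Because the workflow is scripted end to end, rerunning it after each correction round yields an updated, fully documented version of the harmonized data set.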