Importantly, our key observations regarding vernacularization and related developments are supported by similar trends across multiple independently maintained bibliographies. This exemplifies the power of such an approach in uncovering broad patterns in knowledge production that are robust to occasional inaccuracies in the data. While documentation and polishing continue, we have made all source code openly available, so that every detail of the data processing can be independently investigated and verified. Obtaining valid conclusions depends on efficient and reliable harmonization and augmentation of the raw entries. This paper demonstrates how such challenges can be overcome with specifically tailored data analytical ecosystems that provide scalable tools for data processing and analysis.
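To make the notion of harmonization concrete, the following is a minimal illustrative sketch, not the actual processing pipeline described in this paper. It assumes raw catalogue entries carry a free-text physical extent field (e.g. "[8], 232 p. ; 8vo") and shows how such a field could be harmonized into a numeric page count and a standardized print format; the function name, the abbreviation table, and the input conventions are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical mapping from common extent abbreviations to standardized
# gatherings (print format) labels; real catalogues use many more variants.
GATHERINGS = {"2fo": "folio", "4to": "quarto", "8vo": "octavo", "12mo": "duodecimo"}

def harmonize_extent(raw: str) -> dict:
    """Extract a page count and print format from a free-text extent field."""
    pages = 0
    # Bracketed counts such as "[8]" conventionally denote unnumbered pages.
    for n in re.findall(r"\[(\d+)\]", raw):
        pages += int(n)
    # Plain counts followed by "p" denote numbered pages.
    for n in re.findall(r"(?<!\[)\b(\d+)\s*p\b", raw):
        pages += int(n)
    fmt: Optional[str] = None
    for abbrev, name in GATHERINGS.items():
        if abbrev in raw:
            fmt = name
            break
    return {"pages": pages or None, "gatherings": fmt}

print(harmonize_extent("[8], 232 p. ; 8vo"))
# {'pages': 240, 'gatherings': 'octavo'}
```

In practice, rule-based parsing of this kind has to be combined with curated synonym lists and manual validation, which is why open, inspectable code matters for verifying every step.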
Although our current work is based on the analysis of national catalogues, it helps to challenge the nationalistic view inherent in individual catalogues and paves the way towards large-scale data integration. A number of key challenges in enhancing data quality remain to be overcome, however. Nevertheless, we have demonstrated that significant historical trends, such as the rate of change in language use or in book sizes, are often overwhelmingly clear and visible across multiple independently collected catalogues. Integrative analysis can thus help to verify the information and provide complementary views on these broadly observed historical trends. Our systematic approach provides a starting point, guidelines, and a set of practically tested algorithms for more extensive analysis and integration.
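As a simple illustration of what such cross-catalogue verification could look like, the sketch below compares a per-decade time series (for example, the share of vernacular-language titles) between two catalogues and reports their correlation over the shared decades. The numbers are placeholder values invented for the example, not results from this study.

```python
import statistics

# Placeholder per-decade shares of vernacular-language titles in two
# hypothetical, independently collected catalogues (illustrative values only).
catalogue_a = {1700: 0.12, 1710: 0.15, 1720: 0.19, 1730: 0.24, 1740: 0.31}
catalogue_b = {1700: 0.10, 1710: 0.16, 1720: 0.18, 1730: 0.26, 1740: 0.29}

def trend_correlation(a: dict, b: dict) -> float:
    """Pearson correlation of two catalogue-level time series over shared decades."""
    decades = sorted(set(a) & set(b))
    return statistics.correlation([a[d] for d in decades], [b[d] for d in decades])

print(round(trend_correlation(catalogue_a, catalogue_b), 3))
```

A high agreement between independently maintained sources lends support to a trend being a genuine historical signal rather than an artefact of any single catalogue's cataloguing practices.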