Fourth, the contents of bibliographic metadata collections are the products of at least three multi-layered historical processes. The digitization of traditional card catalogues may have meant the exclusion of material that was regarded as less important or as covered elsewhere. Similarly, early national bibliographies have in general been compiled from existing bibliographies that were originally created for other purposes (FOOTNOTE: For a discussion on the Danish National Bibliography, see Horstbøll 1999**). Naturally, the national bibliographies have not been able to include everything published, although the effort towards completeness has been remarkable. Further, the records reflect different historical practices of printing and publishing. In eighteenth-century Sweden, for instance, printing laws and decrees formed a crucial part of political discourse and were of great economic value to the book industry (CITE: Rimm, A.-M. 2005a. Den kungliga boktryckaren, del 1. Biblis 30: 4–31; Rimm, A.-M. 2005b. Den kungliga boktryckaren, del 2. Biblis 31: 27–44.**), whereas in Britain this was the case to a much lesser degree. Such practices are visible in the bibliographic metadata collections, but they tell us primarily about printing practices, and not necessarily about other social and political phenomena, such as language relations, that we might want to study through the data. Any historically interested study using national bibliographies must therefore be attentive to these historical layers in the data in order to propose reasonable interpretations of the results of quantitative analysis.
This process has generated a large body of custom algorithms and concepts that support reproducible analysis of library catalogues [REFS - bibliographica R package]. These methods complement traditional software interfaces, which have been designed for browsing and automated retrieval rather than for scalable statistical research. Data harmonization and quantitative analysis are intricately related objectives: often, the analysis itself reveals previously unnoticed shortcomings in the data. Hence, bibliographic data science is an inherently iterative process, in which an improved understanding of the data and of historical trends can lead to improvements in the data harmonization procedures, and to new, independent ways to validate both the data and the observed patterns.
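To illustrate the iterative loop between harmonization and analysis, the following is a minimal Python sketch under stated assumptions; it is not the API of the bibliographica R package. A hypothetical `harmonize_year` function extracts a publication year from raw imprint strings using a few illustrative rules, and a hypothetical `harmonization_report` lists the values that failed to parse or fell outside an assumed plausible range (1450 to 1900 here). The sample strings are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical rule set for illustration only (not bibliographica's API):
# matches a four-digit year between 1400 and 1999 standing as a separate token.
YEAR_PATTERN = re.compile(r"\b(1[4-9]\d{2})\b")


def harmonize_year(raw):
    """Return a publication year parsed from a raw imprint string, or None.

    Assumed rules: strip square brackets and question marks (common markers
    of inferred or uncertain dates), then take the first four-digit year,
    so a range such as '1720-1725' yields its start year.
    """
    if raw is None:
        return None
    cleaned = raw.replace("[", "").replace("]", "").replace("?", "")
    match = YEAR_PATTERN.search(cleaned)
    return int(match.group(1)) if match else None


def harmonization_report(raw_values, earliest=1450, latest=1900):
    """Count raw values that failed to parse or fell outside [earliest, latest].

    The plausible range is an assumption; flagged values are candidates
    for new or revised harmonization rules in the next iteration.
    """
    unparsed = Counter()
    out_of_range = Counter()
    for raw in raw_values:
        year = harmonize_year(raw)
        if year is None:
            unparsed[raw] += 1
        elif not earliest <= year <= latest:
            out_of_range[raw] += 1
    return unparsed, out_of_range
```

Running the report on a sample such as `["1723", "[1745?]", "1720-1725", "MDCCXXIII", "s.a.", "1950"]` flags `MDCCXXIII` and `s.a.` as unparsed and `1950` as out of range. The Roman numeral in particular would prompt a new parsing rule in the next round, which is the kind of feedback between analysis and harmonization described above.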
The challenges posed by dirty data also carry key messages for research on the content, management, use, and usability of bibliographic records, with further implications for the underlying principles, functions, and techniques of descriptive cataloging.