Anirudh Prabhu

and 13 more

Dave Vieglais

and 16 more

Material samples are vital across multiple scientific disciplines, and samples collected for one project often prove valuable for additional studies. The Internet of Samples (iSamples) project aims to integrate large, diverse, cross-discipline sample repositories and enable access to and discovery of material samples as FAIR (Findable, Accessible, Interoperable, and Reusable) data. Here we report our recent progress in controlled vocabulary development and mapping. In addition to a core metadata schema that integrates SESAR, GEOME, Open Context, and Smithsonian natural history collections, we created three small but important controlled vocabularies (CVs) describing specimen type, material type, and sampled feature. The new CVs provide consistent semantics for high-level integration of the existing vocabularies used in the source collections. Two methods were used to map source record properties to terms in the new CVs. First, keyword-based heuristic rules were created manually where existing terminologies were similar to the new CVs, as in records from SESAR, GEOME, and Open Context and in some aspects of Smithsonian Darwin Core records. For example, specimen type = liquid>aqueous in SESAR records was mapped to specimen type = liquid or gas sample and material type = liquid water. Second, a machine learning approach was applied to Smithsonian Darwin Core records to infer sampled feature terms from record text in the habitat, locality, higher geography, and higher classification fields. By applying fastText with a 600-billion-token general-domain corpus, we gave the machine a level of “understanding” of English words. With training sets of 200 and 995 records, we obtained precision of 87% and 94% and recall of 85% and 92%, respectively, performance sufficient for production use. 
Applying these approaches, more than 3 × 10^6 records of the four large collections have been mapped successfully to a common core data model facilitating cross-domain discovery and retrieval of the sample records.
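
The keyword-based mapping described in this abstract can be illustrated with a minimal Python sketch. It is not the iSamples implementation: the rule table, the function name map_specimen_type, and the tokenization are hypothetical, and the only rule shown is the SESAR example quoted in the abstract.

```python
import re
from typing import NamedTuple, Optional


class CVMapping(NamedTuple):
    """Target terms in the new controlled vocabularies."""
    specimen_type: str
    material_type: Optional[str]


# Hypothetical rule table: each rule pairs the keywords that must all
# appear in the source value with the target CV terms. Only the SESAR
# example given in the abstract is included here.
RULES = [
    (("liquid", "aqueous"), CVMapping("liquid or gas sample", "liquid water")),
]


def map_specimen_type(source_value: str) -> Optional[CVMapping]:
    """Return the CV terms of the first rule whose keywords all occur
    in the source value, or None when no rule matches."""
    # Split on anything that is not a letter, so "liquid>aqueous"
    # yields the tokens {"liquid", "aqueous"}.
    tokens = set(re.split(r"[^a-z]+", source_value.lower())) - {""}
    for keywords, mapping in RULES:
        if all(keyword in tokens for keyword in keywords):
            return mapping
    return None


if __name__ == "__main__":
    print(map_specimen_type("liquid>aqueous"))
    print(map_specimen_type("rock"))
```

Returning None for unmatched values keeps unmapped records visible, so they can be handled by further rules or by another method instead of being assigned a guessed term.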

Kerstin Lehnert

and 2 more

Analytical studies of astromaterials samples returned by NASA space missions generate unique and highly valuable data that contribute fundamentally to our knowledge and understanding of the origin and evolution of Earth, our solar system, and the universe. These data need to be openly accessible and curated in a manner that maximizes their reuse in and utility for future science and that ensures their quality and long-term preservation. In several recent strategic documents and reports, NASA recognizes this need [1] and is adjusting its science information policies [2]. In 2020, NASA charged the Planetary Data Ecosystem Independent Review Board (PDE-IRB) to conduct a review of the planetary data landscape and make recommendations for improving access to and use of planetary science data by the science community [3]. This presentation will highlight features and services of the Astromaterials Data System that align with the IRB’s recommendations. The Astromaterials Data System (Astromat) is a data infrastructure that has been funded by NASA since 2018 to curate, archive, and publish analytical data that are generated from astromaterials samples collected by NASA missions and curated at the Johnson Space Center in the Astromaterials Research & Exploration Science Division. Astromat’s mission is to preserve astromaterials data and ensure their long-term access and reusability for new science endeavors; restore legacy data of astromaterials samples acquired in the past; and synthesize historic and new data into a comprehensive, analysis-ready data store that allows scientists to use new technologies such as Machine Learning and Artificial Intelligence to explore and mine these data in previously impossible ways. Astromat operates a data repository where researchers can deposit their data for archiving and publication, specifically to comply with new journal policies and guidelines for Open and FAIR data and with Data Management Plans required by funders. 
The repository follows international best practices. Astromat also maintains the Astromat Synthesis, a relational database that integrates legacy and new data into a harmonized data collection that allows users to find and extract data at the granularity of individual analytical measurements and combine these into customized new compilations for advanced data analysis. [1] SMD’s Strategy for Data Management and Computing for Groundbreaking Science 2019-2024. [2] Scientific Information policy for the Science Mission Directorate, SMD Policy Document SPD-41 (August 2021). [3] Besse, S., et al. (2021). LPI Contributions 2549, 7070.

Jeffery Horsburgh

and 3 more

Critical Zone (CZ) scientists study the system of coupled chemical, biological, physical, and geological processes operating together across all scales to support life at the Earth’s surface (Brantley et al., 2007). In 2020, the U.S. National Science Foundation funded a new network of Thematic Cluster projects that are working collaboratively to answer scientific questions related to the effects of urbanization on CZ processes; CZ function in semi-arid landscapes and the role of dust in sustaining these ecosystems; processes in deep bedrock and their relationship to CZ evolution; recovery of the CZ from disturbances such as fire and flooding; and changes in the coastal CZ related to rising sea level. Given the diversity of data being collected by these projects, supporting data collection, access, and archiving for the larger network presents significant challenges. Leveraging existing repositories and cyberinfrastructure provides many benefits, but still raises the questions of which repositories to use and how to enable discovery of and access to data that may be deposited across different repositories. This presentation describes new cyberinfrastructure development that leverages existing, domain-specific data repositories to enable managing, curating, disseminating, and preserving data from the new network of CZ Thematic Cluster projects. A distributed architecture is under development that links existing data facilities and services, including HydroShare, EarthChem, SESAR, and eventually other systems as needed, via a CZ Hub that provides tools for simplified data submission, discovery, and access, and links to computational resources for data analysis and visualization in support of CZ synthesis efforts. Our goal is to make data, samples, and software collected by the Thematic Cluster projects Findable, Accessible, Interoperable, and Reusable (FAIR), using existing domain-specific repositories. 
This collaboration among repositories to deliver integrated data services for an interdisciplinary science program may provide a template for future development of integrated, interdisciplinary data services. Brantley, S.L., M.B. Goldhaber, V. Ragnarsdottir (2007). Crossing disciplines and scales to understand the Critical Zone. Elements 3, 307-314, doi:10.2113/gselements.3.5.307.

Lucia Profeta

and 7 more

The Astromaterials Data System (AstroMat) is a NASA-funded project that works in close collaboration with the Johnson Space Center (JSC) to preserve and provide access to analytical data from JSC’s astromaterials collections. Meteorite data from close to 1000 peer-reviewed publications, primarily from the JSC Antarctic collections, and data from over 800 lunar publications have been ingested into AstroMat. Data can be explored at the level of reference and sample, or queried interactively through the AstroDB Search (AstroSearch). AstroSearch v 1.0 incorporated searches by lunar and meteorite JSC collections, lunar missions, geofeatures, taxa, analyzed materials, and analysis methods. Working closely with domain scientists, we have developed AstroSearch v 2.0. This version of the interface enhances the functionality of the original by adding search by chemistry (with comprehensive variable and unit selection), more granular analysis type refinement, and a streamlined, customizable data output. The AstroDB Search is paired with the AstroDesk application, where users can log in via their ORCID and save an unlimited number of customized search queries. For researchers who need to submit, archive, and share their data with citable unique identifiers (DOIs) to comply with publisher and funding agency requirements, AstroMat offers a companion service to the AstroDB, the Astromaterials Data Repository (AstroRepo). Through its commitment to long-term data access and preservation, the Astromaterials Data System aims to help align cosmochemistry data with the Big-Data Era and reduce time to science for Planetary Sciences researchers by providing FAIR data for next-generation scientific applications.