Introduction
Next-generation DNA sequencing provides advanced tools for marine ecology and ecosystem monitoring (Closek et al., 2019; Kelly, Port, Yamahara, Martone, et al., 2014; Yamahara et al., 2019). The ability to sequence tens to hundreds of millions of reads in a single sequencing run allows novel genomic applications to be developed for a suite of research questions, including species mapping, biomonitoring, gut content analyses, and population genomics, all of which aid our understanding of the ecology of marine ecosystems (Baetscher et al., 2019; Guo, 2017; Sanders et al., 2015; Thompson, Chen, Guo, Hyde, & Watson, 2017).
Key to these advances is next-generation sequencing metabarcoding. Metabarcoding is a process in which multiple species are identified from bulk DNA (e.g., homogenized gut contents or settlement tile scrapings) or environmental DNA (eDNA) samples (e.g., water and soil) typically by PCR amplification and sequencing of a target gene, and then comparing the resulting DNA sequences to a database of known reference sequences (Taberlet, Coissac, Hajibabaei, & Rieseberg, 2012). In particular, the application of eDNA metabarcoding allows researchers to detect a broad range of marine diversity from a single liter of seawater and has the potential to dramatically improve marine biomonitoring efforts (Kelly, Port, Yamahara, Martone, et al., 2014).
The success of metabarcoding approaches relies on the quality of reference databases, specifically their completeness and accuracy (Boyer et al., 2016; Machida, Leray, Ho, & Knowlton, 2017). The absence of reference barcodes for a given species for a target locus makes it impossible to accurately classify all sequences generated through metabarcoding with current bioinformatic technology (Deiner et al., 2017). Inadequate reference databases are an acute problem for barcoding, metabarcoding, and eDNA studies: they limit the accuracy of taxonomic identification and have the potential to bias the interpretation of results (Andruszkiewicz et al., 2017; Djurhuus et al., 2020; Klymus, Marshall, & Stepien, 2017). Thus, building complete and accurate reference databases is paramount to the success of molecular ecology monitoring efforts (Schenekar, Schletterer, Lecaudey, & Weiss, 2020). To address this need, previous efforts to barcode California Current Large Marine Ecosystem fishes focused on the mitochondrial Cytochrome Oxidase I (COI) locus (Ardura, Planes, & Garcia-Vazquez, 2013; Elena M Duke & Burton, 2020; Hastings & Burton, 2008; Ward, Hanner, & Hebert, 2009).
However, recent metabarcoding studies of marine fishes have focused instead on a short segment of the mitochondrial 12S rRNA gene because it provides species-level resolution for many fishes while being vertebrate-specific (Miya et al., 2015; Valsecchi et al., 2019). The smaller 12S locus is also thought to be advantageous for eDNA studies because DNA isolated from the environment tends to be degraded and commonly used sequencing technologies target relatively small loci (Collins et al., 2019; Jo et al., 2017; Miya et al., 2015). The region amplified by the MiFish Universal Teleost primer set is the most commonly used 12S barcode because of its utility across a diverse assemblage of marine fishes (Bista et al., 2017; Closek et al., 2019; Thomsen et al., 2016; Valsecchi et al., 2019; Yamamoto et al., 2017).
Thus, while there is a nearly complete COI barcode database of California Current fishes (Hastings & Burton, 2008), there is a relative lack of 12S barcodes for these fishes in existing reference databases; GenBank has MiFish 12S barcodes for only 459 of the 864 California fish species (NCBI download October 2019). This paucity of barcodes severely limits the utility of 12S metabarcoding approaches in California Current coastal waters (Andruszkiewicz et al., 2017; Djurhuus et al., 2020; Port et al., 2015), where relatively recently established marine protected areas (Gleason et al., 2013; Pondella et al., 2015; Thompson, Watson, McClatchie, & Weber, 2012) have created an urgent need for effective and economical monitoring (Elena Maria Duke, Harada, & Burton, 2018; Harada et al., 2015).
Metabarcoding can help marine resource managers address critical questions, including shifting species distributions, the effectiveness of marine protected areas, and seasonal patterns of larval fish recruitment, among others (Closek et al., 2019; Djurhuus et al., 2020; Elena M Duke & Burton, 2020; Kelly, Port, Yamahara, Martone, et al., 2014). However, the success of metabarcoding efforts to enhance fishery management in the California Current Large Marine Ecosystem depends on the development of an improved 12S barcode reference database. Towards this end, we developed the FishCARD reference database. This regionally specific database is curated for marine fishes found in the California Current Large Marine Ecosystem and comprises 12S sequences previously available in GenBank, supplemented by hundreds of additional 12S sequences generated during this study.