Introduction
Next-generation DNA sequencing provides advanced tools for marine
ecology and ecosystem monitoring (Closek et al., 2019; Kelly, Port,
Yamahara, Martone, et al., 2014; Yamahara et al., 2019). The ability to
sequence tens to hundreds of millions of reads in a single sequencing
run allows for the development of novel genomic applications to a suite
of research questions including species mapping, biomonitoring, gut
content analyses, and population genomics, all of which aid our
understanding of the ecology of marine ecosystems (Baetscher et al.,
2019; Guo, 2017; Sanders et al., 2015; Thompson, Chen, Guo, Hyde, &
Watson, 2017).
Key to these advances is next-generation sequencing metabarcoding.
Metabarcoding is a process in which multiple species are identified from
bulk DNA (e.g., homogenized gut contents or settlement tile scrapings) or
environmental DNA (eDNA) samples (e.g., water and soil), typically by
PCR-amplifying and sequencing a target gene and then comparing the resulting DNA
sequences to a database of known reference sequences (Taberlet, Coissac,
Hajibabaei, & Rieseberg, 2012). In particular, the application of eDNA
metabarcoding allows researchers to detect a broad range of marine
diversity from a single liter of seawater and has the potential to
dramatically improve marine biomonitoring efforts (Kelly, Port,
Yamahara, Martone, et al., 2014).
The success of metabarcoding approaches relies on the quality of
reference databases, specifically their completeness and accuracy (Boyer
et al., 2016; Machida, Leray, Ho, & Knowlton, 2017). The absence of
reference barcodes for a given species for a target locus makes it
impossible to accurately classify all sequences generated through
metabarcoding with current bioinformatic technology (Deiner et al.,
2017). Inadequate reference databases are an acute problem for
barcoding, metabarcoding, and eDNA studies: they limit the accuracy of
taxonomic identification and have the potential to bias the
interpretation of results (Andruszkiewicz et al., 2017; Djurhuus et al.,
2020; Klymus, Marshall, & Stepien, 2017). Thus, building complete and
accurate reference databases is paramount to the success of molecular
ecology monitoring efforts (Schenekar, Schletterer, Lecaudey, & Weiss,
2020). To address the need for accurate and complete reference
databases, previous efforts to barcode California Current
Large Marine Ecosystem fishes focused on the mitochondrial Cytochrome
Oxidase I (COI) locus (Ardura, Planes, & Garcia-Vazquez, 2013;
Elena M Duke & Burton, 2020; Hastings & Burton, 2008; Ward, Hanner, &
Hebert, 2009).
However, recent metabarcoding studies of marine fishes have focused
instead on a short segment of the mitochondrial 12S rRNA gene
because it provides species-level resolution for many fishes while being
vertebrate-specific (Miya et al., 2015; Valsecchi et al., 2019). The
smaller 12S locus is also thought to be advantageous for eDNA
studies because DNA isolated from the environment tends to be
degraded, and commonly used sequencing technologies target relatively
small loci (Collins et al., 2019; Jo et al., 2017; Miya et al., 2015).
The region amplified by the MiFish Universal
Teleost primer set is the most commonly used 12S barcode region
because of its utility across a diverse assemblage of marine fishes
(Bista et al., 2017; Closek et al., 2019; Thomsen et al., 2016;
Valsecchi et al., 2019; Yamamoto et al., 2017).
Thus, while there is a near-complete COI barcode database of
California Current fishes (Hastings & Burton, 2008), there is a
relative lack of 12S barcodes for California Current fishes in
existing reference databases; GenBank has MiFish 12S barcodes for
459 of the 864 California fish species (NCBI download October 2019).
This paucity of barcodes severely limits the utility of 12S
metabarcoding approaches in California Current coastal waters
(Andruszkiewicz et al., 2017; Djurhuus et al., 2020; Port et al., 2015),
where relatively recently established marine protected areas (Gleason et
al., 2013; Pondella et al., 2015; Thompson, Watson, McClatchie, &
Weber, 2012) have created an urgent need for effective and economical
monitoring (Elena Maria Duke, Harada, & Burton, 2018; Harada et al.,
2015).
Metabarcoding can help marine resource managers address
critical questions, including shifting species distributions, the
effectiveness of marine protected areas, and seasonal patterns of larval
fish recruitment (Closek et al., 2019; Djurhuus et al.,
2020; Elena M Duke & Burton, 2020; Kelly, Port, Yamahara, Martone, et
al., 2014). However, the success of metabarcoding efforts to enhance
fishery management in the California Current Large Marine Ecosystem
depends on the development of an improved 12S barcode reference
database. Towards this end, we developed the FishCARD reference
database. This regionally specific database is curated for marine fishes
found in the California Current Large Marine Ecosystem and comprises
12S sequences previously available in GenBank, supplemented by
hundreds of additional 12S sequences generated during this study.