Importance of regional reference databases
Given that increasing reference database completeness increased the
ability to assign ASVs to species, it is logical to assume that
databases with more taxonomic coverage are better. However, our results
suggest an unexpected trade-off between greater diversity of barcodes
and regionally/ecologically informed taxonomic assignment. For example,
using only the FishCARD database, which is specific to California
Current marine fishes, we identified important native taxa like black
croaker (Cheilotrema saturnum) and bat ray (Myliobatis californica) in
eDNA samples. However, when the FishCARD and CRUX-12S databases were
combined to yield a database with the largest total number of barcodes,
black croaker was not identified and bat ray was inconsistently
identified across multiple ASVs. The combined database failed to
identify black croaker because of the high similarity of 12S barcode
sequences within the family Sciaenidae, specifically within the clade
that includes Cheilotrema, a genus native to California, as well as
Equetus and Pareques, non-native coral reef-associated genera
(Supplemental Table 3). Similarity of barcode sequences also explains
the loss of taxonomic resolution in Myliobatis.
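The loss of resolution described above can be illustrated with a minimal sketch of a lowest-common-ancestor (LCA) rule. This is not the classifier used in this study; the function names, the identity scores, and the "sp." species labels for the non-native genera are hypothetical placeholders chosen only to show how equally good matches to several genera push an assignment up to family level.

```python
from typing import Dict, List, Tuple

# A lineage is (family, genus, species). All entries below are illustrative.
Lineage = Tuple[str, str, str]


def lowest_common_ancestor(lineages: List[Lineage]) -> Tuple[str, ...]:
    """Return the ranks shared by every lineage, reading from family downward."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break  # lineages disagree at this rank, so stop here
        shared.append(ranks[0])
    return tuple(shared)


def assign(hits: Dict[Lineage, float], tolerance: float = 0.0) -> Tuple[str, ...]:
    """Assign an ASV to the LCA of all references within `tolerance`
    (in percent identity) of the best-scoring reference."""
    best = max(hits.values())
    top = [lineage for lineage, identity in hits.items()
           if best - identity <= tolerance]
    return lowest_common_ancestor(top)


# Hypothetical percent-identity scores for a single ASV.
regional_hits = {
    ("Sciaenidae", "Cheilotrema", "Cheilotrema saturnum"): 100.0,
}
combined_hits = {
    **regional_hits,
    ("Sciaenidae", "Equetus", "Equetus sp."): 100.0,    # placeholder label
    ("Sciaenidae", "Pareques", "Pareques sp."): 100.0,  # placeholder label
}

print(assign(regional_hits))  # resolves to species
print(assign(combined_hits))  # resolves only to family
```

Under these assumed scores, the regional reference set returns a species-level assignment, whereas the combined set returns only Sciaenidae, because the three genera are indistinguishable at this marker.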
By excluding highly similar non-native barcodes, the curated FishCARD
database provided more accurate species-level assignments, suggesting
that a database composed of only local taxa is preferred to maximize
identification of local species. However, this improvement was not
universal. For example, FishCARD failed to classify an ASV belonging to
the family Delphinidae that was identified by both the CRUX and combined
databases. This result arises because FishCARD is specific to California
Current fishes and does not include marine mammals. This shortcoming
could be easily overcome, however, by supplementing FishCARD with barcodes
for other marine-associated vertebrate taxa of local management
interest (Valsecchi et al., 2019).
These results highlight the trade-off between identifying local species
from clades with little genetic variation and providing taxonomic
coverage across a broad range of vertebrate species. As such,
researchers need to identify their research priorities when deciding
which reference databases to use, with a particular focus on defining
the scope of the target taxa. Future work could alleviate this trade-off
by building bioinformatic pipelines that prioritize assignments to a
reference set of native species, perhaps by including information on
species ranges and sample locations in the assignment algorithm.
Alternatively, a regional database could be supplemented with additional
barcodes to address specific questions, such as testing for the presence
of particular invasive species or range shifts associated with climate
change.
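One way such a native-priority rule might look is sketched below. It is an assumption-laden illustration, not an existing pipeline: the function name, the `native_species` set (which could in principle be derived from species ranges for each sampling location), the identity scores, and the "sp." labels are all hypothetical.

```python
from typing import Dict, Optional, Set


def assign_with_native_priority(
    hits: Dict[str, float],
    native_species: Set[str],
    tolerance: float = 0.0,
) -> Optional[str]:
    """Return a species-level assignment, or None if the ASV stays unresolved.

    If several references tie for the best score, the tie is broken only
    when exactly one of the tied species is on the native list.
    """
    if not hits:
        return None
    best = max(hits.values())
    top = {species for species, identity in hits.items()
           if best - identity <= tolerance}
    if len(top) == 1:
        return next(iter(top))  # unambiguous best hit, native or not
    native_top = top & native_species
    if len(native_top) == 1:
        return next(iter(native_top))  # tie broken in favor of the native species
    return None  # zero or several native candidates: leave unresolved


# Hypothetical scores from a combined database for one ASV.
hits = {
    "Cheilotrema saturnum": 100.0,
    "Equetus sp.": 100.0,   # placeholder label
    "Pareques sp.": 100.0,  # placeholder label
}
california_current = {"Cheilotrema saturnum", "Myliobatis californica"}

print(assign_with_native_priority(hits, california_current))
print(assign_with_native_priority(hits, set()))
```

A rule of this kind would favor native taxa by construction, so tie-broken assignments would probably need to be flagged in the output to avoid masking a genuine detection of a non-native species.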