1- Introduction
Environmental DNA (eDNA) refers to DNA shed by organisms into their
environments (Taberlet et al., 2018), such as soil and other sediments
(e.g. Deveautour et al., 2018; Ficetola et al., 2018), water (e.g.
DiBattista et al., 2017; Bista et al., 2017), and air (Kraaijeveld et
al., 2015). Although shotgun sequencing of sedimentary DNA is becoming
increasingly common (e.g. Pederson et al., 2016; Graham et al., 2016),
most eDNA studies to date have targeted either single taxa with
species-specific PCR (“barcoding”; e.g. Baker et al., 2018; Franklin
et al., 2019) or phylogenetically diverse taxa using “universal” PCR
primers that bind to conserved regions flanking barcode loci
(“metabarcoding”; e.g. Taberlet et al., 2012; Valentini et al.,
2016). Metabarcoding has been used, for example, to test hypotheses
about biotic and abiotic drivers of changes in community composition
(Erlandson et al., 2018; Deveautour et al., 2018; Yan et al., 2018;
Giguet‐Covex et al., 2014; Epp et al., 2010), including over tens of
thousands of years (Willerslev
et al., 2003, 2014; Parducci et al., 2012), and to characterize extant
eukaryotic diversity (Lallias et al., 2015; Leray and Knowlton, 2016).
Because eDNA can be collected non-invasively, it is also a promising
tool for studying rare, cryptic, or endangered species (Franklin et al.,
2019; Laramie et al., 2015; Schnell et al., 2012), and for tracking early
advances of invasive species (Klymus et al., 2017; Xia et al., 2018;
Sjogren et al., 2017).
The broad application of eDNA approaches across environmental science,
in combination with recent advances in DNA processing and sequencing
technologies, has precipitated rapid growth of eDNA as a research field
(Pederson et al., 2015; Garlapati et al., 2019). Indeed, several
interdisciplinary initiatives now use eDNA to characterize biodiversity
across spatial and temporal scales, such as CALeDNA (Meyer et al.,
2019), DNAquanet (https://dnaqua.net/), and Maine-eDNA
(https://umaine.edu/edna/). These initiatives have shown repeatedly that
DNA detection in eDNA samples varies in both specificity and
replicability (Garlapati et al., 2019; Cristescu and Hebert, 2018), but
the causes of this variation have yet to be fully characterized.
Technical biases can be introduced into eDNA experiments during field
sampling, laboratory processing, and bioinformatics (Pederson et al.,
2015). These biases have the potential to influence resulting
biodiversity profiles, and are difficult to decouple from both true
differences among organisms in DNA deposition rates and taphonomic
processes that drive variation in long-term DNA survival (Taberlet et
al., 2018). DNA isolation protocols, for example, are known to influence
which organisms are detected in a sample. Deiner et al. (2018) reported
a three-fold change in observed α diversity depending on filter
material, pore size, and chemical extraction, including the absence
under some extraction conditions of some taxa known to be present.
Choice of polymerase for PCR can also influence biodiversity estimates.
Nichols et al. (2017) showed that the composition and relative abundance
of taxa detected in the PCR amplicon pool changed during PCR
amplification as some polymerases biased the amplicon pools toward
sequences with a particular GC content. Differences in amplicon length,
template secondary structures, and base mismatches at the PCR primer
binding site (Fonseca et al., 2012; Elbrecht & Leese, 2015; Krehenwinkel
et al., 2017) also affect binding and copying efficiency during PCR,
skewing post-PCR taxon composition and relative abundance estimates
(Pawluczyk et al., 2015). As the number of eDNA data sets grows, and
with it the opportunity for comparative analysis across data sets, so
does the need to understand and mitigate these many potential biases
(Braukmann et al., 2019; Ruppert et al., 2019).
As one approach to mitigating these potential biases, many eDNA studies
include one or more controls as part of their experimental design. These
controls are intended to quantify some component of potential technical
biases prior to proceeding with biodiversity analyses. For example,
samples taken in the field as replicates can be used to estimate and
account for spatial variation in DNA deposition and survival (Andersen
et al., 2012; Ficetola et al., 2015). Incorporating negative controls
(experimental replicates that include no sample) during both DNA
extraction and PCR can track potential laboratory-introduced
contaminants or other errors. Incorporating positive controls comprising
mixtures of organismal tissue or extracted DNA can confirm that
protocols are working as expected and quantify bias (Port et al., 2016;
Olds et al., 2016). After data are generated, bioinformatic approaches
can also be implemented to detect and mitigate the influence of
experimental biases. For example, site occupancy modeling incorporates
variation among PCR replicates to overcome imperfect detection (Ficetola
et al., 2015), and species richness curves can be estimated to assess
whether sufficient replicates have been performed to detect all or most
of the taxa present in a sample (Ficetola et al., 2015; Lundberg et al.,
2013; Beentjes et al., 2019).
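As an illustration of the richness-curve idea, the following is a minimal sketch of a taxon accumulation curve across PCR replicates. It is not the procedure of any study cited above: the function name `accumulation_curve`, the representation of each replicate as a set of taxon labels, the number of permutations, and the toy data are all assumptions made for this example. A curve that is still rising at the last replicate would suggest that additional replicates could recover further taxa.

```python
import random


def accumulation_curve(replicates, n_perm=200, seed=0):
    """Mean cumulative taxon richness after 1..n replicates.

    `replicates` is a list of sets, one set of taxon labels per PCR
    replicate (a hypothetical input format). The curve is averaged over
    `n_perm` random orderings of the replicates.
    """
    rng = random.Random(seed)
    n = len(replicates)
    totals = [0.0] * n
    for _ in range(n_perm):
        order = list(replicates)
        rng.shuffle(order)
        seen = set()
        for i, rep in enumerate(order):
            seen |= rep              # taxa observed so far in this ordering
            totals[i] += len(seen)   # cumulative richness after i + 1 replicates
    return [t / n_perm for t in totals]


# Toy data (hypothetical): four replicates from one extract.
reps = [{"A", "B"}, {"A", "B", "C"}, {"A", "B"}, {"A", "B", "D"}]
curve = accumulation_curve(reps)
print(curve)
```

The last value of the curve always equals the total number of taxa observed across all replicates, so the shape of the approach to that value, rather than the value itself, is what carries information about sampling sufficiency.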
Variation among PCR replicates is expected in eDNA metabarcoding
(Beentjes et al., 2019; Ficetola et al., 2008). While much of this
variation results from random sampling of low abundance taxa (Leray &
Knowlton, 2017), variation can also arise due to errors including
contamination by exogenous DNA and the accumulation of replication and
sequencing errors. To capture this variation, eDNA experimental designs
often include one or more replicate PCRs for each DNA extract. Replicate
PCRs reduce the effect of stochastic amplification and make it possible
to detect potential outlier PCRs (Robasky et al., 2014; Leray & Knowlton,
2017), which we define as PCR amplicon pools that contain a different
richness or composition of taxa than other replicates. Most eDNA studies
perform from one (e.g. Deveautour et al., 2018; Erlandson et al., 2018)
to three PCR replicates (e.g. Yamamoto et al., 2017; Beentjes et al.,
2019; Browne et al., 2020), although more PCR replicates are sometimes
performed when working with ancient samples (e.g. 8 replicates: Ficetola
et al., 2018; Clarke et al., 2019; 15 replicates: Stahlschmidt et al.,
2019). However, while PCR replication is common in eDNA research, there
is no consensus as to how this mechanism should be applied as a control.
Most studies generate replicate PCRs and then pool them prior to
sequencing, rather than sequencing them separately (e.g. Lanzen et al.,
2017; Smith and Peay, 2014; Lin et al., submitted), in part because
sequencing each replicate separately is cost prohibitive (Fonseca,
2018). Some studies that
sequence PCR replicates separately discard taxa found in only a single
replicate as an approach to ruling out contaminants (De Barba et al.,
2014; Giguet‐Covex et al., 2014; Hope et al., 2014), or require a taxon
or haplotype to be present in all replicates to be included in
downstream analysis (Taberlet et al., 2018; Tsuji et al., 2019). While
variation due to errors can artificially inflate biodiversity estimates
(Zepeda-Mendoza, 2016), discarding taxa found in few replicates may also
remove biodiversity that is genuinely present in a sample (Leray &
Knowlton, 2017).
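The replicate-based filtering rules described above can be sketched as a single threshold on the number of replicates in which a taxon is detected. This is an illustrative sketch only: the function name `taxa_with_support`, the set-per-replicate input format, and the toy data are assumptions for this example and do not reproduce the pipeline of any cited study.

```python
from collections import Counter


def taxa_with_support(replicates, min_replicates):
    """Return the taxa detected in at least `min_replicates` replicates.

    `replicates` is a list of sets of taxon labels, one per separately
    sequenced PCR replicate (a hypothetical input format).
    """
    counts = Counter(taxon for rep in replicates for taxon in set(rep))
    return {taxon for taxon, n in counts.items() if n >= min_replicates}


# Toy data (hypothetical): three replicates from one extract.
reps = [{"A", "B", "C"}, {"A", "B"}, {"A", "D"}]

keep_all = taxa_with_support(reps, 1)               # no filtering
drop_singletons = taxa_with_support(reps, 2)        # discard single-replicate taxa
require_all = taxa_with_support(reps, len(reps))    # present in every replicate
print(sorted(keep_all), sorted(drop_singletons), sorted(require_all))
```

In this toy case the strictest rule retains only taxon A, which shows how a threshold chosen to exclude contaminants can also remove taxa that may be genuinely present at low abundance.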
Another experimental variable to consider when performing PCR
replication is read sampling depth, or how deeply each PCR amplicon pool
is sequenced. In published eDNA studies, read sampling depth per PCR
replicate tends to be between 1,000 reads (Krehenwinkel et al., 2017)
and 25,000 reads (e.g. Lanzen et al., 2017; Schnell et al., 2018; Stat et
al., 2017), with some exceptions (e.g. >100,000 reads per replicate;
Leempoel et al., 2020). Previous studies investigating the
role of PCR replication and read sampling depth in eDNA have reported
conflicting results. In exploring metabarcoding data generated using the
Ion Torrent sequencing platform, Murray, Coghlan, and Bunce (2015) found
that increasing read sampling depth did not necessarily increase the
likelihood of detection of low abundance taxa. Smith and Peay (2014),
in contrast, found that increasing read sampling depth decreased
dissimilarity between PCR replicates but that this had
little effect on estimates of either α or β diversity.
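To make the two quantities in this paragraph concrete, the sketch below subsamples a replicate's reads to a fixed read sampling depth and computes a dissimilarity between two replicates. The choice of Bray-Curtis dissimilarity, the function names `rarefy` and `bray_curtis`, the dictionary input format, and the toy counts are assumptions for illustration; the studies cited above may have used other measures and tools.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a taxon -> read-count dict."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads available")
    # Expand the counts into one label per read, then draw without replacement.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(reads, depth))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two taxon -> read-count dicts."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in set(a) | set(b))
    return 1 - 2 * shared / total


# Toy data (hypothetical): read counts for two PCR replicates of one extract.
rep1 = {"A": 900, "B": 90, "C": 10}
rep2 = {"A": 850, "B": 140, "D": 10}

sub1 = rarefy(rep1, 100, seed=1)
sub2 = rarefy(rep2, 100, seed=2)
print(bray_curtis(rep1, rep2), bray_curtis(sub1, sub2))
```

Repeating the subsampling over a range of depths and seeds would show how replicate dissimilarity changes with read sampling depth for a given data set.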
Several previous studies have explored the influence of PCR replicates
on metabarcoding-based biodiversity estimates, also with conflicting
results. Smith and Peay (2014), for example, compared diversity
estimates generated from PCR amplicon pools comprising 1, 2, 4, 8 or 16
PCR replicates and found that sequencing platform and sequencing depth
both affected recovered taxonomic profiles, but that the number of PCR
replicates did not necessarily increase the number of observed taxa.
Ficetola et al. (2015), using both simulated and empirical data, found
that the number of PCR replicates necessary to observe all taxa in an
extract depends on the site, and that the probability of detection of a
given taxon increases with that taxon’s relative abundance. More
recently, Alberdi et al. (2017) compared biodiversity estimates from
metabarcoding data amplified in triplicate from 54 bat fecal samples.
They observed different taxonomic profiles depending on which subsets of
the PCR replicates per sample they analyzed and that increasing read
sampling depth increased dissimilarity between PCR replicates. While
these results hint that more PCR replicates are better than fewer for
assessing total diversity, how many PCR replicates should be performed
remains an open question, as does the interaction and potential
trade-off between PCR replication and read sampling depth.
Here, we perform 24 replicate PCRs for two commonly used metabarcodes,
the Internal Transcribed Spacer (ITS) for fungi
(ITS1) and for plants and algae (ITS2), to explore how PCR
replication and read sampling depth influence metabarcoding-based
biodiversity estimates at six ecologically distinct sites. We address
explicitly the detection of rare taxa, inference of community
composition, site differentiation based on taxon composition, and the
detection and prevalence of PCR outliers. Our data provide two key
insights for eDNA metabarcoding experimental design. First, we find that
abundant taxa are common among PCR replicates and that these taxa, and
therefore few PCR replicates, are sufficient to define site uniqueness.
Second, we observe that rare taxa most often appear in only one or a few
replicates, and significantly alter richness estimates among replicates.
These results suggest that metabarcoding may be insufficient to
characterize fully the alpha biodiversity at any site, even with large
numbers of replicates, but can be sufficient – even with low read
sampling depth and few replicates – to characterize beta diversity.