4. Discussion
We explored the impact of the number of PCR
replicates and read sampling depth, two common parameters in eDNA
experimental design, on estimates of taxonomic diversity. Using eDNA
extracts from six sites at three ecologically and geographically
distinct locations, we performed 24 PCR replicates for each of two
metabarcodes: plant ITS (PITS) and fungal ITS (FITS). We then analyzed
these replicates by compiling data sets that included different read
sampling depths and minimum read cutoffs. We find that PCR replicates
are consistent in the composition (Figure 7) and relative abundance (RA)
(Figures 3 and S1) of high abundance taxa, but inconsistent in recovery
of low abundance taxa, and that even large numbers of PCR replicates are
insufficient to fully characterize diversity at any site.
When considering only high abundance taxa, our PCR replicates produced
community profiles that distinguished sites from each other, even sites
that are geographically proximate and presumably similar in community
composition (Figure 4). The majority of high abundance taxa were
detected in all 24 PCR replicates, excluding outlier PCRs. This result
provides empirical support for the modeling-based prediction by Ficetola
et al. (2015) that PCR replicates will consistently detect taxa that have
high “detection probability”, which they define as taxa present in
high abundance relative to other taxa at a site. We also observed that
the number of reads assigned to a taxon within a PCR was positively
correlated with the frequency with which that taxon was observed across
PCR replicates (Figures 3, 7), which was also reported by Smith and Peay
(2014). Together, these results confirm that community profiles based on
high abundance taxa are replicable among PCRs and capable of
distinguishing sites. Only minimal PCR replication is therefore necessary to
characterize sites using β diversity statistics that derive from high
abundance taxa.
While high abundance taxa were recovered consistently among our PCR
replicates, low abundance taxa were not (Figures 1, 2, 6). Low abundance
taxa rarely occurred in PCR replicates; we observed some low abundance
taxa in several PCR replicates but most in only a single replicate.
Unsurprisingly, this stochasticity in recovery of low abundance taxa affected
biodiversity statistics that rely on raw taxon counts, such as α
diversity. While this observation has been reported previously (e.g.
Beentjes et al., 2019; Ficetola et al., 2008), our results highlight how
the problem can be exacerbated by shallow sequencing read depths and low
minimum read cutoffs. Specifically, we find several-fold differences in
maximum richness and rarefied richness among replicates depending on
what values we selected for read depth and minimum read cutoff. While
the stochasticity in recovery of low abundance taxa poses challenges in
interpretation of some biodiversity statistics, it tends not to
influence β diversity between sites measured as either presence-absence
(Figure 4) or RA (Figure S1), or position within a PCoA.
At our six sites, 24 PCR replicates were not sufficient to detect
all rare taxa and therefore did not stabilize the species accumulation curves
(Figure 5). This result supports previous observations that using
different numbers of PCR replicates will alter taxonomic profiles
(Alberdi et al. 2017; Murray, Coghlan, and Bunce 2015). Intriguingly,
Smith and Peay (2014) reported the opposite conclusion: that increasing
the number of PCR replicates does not influence α diversity. As is
common in eDNA studies, Smith and Peay amplified each of their PCRs over
30 cycles, whereas we estimated the optimal number of cycles for each
reaction separately using qPCR, following Murray, Coghlan, and Bunce
(2015). Overamplification of PCR amplicon pools can reduce the
complexity of the amplicon pool as read “species” that replicate more
efficiently outcompete others that replicate less efficiently (Nichols
et al., 2017). Consequently, taxa that are least efficiently amplified
will become increasingly rare and may not be observed, in particular at
low read sampling depths.
We observed most singleton taxa in only one PCR replicate (Figure 6).
This finding supports the conclusion by Leray and Knowlton (2017) that
random sampling of rare taxa across PCR replicates accounts for most of
the variation between PCR replicates. Increasing read sampling depth did
not reduce the number of replicates that were required to stabilize the
taxon accumulation curve (Table 2). However, increasing the minimum read
cutoff did reduce the number of PCR replicates necessary to stabilize
the curve (Table 2), presumably by removing many of the low abundance
taxa from each data set such that only the high abundance taxa, most of
which were common to each PCR, remained.
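The interaction between minimum read cutoff and the taxon accumulation curve can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names, the taxon labels, and the read counts below are all hypothetical, and the curve is computed for a single replicate order, whereas an analysis would normally average over many random orderings.

```python
import random
from collections import Counter


def subsample_reads(counts, depth, rng):
    """Draw `depth` reads without replacement from a taxon -> read count mapping."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(pool, depth))


def apply_min_cutoff(counts, min_reads):
    """Keep only taxa supported by at least `min_reads` reads."""
    return {taxon: n for taxon, n in counts.items() if n >= min_reads}


def accumulation_curve(replicates):
    """Cumulative number of distinct taxa as replicates are added in order."""
    seen = set()
    curve = []
    for rep in replicates:
        seen.update(rep)  # iterating a mapping yields its taxon keys
        curve.append(len(seen))
    return curve


# Hypothetical replicates: two abundant shared taxa plus one rare taxon each.
replicates = [
    {"A": 500, "B": 300, "rare1": 1},
    {"A": 480, "B": 310, "rare2": 2},
    {"A": 510, "B": 290, "rare3": 1},
]

# Without a cutoff, each replicate adds a new rare taxon: [3, 4, 5]
unfiltered_curve = accumulation_curve(replicates)

# With a (hypothetical) cutoff of 5 reads, the curve is flat: [2, 2, 2]
filtered = [apply_min_cutoff(rep, 5) for rep in replicates]
filtered_curve = accumulation_curve(filtered)
```

In this toy example the unfiltered curve keeps rising with every replicate, while the filtered curve stabilizes after the first replicate because only the shared high abundance taxa remain.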
We found that increasing the read sampling depth significantly
increased the number of taxa detected at each of our sites (Figures 1,
2; Table S4). As many eDNA studies and consortia sequence amplicon pools
to the shallowest of our depths (1,000 reads), this result has
implications for how biodiversity estimates based on these published
data sets can be interpreted and compared. The impact of this parameter
choice depends on how the data are analyzed. For example, we estimated
significantly higher observed α diversity at a depth of 10,000 reads
than at a depth of 1,000 reads across all sites, but found no difference
between read depths when α diversity was calculated using the Shannon or
Simpson metrics, which underweight low abundance taxa compared to common
taxa (Hsieh et al., 2016). The significant increase in α diversity that
we observed is in contrast to Murray, Coghlan, and Bunce (2015), who
found that sampling depth per PCR replicate did not necessarily increase
detection of low abundance taxa. This difference may be due to the use
by Murray, Coghlan, and Bunce of Ion Torrent rather than Illumina
sequencing technology, as the higher error profiles generated by the Ion
Torrent platform require more stringent removal of rare taxa (Salipante
et al., 2014). Because sites will vary in the amount of total diversity
present, taxon accumulation curves such as those in Figure 1 may be
useful in determining the appropriate read sampling depth for a given
site.
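The contrast between observed richness and the Shannon and Simpson metrics can be made concrete with a short sketch. The read counts are hypothetical and chosen only to mimic a shallow and a deep sample of the same amplicon pool. The Simpson function uses the Gini-Simpson form (1 minus the sum of squared proportions), which is an assumption about the variant intended.

```python
import math


def observed_richness(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with reads."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Gini-Simpson index, 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical read counts per taxon for one PCR replicate.
shallow = [600, 300, 100]                # 1,000 reads: only abundant taxa seen
deep = [6000, 3000, 1000, 2, 1, 1, 1]    # ~10,000 reads: rare taxa now detected

for label, counts in (("shallow", shallow), ("deep", deep)):
    print(
        label,
        observed_richness(counts),
        round(shannon_index(counts), 3),
        round(simpson_index(counts), 3),
    )
```

Here observed richness more than doubles between the two depths (3 versus 7 taxa), while the Shannon and Simpson values barely move, because the rare taxa contribute almost nothing to proportion-weighted metrics.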
Our results also reiterate the need to consider the physical and
ecological setting during eDNA experimental design (Anderson et al.,
2012; Ficetola et al. 2015). We observed the most variation in observed
α diversity among PCR replicates in the PITS dataset at YL.1 (Figures 1e
and 2), a lagoon basin into which water and wind carry and deposit
DNA-containing materials from the surrounding environment. The constant
influx of DNA from the surrounding habitats may explain why amplicon
pools from this site include many low abundance taxa. Although these low
abundance taxa have little effect on β diversity estimates, they are
contributing members of local communities. Metabarcoding may therefore
be a particularly inefficient tool for estimating and comparing α
diversity at sites with high biological turnover or input.
Because we sequenced each PCR replicate individually, we were also
able to explore the rate of occurrence and potential impact of PCR
outliers, which we define as PCR amplicon pools that differ
significantly in either composition or relative abundance of taxa
compared to other replicates from the same eDNA extract. We found PCR
outliers to be more common at sites with high diversity, like YL.1.
Increasing read sampling depth also increased the frequency of PCR
outliers, but only for the FITS data sets (Figure 4), possibly because
of the higher taxonomic diversity among low abundance taxa recovered by
this metabarcode. While we are unable to determine the precise cause of
outlier PCRs, we note that they are only observable as outliers if more
than one PCR replicate is performed. This rationale is often used by
groups that perform three PCR replicates per sample (Taberlet et al.,
2018), which allows disambiguation between an outlier and a “normal”
PCR without additional laboratory work.
Given our results, we present the following conclusions, which can
serve as recommendations for experimental design in eDNA metabarcoding
experiments:
- PCR Replication: A single PCR often captures the diversity of
common taxa at a site and allows sites to be differentiated based on
these common taxa. However, because outlier PCRs are a possibility, a
minimum of two PCR replicates is recommended. When multiple PCR
replicates are performed, the LCBD statistic can be used to identify
PCR outliers by quantifying replicate uniqueness.
- Read sampling depth: Increasing sequencing read depth
increases the chance that low abundance taxa are recovered from within
the amplicon pool. However, because PCR replicates vary in taxonomic
composition, exhausting the sequence complexity of an amplicon pool
through deep sequencing is not the same as exhausting the sequence
complexity of a DNA extract. Variation between PCR replicates in
taxonomic composition or relative abundance does not diminish with
increased sequencing read depth.
- Minimum read cutoff: Higher minimum read cutoffs remove low
abundance taxa from a PCR amplicon pool. Removed taxa will include
both low abundance contaminants and low abundance authentic taxa. As
such, the minimum read threshold may influence α diversity but is less
likely to influence β diversity.
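
As an illustration of how the LCBD statistic mentioned under PCR replication can flag outlier replicates, the sketch below computes LCBD as each replicate's share of the total sum of squared deviations from the column means. It assumes a Hellinger transformation of the read counts, which is one common choice and not necessarily the one used in this study. The count matrix is hypothetical, and the code assumes every replicate has at least one read and that the replicates are not all identical.

```python
import math


def hellinger(matrix):
    """Hellinger transform: square root of each row's relative abundances."""
    transformed = []
    for row in matrix:
        total = sum(row)  # assumed non-zero for every replicate
        transformed.append([math.sqrt(v / total) for v in row])
    return transformed


def lcbd(matrix):
    """Share of total squared deviation from column means held by each row."""
    y = hellinger(matrix)
    n_rows, n_cols = len(y), len(y[0])
    means = [sum(row[j] for row in y) / n_rows for j in range(n_cols)]
    row_ss = [
        sum((row[j] - means[j]) ** 2 for j in range(n_cols)) for row in y
    ]
    total_ss = sum(row_ss)  # assumed non-zero (replicates not all identical)
    return [ss / total_ss for ss in row_ss]


# Hypothetical read counts: rows are PCR replicates, columns are taxa.
# The fourth replicate is dominated by a taxon absent from the others.
replicate_counts = [
    [500, 300, 200, 0],
    [510, 290, 200, 0],
    [490, 310, 200, 0],
    [100, 100, 100, 700],
]

lcbd_values = lcbd(replicate_counts)
most_unique = max(range(len(lcbd_values)), key=lcbd_values.__getitem__)
print(lcbd_values, most_unique)
```

The values sum to 1, and the fourth replicate (index 3) receives by far the largest share, marking it as the candidate outlier. Deciding how large a value must be to count as an outlier requires a significance test or threshold that this sketch does not cover.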