Different OTU generation methods strongly influence
species richness estimates
Overall community composition is captured well across the three OTU
generation methods when analyzing ecological patterns (Fig. 3) and
relative abundance at the phylum level (Fig. 4). However, the OTU
generation methods differentially capture and represent the members of
these communities, so that different sequences are selected to represent
the raw reads in the different datasets. Because denoising depends on
abundant seed sequences, it yielded fewer OTU_As than the two other
methods, and entire lineages of rare taxa remained undetected, while a
large number of OTU_As were recovered from abundant taxa such as
Mortierellomycota (Fig. 4b, S9). The detection limits of
different OTU generation methods were compared by generating
approximately genus-level clusters using sequence similarity thresholds
at 90% and species-level clusters at either 99 or 97% across the ITS2
region extracted from all OTU representative sequences. Only 36% of all
genus-level clusters (GH_90) in the dataset were represented by an
OTU_A sequence, compared to 94 and 96% for OTU_C and OTU_S,
respectively (Table 2). The level of detection for SHs represented by up
to 50 reads was lower for OTU_A than for the other methods. In some cases,
even close to 300 reads were not enough to detect an SH_99 with OTU_A
(Fig. S10). Even the more inclusive methods did not capture exactly the
same genus-level diversity, with just over 7% of all GH_90 represented
by a sequence recovered by a single method (Table 2). However, no GH_90
was represented only by an OTU_A sequence.
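The comparison of cluster representation across methods can be illustrated with a minimal Python sketch. It assumes that each cluster has already been assigned the set of methods that contributed at least one representative sequence to it; the function name, the input layout and the toy cluster identifiers are hypothetical and are not taken from the analysis pipeline used here.

```python
from collections import Counter


def cluster_representation(cluster_methods):
    """Summarise which OTU generation methods represent each cluster.

    cluster_methods: dict mapping a cluster id (e.g. a GH_90 cluster) to the
    set of method labels with at least one representative sequence in it.
    Returns (share per method, count of clusters unique to each method,
    share of clusters represented by a single method).
    """
    total = len(cluster_methods)
    if total == 0:
        raise ValueError("no clusters supplied")

    per_method = Counter()   # clusters in which each method is present
    single = Counter()       # clusters represented by that method alone
    for methods in cluster_methods.values():
        per_method.update(methods)
        if len(methods) == 1:
            single.update(methods)

    share = {m: n / total for m, n in per_method.items()}
    single_share = sum(single.values()) / total
    return share, dict(single), single_share


# Toy input (hypothetical cluster ids, not data from this study).
toy = {
    "GH_1": {"OTU_A", "OTU_C", "OTU_S"},
    "GH_2": {"OTU_C", "OTU_S"},
    "GH_3": {"OTU_S"},
    "GH_4": {"OTU_C"},
}
share, single, single_share = cluster_representation(toy)
```

In this toy input, no cluster is represented by OTU_A alone, which mirrors the pattern reported in Table 2 in form only, not in magnitude.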
Species richness estimates are heavily influenced by the OTU generation
method used, with the lowest numbers estimated with OTU_A for all three
ITS2 sequence similarity levels: GH_90, SH_99 and SH_97 (Fig. 5).
While OTU_A richness was estimated to saturate close to 1000 in both
wet and mesic-dry soil conditions (Fig. S4), these OTUs may represent only
about half as many species, since collapsing intraspecies variation leaves
around 600 SH_99 and just over 500 SH_97 (Fig. 5). OTU richness
estimates are highest for OTU_C at almost 1700, followed by OTU_S at
almost 1400 (Fig. S4), and the numbers are only slightly lower when
estimating species richness as SH_99 (Fig. 5). Accepting ITS2 sequence
similarity at either 99 or 97% as a proxy for species suggests that
clustering into OTU_C or OTU_S detects close to three times as many
species as denoising into OTU_A. Of the three methods, OTU_S
also has the largest number of SH_99 and SH_97 represented by only one
OTU (Fig. S11), suggesting that in the current dataset this method
provides the best estimate of species richness.
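The count of species-level clusters represented by only one OTU can be sketched in the same way. The example below assumes a per-method mapping from OTU identifier to the SH_99 or SH_97 cluster it falls into; the function name and the toy identifiers are hypothetical and serve only to show the tally.

```python
from collections import Counter


def singly_represented(otu_to_cluster):
    """Count clusters that contain exactly one OTU for a single method.

    otu_to_cluster: dict mapping OTU id -> cluster id (e.g. an SH_99 cluster).
    Returns (number of clusters with exactly one OTU, total number of clusters).
    """
    otus_per_cluster = Counter(otu_to_cluster.values())
    n_single = sum(1 for n in otus_per_cluster.values() if n == 1)
    return n_single, len(otus_per_cluster)


# Toy input (hypothetical ids): SH_a is split over two OTUs,
# SH_b and SH_c are each represented by one OTU.
toy_method = {"otu1": "SH_a", "otu2": "SH_a", "otu3": "SH_b", "otu4": "SH_c"}
n_single, n_clusters = singly_represented(toy_method)
```

A method for which most clusters hold exactly one OTU yields OTU counts that track the number of species-level clusters more closely, which is the reasoning behind the comparison in Fig. S11.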