3.5- PCR replication and taxon accumulation. 
To assess the extent of low-abundance and possibly unique taxa in PCR replicates, we calculated the increase in α diversity as PCR replicates were added to a combined data set. Although the order in which PCR replicates are added does not influence the final cumulative α diversity, the trajectory toward this endpoint varies with the order. We therefore bootstrapped the analysis 100 times and plotted the mean. Notably, after the addition of all 24 PCR replicates, we did not observe a plateau in species richness, indicating that even this large number of PCR replicates was insufficient to fully sample the diversity of taxa within the DNA extract (Figure 5).
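The accumulation procedure described above can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes the data are available as a presence/absence matrix (replicates in rows, taxa in columns) and interprets "bootstrapped" as repeatedly shuffling the order in which replicates are added. The function name `accumulation_curve` and its parameters are hypothetical.

```python
import numpy as np

def accumulation_curve(presence, n_boot=100, seed=0):
    """Mean cumulative richness as replicates are added in random order.

    presence: boolean array of shape (n_replicates, n_taxa)
              (assumed input layout, not taken from the paper).
    Returns an array of length n_replicates; element k-1 is the mean
    number of taxa seen after pooling k replicates.
    """
    presence = np.asarray(presence, dtype=bool)
    rng = np.random.default_rng(seed)
    n_reps = presence.shape[0]
    curves = np.empty((n_boot, n_reps))
    for b in range(n_boot):
        # Assumption: each iteration uses a fresh random ordering of replicates.
        order = rng.permutation(n_reps)
        # Running union of taxa detected so far along this ordering.
        seen = np.logical_or.accumulate(presence[order], axis=0)
        curves[b] = seen.sum(axis=1)
    return curves.mean(axis=0)
```

The last element of the returned curve is the same for every ordering, since it is the richness of all replicates pooled; only the path toward it changes, which is why the curve is averaged over orderings.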
We next calculated the number of PCR replicates per sample needed to saturate the taxon accumulation curve, defined as the point at which adding another PCR replicate increases the curve by fewer than one taxon on average (based on our bootstrapped analysis) (Table 2). We performed this analysis at different sequencing read depths and minimum read cutoffs. The number of PCR replicates needed to reach saturation varied among sites, read depths, and minimum read cutoffs, although fewer replicates were needed at higher read cutoffs (Table 2). Surprisingly, increasing the rarefaction read depth increased the number of replicates required (Table 2).
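The saturation criterion can be written compactly. The sketch below assumes a mean accumulation curve is already available as a one-dimensional array whose element k-1 is the mean richness after k replicates. The function name `replicates_to_saturation` and the `threshold` parameter are hypothetical, and returning `None` when the criterion is never met is a design choice of this example.

```python
import numpy as np

def replicates_to_saturation(mean_curve, threshold=1.0):
    """Smallest number of replicates k such that adding replicate k+1
    raises mean richness by less than `threshold` taxa.

    Returns None if the criterion is not met within the available replicates.
    """
    mean_curve = np.asarray(mean_curve, dtype=float)
    # gains[i] is the mean number of taxa gained going from i+1 to i+2 replicates.
    gains = np.diff(mean_curve)
    below = np.flatnonzero(gains < threshold)
    if below.size == 0:
        return None
    return int(below[0]) + 1

# Illustrative values only (not data from the study).
example_curve = [10.0, 15.0, 17.0, 17.5, 17.8]
print(replicates_to_saturation(example_curve))  # prints 3
```

Repeating this calculation on curves built at each read depth and minimum read cutoff would give one saturation value per combination, in the layout of a table such as Table 2.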
We then plotted histograms of the frequency of taxa detected across PCR replicates (Figure 6). Most taxa were either singletons (present in only one PCR replicate) or occurred in all PCR replicates. Based on PITS data, singletons did not appear to be sequencing artefacts: of the 70 singleton species found, only 11 belonged to the same genus as another species found at high frequency (in at least 20 replicates) (see Chlamydomonas; Table S1). To evaluate whether singleton taxa were also taxa of low relative abundance, we plotted the relationship between a taxon’s within-replicate sequence abundance and its frequency across replicates (Figure 7). For all read depths and minimum read cutoffs, we found a significant positive correlation (Figure 7), as indicated by a fitted linear model (PITS: p < 2e-16, t = 24.73, adjusted R² = 0.7324; FITS: p < 2e-16, t = 39.91, adjusted R² = 0.8219). This indicates that taxa that occur at low sequence abundance within PCR replicates also occur less frequently across replicates, and that taxa that are abundant within PCR replicates are more likely to occur in all PCR replicates. In the PITS results, a taxon occurred in most replicates only when its relative abundance exceeded roughly 10% (Figure 7). In the FITS results, a taxon occurred in most replicates only when its relative abundance exceeded 1%. Most taxa in soil and sediments were below 1% relative abundance (Figure 7).
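The two quantities compared here, a taxon's frequency across replicates and its within-replicate abundance, can be derived from a read-count table as sketched below. This is an illustration under stated assumptions rather than the analysis performed in the study: the input is assumed to be a replicates-by-taxa count matrix in which every replicate has at least one read, abundance is summarized as the mean relative abundance over the replicates in which the taxon was detected, and the fit is an ordinary least-squares line on untransformed values reporting unadjusted R². The function names are hypothetical.

```python
import numpy as np

def occupancy_and_abundance(counts):
    """Per-taxon frequency across replicates and mean within-replicate
    relative abundance (averaged over replicates where the taxon occurs).

    counts: array of shape (n_replicates, n_taxa) with read counts;
            every replicate is assumed to contain at least one read.
    """
    counts = np.asarray(counts, dtype=float)
    rel = counts / counts.sum(axis=1, keepdims=True)
    frequency = (counts > 0).sum(axis=0)
    # Zero entries contribute nothing to the sum, so dividing by the number
    # of detections gives the mean over replicates with the taxon present.
    mean_abund = rel.sum(axis=0) / np.maximum(frequency, 1)
    return frequency, mean_abund

def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, with unadjusted R^2."""
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r ** 2

# Illustrative counts only (3 replicates, 3 taxa); not data from the study.
toy = np.array([[90, 10, 0],
                [80, 0, 20],
                [100, 0, 0]])
freq, abund = occupancy_and_abundance(toy)
n_singletons = int((freq == 1).sum())  # taxa detected in exactly one replicate
slope, intercept, r2 = fit_line(abund, freq)
```

A histogram of `freq` corresponds to the kind of plot shown in Figure 6, and a scatter of `freq` against `abund` to the kind shown in Figure 7.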