Taxonomy assignment and maximum likelihood (ML) tree based on the
5.8S and LSU regions
The LSU, 5.8S, ITS2, and full ITS (ITS1–5.8S–ITS2) regions were
extracted using LSUx (version 0.99.6; Furneaux et al., 2021) from the
OTU consensus sequences generated by all three OTU generation methods.
Three reference datasets were used for assigning taxonomy: the SILVA LSU NR 99
dataset (version 138.1, eukaryotes only; Quast et al., 2012) and the RDP
fungal LSU training set (version 11; Liu et al., 2012) for the extracted
LSU sequences, and the UNITE all-eukaryotes dataset (version 8.2,
including singletons; Nilsson et al., 2019) for the extracted ITS
sequences. The taxonomic annotations for all three reference datasets
were mapped to the UNITE classification system so that assignments from
different datasets could be compared (reUnite version 0.2.0; Furneaux et
al., 2021). Taxonomy was assigned using the SINTAX algorithm (Edgar,
2016) as implemented in VSEARCH (version 2.15.1; Rognes et al., 2016)
with a bootstrap threshold of 0.8.
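
As an illustration of how a bootstrap threshold of 0.8 acts on a SINTAX prediction, the following minimal Python sketch parses a prediction string of the form written by VSEARCH (rank letter, taxon name, support in parentheses) and keeps ranks until the first one that falls below the threshold. The function names and the example string are hypothetical, and truncating at the first rank below the threshold is an assumption of this sketch rather than a description of the pipeline used here.

```python
import re

# One taxon field as written by SINTAX, e.g. "p:Ascomycota(0.95)"
_FIELD = re.compile(r"^([a-z]):(.+)\(([0-9.]+)\)$")


def parse_sintax(prediction):
    """Split a SINTAX prediction string into (rank, name, support) tuples."""
    fields = []
    for item in prediction.split(","):
        match = _FIELD.match(item.strip())
        if match:
            rank, name, support = match.groups()
            fields.append((rank, name, float(support)))
    return fields


def apply_cutoff(fields, cutoff=0.8):
    """Keep ranks from the top down until support drops below the cutoff.

    Assumption: once one rank fails the cutoff, all lower ranks are dropped.
    """
    kept = []
    for rank, name, support in fields:
        if support < cutoff:
            break
        kept.append((rank, name))
    return kept


# Hypothetical prediction string, not taken from the study data.
example = (
    "k:Fungi(1.00),p:Ascomycota(0.97),c:Leotiomycetes(0.81),"
    "o:Helotiales(0.62),f:Hyaloscyphaceae(0.40)"
)
assignment = apply_cutoff(parse_sintax(example), cutoff=0.8)
print(assignment)
```

In this example the sequence would be annotated to class but left unidentified at order and below.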
Unique 5.8S and LSU sequences from the combined (OTU_A, OTU_C, OTU_S)
dataset were independently aligned with DECIPHER (version 2.18.0;
Wright, 2015). The LSU alignment was truncated at the column
corresponding to position 879 of the S288C reference sequence because
introns occur after this position. The 5.8S and LSU alignments were then
concatenated, and each sequence in the concatenated alignment was
assigned a unique identifier based on its component 5.8S and LSU
sequences. A preliminary ML phylogenetic tree was generated from the
concatenated alignment using FastTree (version 2.1.10; Price et al.,
2010) under the GTR+CAT model with 20 rate categories. For sequences
assigned at the kingdom level without conflicts between reference
databases, the ML tree search was constrained by requiring that each
kingdom form a monophyletic clade. Monophyly of the eukaryotic
supergroups found in the samples was also constrained (Fig. S1)
according to the current consensus of phylogenomic studies (Adl et al.,
2019; Strassert et al., 2019). The positions of sequences that were not
identified to kingdom, or that received conflicting kingdom assignments
from the different reference datasets, were not constrained. The tree
was rooted with sequences representing the protist phylum Discoba
(Supplementary datafile 1).
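
The truncation and concatenation steps described above can be sketched in Python as follows. This is a minimal illustration under several assumptions: that the reference sequence is present as a row of the LSU alignment, that the truncation keeps the column holding the reference position itself, and that the unique identifier is a short hash of the ungapped component sequences. All function names and the toy alignments are hypothetical; the text does not specify how the identifiers were actually derived.

```python
import hashlib


def column_for_reference_position(aligned_ref, position, gap_chars="-."):
    """Return the 0-based alignment column that holds the given 1-based
    ungapped position of the reference sequence."""
    seen = 0
    for column, char in enumerate(aligned_ref):
        if char not in gap_chars:
            seen += 1
            if seen == position:
                return column
    raise ValueError("reference is shorter than the requested position")


def truncate_alignment(alignment, ref_id, position):
    """Keep alignment columns up to and including the reference position."""
    end = column_for_reference_position(alignment[ref_id], position) + 1
    return {name: seq[:end] for name, seq in alignment.items()}


def concatenate(aln_58s, aln_lsu, pairs):
    """Join the aligned 5.8S and LSU rows for each (5.8S id, LSU id) pair.

    Each concatenated row is named by a hash of its ungapped component
    sequences (an assumed scheme, used here only for illustration).
    """
    out = {}
    for id_58s, id_lsu in pairs:
        row = aln_58s[id_58s] + aln_lsu[id_lsu]
        raw = (aln_58s[id_58s] + "|" + aln_lsu[id_lsu]).replace("-", "")
        out[hashlib.sha1(raw.encode()).hexdigest()[:12]] = row
    return out


# Toy alignments; position 4 stands in for position 879 of the reference.
lsu = {"ref": "AC-GTAC", "otu1": "ACGGT-C"}
s58 = {"x": "TT-A"}
lsu_truncated = truncate_alignment(lsu, "ref", 4)
combined = concatenate(s58, lsu_truncated, [("x", "otu1")])
print(lsu_truncated)
print(combined)
```

Deriving the identifier from the component sequences means that identical 5.8S and LSU pairs arising from different OTU generation methods collapse to a single row in the concatenated alignment.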
The clades corresponding to animals (kingdom Metazoa) and vascular
plants (phylum Streptophyta) were identified from the tree, and OTUs
corresponding to those sequences were removed from further analyses.
Additionally, the clade corresponding to kingdom Fungi was extracted and
analyzed separately from protists. For kingdom Fungi only, a refined
alignment and phylogenetic tree were generated by realignment of the
5.8S and LSU regions using MAFFT-ginsi (Katoh & Standley, 2016),
including truncation of the LSU alignment as above, followed by ML
phylogeny construction with IQ-TREE (Nguyen et al., 2015; Stamatakis,
2014) using the built-in ModelFinder Plus (Kalyaanamoorthy et al.,
2017), which selected the TIM3+F+R10 model, and 1000 ultrafast bootstrap
replicates (Hoang et al., 2018). The most abundant holozoan OTU (across
OTU_A, OTU_C, and OTU_S) from the dataset (an ichthyosporean) was
retained to root the fungal tree (Supplementary datafile 2).