Taxonomy assignment and maximum likelihood (ML) tree based on the 5.8S and LSU regions
The LSU, 5.8S, ITS2, and full ITS (ITS1–5.8S–ITS2) regions were extracted using LSUx (version 0.99.6; Furneaux et al., 2021) from the OTU consensus sequences generated by all three OTU generation methods. Three reference datasets were used to assign taxonomy: the SILVA LSU NR 99 dataset (version 138.1, eukaryotes only; Quast et al., 2012) and the RDP fungal LSU training set (version 11; Liu et al., 2012) for the extracted LSU sequences, and the UNITE all-eukaryotes dataset (version 8.2, including singletons; Nilsson et al., 2019) for the extracted ITS sequences. The taxonomic annotations of all three reference datasets were mapped to the UNITE classification system so that assignments from different datasets could be compared (reUnite version 0.2.0; Furneaux et al., 2021). Taxonomy was assigned using the SINTAX algorithm (Edgar, 2016) as implemented in VSEARCH (version 2.15.1; Rognes et al., 2016) with a bootstrap threshold of 0.8.
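As an illustration of how the 0.8 bootstrap threshold acts on a SINTAX prediction, the following minimal Python sketch truncates a prediction string at the first rank whose support falls below the cutoff. It assumes the usual SINTAX annotation layout of comma-separated `rank:Taxon(support)` entries; the function name `truncate_sintax` is hypothetical, and treating the threshold as inclusive is an assumption rather than a statement about the VSEARCH implementation.

```python
import re

# One rank entry in a SINTAX-style prediction, e.g. "k:Fungi(0.9900)".
_RANK_RE = re.compile(r"^([a-z]):(.+)\(([0-9.]+)\)$")


def truncate_sintax(prediction: str, cutoff: float = 0.8) -> list[tuple[str, str]]:
    """Return (rank, taxon) pairs down to the last rank with support >= cutoff.

    Ranks are read from the highest (left) to the lowest (right). Reading stops
    at the first rank below the cutoff, so a well-supported lower rank beneath a
    poorly supported higher rank is not kept.
    """
    kept = []
    for entry in prediction.split(","):
        match = _RANK_RE.match(entry.strip())
        if match is None:
            break  # empty or malformed entry: stop rather than guess
        rank, taxon, support = match.group(1), match.group(2), float(match.group(3))
        if support < cutoff:  # assumption: the threshold itself counts as passing
            break
        kept.append((rank, taxon))
    return kept


example = "k:Fungi(1.0000),p:Ascomycota(0.9500),c:Sordariomycetes(0.6200)"
assigned = truncate_sintax(example)  # kingdom and phylum are kept, class is dropped
```

Running the same function on the predictions obtained against each reference dataset gives, per sequence, the assignments that can then be compared across datasets.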
Unique 5.8S and LSU sequences from the combined (OTU_A, OTU_C, OTU_S) dataset were independently aligned with DECIPHER (version 2.18.0; Wright, 2015). The LSU alignment was truncated at the position corresponding to 879 in the S288C reference sequence due to the presence of introns after this position. The 5.8S and LSU alignments were then concatenated, and each sequence in the concatenated alignment was assigned a unique identifier based on its component 5.8S and LSU sequences. A preliminary ML phylogenetic tree was generated from the concatenated alignment using FastTree (version 2.1.10; Price et al., 2010) with the GTR+CAT model with 20 rate categories. For sequences assigned at the kingdom level without conflicts between reference databases, the ML tree search was constrained by requiring that each kingdom be monophyletic. Monophyly of the eukaryotic supergroups found in the samples was also constrained (Fig. S1) according to the current consensus of phylogenomic studies (Adl et al., 2019; Strassert et al., 2019). The positions of sequences that were not identified to kingdom, or that received conflicting kingdom assignments from the different reference datasets, were not constrained. The tree was rooted with sequences representing the protist phylum Discoba (Supplementary datafile 1).
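The truncation and concatenation steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used here: alignments are represented as dictionaries of equal-length gapped strings, the reference row is identified by a hypothetical `ref_id`, and the identifier is taken to be a truncated SHA-1 hash of the two ungapped component sequences, which is only one possible way of deriving an identifier from the components.

```python
import hashlib

GAP = "-"


def column_for_reference_position(aligned_ref: str, position: int) -> int:
    """Number of alignment columns that span `position` ungapped reference bases."""
    seen = 0
    for column, char in enumerate(aligned_ref, start=1):
        if char != GAP:
            seen += 1
            if seen == position:
                return column
    raise ValueError("reference is shorter than the requested position")


def truncate_alignment(alignment: dict[str, str], ref_id: str, position: int) -> dict[str, str]:
    """Cut every row after the column holding reference base `position` (1-based)."""
    end = column_for_reference_position(alignment[ref_id], position)
    return {name: row[:end] for name, row in alignment.items()}


def concatenate(aln_58s: dict[str, str], aln_lsu: dict[str, str],
                pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Join the 5.8S and LSU rows for each (5.8S id, LSU id) pair.

    Keys are hypothetical identifiers: a hash of the ungapped components, so
    identical component pairs always receive the same identifier.
    """
    combined = {}
    for id_58s, id_lsu in pairs:
        row_58s, row_lsu = aln_58s[id_58s], aln_lsu[id_lsu]
        components = row_58s.replace(GAP, "") + "|" + row_lsu.replace(GAP, "")
        key = hashlib.sha1(components.encode("ascii")).hexdigest()[:12]
        combined[key] = row_58s + row_lsu
    return combined


# Toy example: truncate after the 3rd reference base, then concatenate.
lsu = truncate_alignment({"ref": "AC-GT-A", "x": "ACCGTTA"}, ref_id="ref", position=3)
joined = concatenate({"s1": "AA-T"}, lsu, pairs=[("s1", "x")])
```

In the toy example the third reference base sits in the fourth alignment column, so the gap column before it is retained and every row is cut to four columns before concatenation.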
The clades corresponding to animals (kingdom Metazoa) and vascular plants (phylum Streptophyta) were identified in the tree, and OTUs corresponding to those sequences were removed from further analyses. Additionally, the clade corresponding to kingdom Fungi was extracted and analyzed separately from protists. For kingdom Fungi only, a refined alignment and phylogenetic tree were generated by realigning the 5.8S and LSU regions with MAFFT G-INS-i (Katoh & Standley, 2016), truncating the LSU alignment as above, and constructing an ML phylogeny with IQ-TREE (Nguyen et al., 2015; Stamatakis, 2014). The built-in ModelFinder Plus (Kalyaanamoorthy et al., 2017) selected the TIM3+F+R10 model, and branch support was assessed with 1000 ultrafast bootstrap replicates (Hoang et al., 2018). The most abundant holozoan OTU in the dataset (across OTU_A, OTU_C, and OTU_S), an ichthyosporean, was retained to root the fungal tree (Supplementary datafile 2).