2.3 Bioinformatic analysis
Raw reads from the bacterial and fungal datasets were demultiplexed and
quality filtered using QIIME software (version 1.7.0). Reads with a
quality score <20 and reads lacking a complete barcode and primer
sequence were excluded from further analysis. Chimeric sequences were
removed using USEARCH software. Subsequently, the DADA2 workflow (Callahan,
Sankaran, Fukuyama, McMurdie, & Holmes, 2016) was used to remove
singletons and doubletons. Both the bacterial and fungal datasets were
dereplicated to generate amplicon sequence variants (ASVs).
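The quality and abundance filters above can be illustrated with a minimal Python sketch. It keeps reads whose mean Phred score is at least 20, collapses identical sequences, and discards sequences seen only once or twice (singletons and doubletons). It assumes Phred+33 quality encoding and reads "quality score <20" as mean per-read quality, which the text does not specify. The function names are hypothetical. The sketch does not reproduce QIIME's barcode and primer checks, USEARCH chimera detection, or DADA2's error-model denoising.

```python
from collections import Counter

# Assumptions (not stated in the text): Phred+33 encoding, and "<20" applied
# to the mean quality of a read rather than to each base.
MIN_MEAN_Q = 20
MIN_ABUNDANCE = 3  # abundance 1 = singleton, 2 = doubleton; both are dropped


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def quality_filter(reads):
    """Return the sequences of (sequence, quality) pairs with mean Phred >= MIN_MEAN_Q."""
    return [seq for seq, qual in reads if qual and mean_phred(qual) >= MIN_MEAN_Q]


def dereplicate(seqs, min_abundance=MIN_ABUNDANCE):
    """Collapse identical sequences into {sequence: count}, dropping rare ones."""
    counts = Counter(seqs)
    return {seq: n for seq, n in counts.most_common() if n >= min_abundance}


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
    reads = ([("ACGT", "IIII")] * 3      # kept: high quality, seen 3 times
             + [("ACGA", "IIII")] * 2    # dropped: doubleton
             + [("TTTT", "####")] * 5)   # dropped: low quality
    print(dereplicate(quality_filter(reads)))  # {'ACGT': 3}
```

Filtering on quality before counting matters here. Low-quality copies are removed before abundances are computed, so they cannot rescue a rare sequence from the singleton and doubleton cut.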
Taxonomy was assigned to bacterial and fungal ASVs with a naïve Bayesian
classifier, requiring a minimum bootstrap confidence of 75, following the
DADA2 workflow (Callahan et al., 2016), against SILVA version 132 (Quast
et al., 2013) and the UNITE general FASTA release for Fungi version 8.0
(Nilsson et al., 2019), respectively. For the bacterial dataset, ASVs that
were not assigned to a bacterial genus were clustered into operational
taxonomic units (OTUs) at 97% similarity with the otu function in the
“kmer” R package (Wilkinson, 2018). One sequence was randomly selected
from each OTU and assigned taxonomy against the SILVA reference using the
method above. The taxonomy assignments of the bacterial ASVs and OTUs
were then combined into an overall bacterial taxon-sample table. All ASVs
and OTUs assigned to non-bacterial taxa, the phylum Cyanobacteria, or the
order Rickettsiales were removed from the overall taxon-sample table.
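The bootstrap threshold and the OTU step can be illustrated with a minimal Python sketch. `truncate_by_bootstrap` sets a rank, and every deeper rank, to None once its bootstrap confidence falls below 75. This is how a minimum-bootstrap rule is typically applied to naïve Bayesian classifier output. For clustering, the sketch swaps in a simple greedy, abundance-ordered centroid method, with identity computed as 1 - edit distance / length of the longer sequence. This is not the algorithm behind the `otu` function of the kmer package. All function names are hypothetical. The random representative is drawn with a fixed seed so that runs are reproducible.

```python
import random

MIN_BOOT = 75        # minimum bootstrap confidence, as stated in the text
OTU_IDENTITY = 0.97  # 97% similarity threshold, as stated in the text


def truncate_by_bootstrap(ranks, taxa, boots, min_boot=MIN_BOOT):
    """Return {rank: taxon}, with None from the first rank below min_boot downward."""
    lineage, passed = {}, True
    for rank, taxon, boot in zip(ranks, taxa, boots):
        passed = passed and boot >= min_boot
        lineage[rank] = taxon if passed else None
    return lineage


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Crude pairwise identity: 1 - edit distance / length of the longer sequence."""
    return 1 - edit_distance(a, b) / max(len(a), len(b))


def greedy_cluster(abundances, threshold=OTU_IDENTITY):
    """Visit sequences from most to least abundant. Each joins the first cluster
    whose centroid (first member) it matches at >= threshold identity, or else
    it founds a new cluster. Returns a list of clusters (lists of sequences)."""
    clusters = []
    for seq, _ in sorted(abundances.items(), key=lambda kv: -kv[1]):
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def pick_representatives(clusters, seed=1):
    """Randomly choose one sequence per OTU, seeded for reproducibility."""
    rng = random.Random(seed)
    return [rng.choice(cluster) for cluster in clusters]
```

Each representative would then be passed back through the same classifier and bootstrap rule as the ASVs before the two sets of assignments are merged.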
ASVs and OTUs with identical genus-level assignments in the overall table
were then agglomerated at the bacterial genus level using the “phyloseq”
package, as described in the DADA2 workflow (Callahan et al., 2016). For
the fungal dataset, ASVs that were not assigned to a fungal species were
similarly clustered into OTUs and identified against the UNITE reference
for eukaryotes version 8.0 (Nilsson et al., 2019). The resulting fungal
taxon-sample table was filtered to remove non-fungal taxa, and fungal taxa
were agglomerated at the species level following the DADA2 workflow
described above. Further details and a schematic diagram are provided in
Supplementary File 1 and Fig. S1.
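The removal of unwanted lineages and the genus-level agglomeration can be sketched in the same spirit. The Python below first drops features that are not assigned to Bacteria or that fall in Cyanobacteria or Rickettsiales. It then sums per-sample counts of features whose lineage is identical from kingdom down to genus, roughly what phyloseq's `tax_glom` does in R. The sketch groups by the full lineage rather than the genus name alone, which keeps identically labelled genera (for example, SILVA "uncultured" entries) from different families apart. Dropping features that are unassigned at genus is an assumption of this sketch. The table layout and function names are hypothetical. The fungal branch would apply the same logic with a fungal kingdom label and a Species rank added.

```python
from collections import defaultdict

RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus"]
# Taxa the text says to remove from the bacterial table.
EXCLUDED = {("Phylum", "Cyanobacteria"), ("Order", "Rickettsiales")}


def is_wanted(lineage, kingdom="Bacteria", excluded=EXCLUDED):
    """Keep features assigned to `kingdom` and to none of the excluded taxa."""
    if lineage.get("Kingdom") != kingdom:
        return False
    return not any(lineage.get(rank) == taxon for rank, taxon in excluded)


def agglomerate(table, taxonomy, rank="Genus", ranks=RANKS):
    """Filter unwanted features, then sum per-sample counts of features whose
    lineage is identical from Kingdom down to `rank`.

    table:    {feature_id: {sample_id: count}}  (ASVs and OTUs together)
    taxonomy: {feature_id: {rank_name: taxon or None}}
    Features unassigned at `rank` are dropped (an assumption of this sketch).
    """
    depth = ranks.index(rank) + 1
    merged = defaultdict(lambda: defaultdict(int))
    for feature, counts in table.items():
        lineage = taxonomy[feature]
        if not is_wanted(lineage) or lineage.get(rank) is None:
            continue
        key = tuple(lineage.get(r) for r in ranks[:depth])
        for sample, n in counts.items():
            merged[key][sample] += n
    return {key: dict(c) for key, c in merged.items()}
```

Because ASVs and OTUs share one table, an ASV and an OTU that resolve to the same genus are summed into a single row.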
The ASV and OTU sequences have been deposited in GenBank at the National
Center for Biotechnology Information (NCBI) under accession numbers
KEBK00000000 (bacterial 16S sequences) and MT351182-MT353643 (fungal
ITS1 sequences), and the raw sequences have been deposited in the NCBI
Sequence Read Archive under BioProjects PRJNA625640 (bacterial data) and
PRJNA613597 (fungal data).