2.3 Bioinformatic analysis
Raw reads from the bacterial and fungal datasets were demultiplexed and quality filtered using QIIME (version 1.7.0). Reads with a quality score <20 and those lacking a complete barcode and primers were excluded from further analysis. Chimeric sequences were removed using USEARCH. Subsequently, the DADA2 workflow (Callahan, Sankaran, Fukuyama, McMurdie, & Holmes, 2016) was used to remove singletons and doubletons. Both the bacterial and fungal datasets were dereplicated to generate amplicon sequence variants (ASVs).
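The filtering and dereplication logic described above can be illustrated with a minimal Python sketch. It is not the QIIME, USEARCH, or DADA2 implementation. The function names, the toy reads, the barcode and primer strings, and the choice to apply the quality threshold to the mean Phred score of each read are all assumptions made for illustration.

```python
from collections import Counter

# Threshold from the text; applying it to the mean Phred score is an assumption.
MIN_QUALITY = 20


def passes_filter(seq, quals, barcode, primer):
    """Keep a read only if it starts with barcode + primer and its mean quality is >= 20."""
    if not quals or not seq.startswith(barcode + primer):
        return False
    return sum(quals) / len(quals) >= MIN_QUALITY


def dereplicate(seqs, min_abundance=3):
    """Collapse identical sequences and drop singletons and doubletons."""
    counts = Counter(seqs)
    return {s: n for s, n in counts.items() if n >= min_abundance}


# Hypothetical barcode, primer, and reads: (sequence, per-base Phred scores).
BARCODE, PRIMER = "ACGT", "GTG"
reads = [
    ("ACGTGTGAAAT", [30] * 11),  # passes
    ("ACGTGTGAAAT", [30] * 11),  # passes
    ("ACGTGTGAAAT", [25] * 11),  # passes
    ("ACGTGTGCCCT", [30] * 11),  # passes, but ends up a singleton
    ("ACGTGTGAAAT", [10] * 11),  # fails: mean quality below 20
    ("TTTTGTGAAAT", [30] * 11),  # fails: barcode missing
]

# Filter, then trim the barcode and primer from the retained reads.
trim = len(BARCODE + PRIMER)
kept = [seq[trim:] for seq, quals in reads if passes_filter(seq, quals, BARCODE, PRIMER)]

# Only sequences observed at least three times survive dereplication.
variants = dereplicate(kept)
print(variants)
```

In this toy input, four reads pass the filter and only the sequence observed three times is retained as a variant.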
Taxonomy was assigned to bacterial and fungal ASVs using the naïve Bayes approach with a minimum bootstrap confidence of 75, following the DADA2 workflow (Callahan et al., 2016), against SILVA version 132 (Quast et al., 2013) and the UNITE general FASTA release for Fungi version 8.0 (Nilsson et al., 2019), respectively. For the bacterial dataset, ASVs that were not assigned to a bacterial genus were clustered into operational taxonomic units (OTUs) at 97% similarity with the otu function in the “kmer” package (Wilkinson, 2018). One random sequence was selected from each OTU and assigned taxonomy against the SILVA reference using the method described above. The taxonomic assignments of bacterial ASVs and OTUs were then combined into an overall bacterial taxon-sample table. All ASVs and OTUs assigned to non-bacterial taxa, the phylum Cyanobacteria, or the order Rickettsiales were removed from this table. The remaining ASVs and OTUs with identical assignments were then agglomerated at the genus level using the “phyloseq” package, as described in the DADA2 workflow (Callahan et al., 2016). For the fungal dataset, ASVs that were not assigned to a fungal species were similarly clustered into OTUs and identified against the UNITE reference for eukaryotes version 8.0 (Nilsson et al., 2019). A fungal taxon-sample table was then generated, non-fungal taxa were removed, and the remaining taxa were agglomerated at the species level following the DADA2 workflow described above. See Supplementary File 1 and Fig. S1 for further details and a schematic diagram.
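The taxon filtering and genus-level agglomeration for the bacterial table can be sketched in plain Python as follows. The analysis itself used the R packages named above, so this is only an illustration of the logic, not the phyloseq or DADA2 code. The data layout, function names, and toy values are hypothetical, and the sketch simplifies the procedure by omitting the OTU clustering step: features without a confident genus call are simply left out.

```python
from collections import defaultdict

MIN_BOOTSTRAP = 75  # minimum bootstrap threshold stated in the text


def confident_genus(genus, bootstrap):
    """Return the genus call only if its bootstrap support meets the threshold."""
    if genus is None or bootstrap < MIN_BOOTSTRAP:
        return None
    return genus


def is_excluded(tax):
    """True for non-bacterial taxa, the phylum Cyanobacteria, or the order Rickettsiales."""
    return (
        tax.get("kingdom") != "Bacteria"
        or tax.get("phylum") == "Cyanobacteria"
        or tax.get("order") == "Rickettsiales"
    )


def agglomerate_by_genus(taxonomy, counts):
    """Sum per-sample counts of all retained features that share a genus assignment."""
    table = defaultdict(lambda: defaultdict(int))
    for feature, tax in taxonomy.items():
        genus = confident_genus(tax.get("genus"), tax.get("bootstrap", 0))
        if is_excluded(tax) or genus is None:
            continue  # simplification: features without a genus call are skipped
        for sample, n in counts[feature].items():
            table[genus][sample] += n
    return {g: dict(s) for g, s in table.items()}


# Hypothetical taxonomy and count tables for five features in two samples.
taxonomy = {
    "ASV1": {"kingdom": "Bacteria", "phylum": "Proteobacteria",
             "order": "Pseudomonadales", "genus": "Pseudomonas", "bootstrap": 98},
    "ASV2": {"kingdom": "Bacteria", "phylum": "Proteobacteria",
             "order": "Pseudomonadales", "genus": "Pseudomonas", "bootstrap": 80},
    "ASV3": {"kingdom": "Bacteria", "phylum": "Cyanobacteria",
             "order": "Chloroplast", "genus": None, "bootstrap": 0},
    "ASV4": {"kingdom": "Eukaryota", "phylum": None,
             "order": None, "genus": None, "bootstrap": 0},
    "OTU1": {"kingdom": "Bacteria", "phylum": "Firmicutes",
             "order": "Bacillales", "genus": "Bacillus", "bootstrap": 60},
}
counts = {
    "ASV1": {"S1": 10, "S2": 0},
    "ASV2": {"S1": 5, "S2": 7},
    "ASV3": {"S1": 100, "S2": 50},
    "ASV4": {"S1": 2, "S2": 2},
    "OTU1": {"S1": 3, "S2": 1},
}

genus_table = agglomerate_by_genus(taxonomy, counts)
print(genus_table)
```

Here the two Pseudomonas features are merged into one row, while the Cyanobacteria feature, the non-bacterial feature, and the feature whose genus call falls below the bootstrap threshold are left out.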
The ASV and OTU sequences have been deposited in GenBank at the National Center for Biotechnology Information (NCBI) under accession numbers KEBK00000000 (bacterial 16S sequences) and MT351182-MT353643 (fungal ITS1 sequences), and the raw sequences in the NCBI Sequence Read Archive under BioProjects PRJNA625640 (bacterial data) and PRJNA613597 (fungal data).