2.5 | Bioinformatics processing of sequencing data
The raw ITS2 sequencing reads were demultiplexed, quality-filtered with fastp version 0.20.0 (Chen, Zhou, Chen, & Gu, 2018) and merged with FLASH version 1.2.7 (Magoč & Salzberg, 2011) using the following criteria: (i) The 300 bp reads were truncated at any site receiving an average quality score of <20 over a 50 bp sliding window; truncated reads shorter than 50 bp and reads containing ambiguous characters were discarded. (ii) Paired reads were assembled only when their overlap was longer than 10 bp and the mismatch ratio within the overlap region did not exceed 0.2. Reads that could not be assembled were discarded. (iii) Samples were distinguished according to their barcodes and primers. Exact barcode matching was required, and up to two nucleotide mismatches were permitted in primer matching.
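Criteria (i) and (ii) can be illustrated with a minimal Python sketch. It is not the fastp or FLASH implementation. The function names are hypothetical, the cut is assumed to fall at the start of the first failing window, the ambiguity check is assumed to apply to the truncated read, and the two overlap strings are assumed to be already aligned, with the reverse read reverse-complemented.

```python
from typing import Optional, Sequence

WINDOW = 50               # sliding-window width in bp
MIN_MEAN_Q = 20           # windows with a mean quality below this trigger truncation
MIN_LEN = 50              # truncated reads shorter than this are discarded
MIN_OVERLAP = 10          # overlaps must be longer than this to be assembled
MAX_MISMATCH_RATIO = 0.2  # maximum mismatch ratio allowed in the overlap


def truncation_point(quals: Sequence[int], window: int = WINDOW,
                     min_mean_q: float = MIN_MEAN_Q) -> int:
    """Return the index at which to cut the read.

    Assumption: the cut is placed at the start of the first window whose
    mean quality is below min_mean_q. If no window fails, or the read is
    shorter than one window, the full length is returned.
    """
    if len(quals) < window:
        return len(quals)
    running = sum(quals[:window])
    for start in range(len(quals) - window + 1):
        if start > 0:
            # Slide the window one base to the right.
            running += quals[start + window - 1] - quals[start - 1]
        if running / window < min_mean_q:
            return start
    return len(quals)


def filter_read(seq: str, quals: Sequence[int]) -> Optional[str]:
    """Truncate a read, then discard it (return None) if it is too short
    or contains an ambiguous base (assumed here to be 'N')."""
    trimmed = seq[:truncation_point(quals)]
    if len(trimmed) < MIN_LEN or "N" in trimmed.upper():
        return None
    return trimmed


def overlap_is_acceptable(overlap_fwd: str, overlap_rev: str) -> bool:
    """Apply criterion (ii) to two pre-aligned overlap strings of equal length."""
    if len(overlap_fwd) != len(overlap_rev):
        raise ValueError("overlap strings must have equal length")
    n = len(overlap_fwd)
    if n <= MIN_OVERLAP:
        return False
    mismatches = sum(a != b for a, b in zip(overlap_fwd, overlap_rev))
    return mismatches / n <= MAX_MISMATCH_RATIO
```

In practice these thresholds would be passed to fastp and FLASH as command-line options; the sketch only makes the decision rules explicit.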
Operational taxonomic units (OTUs) were clustered at a 97% similarity cutoff using UPARSE version 7.1 (Edgar, 2013), and chimeric sequences were identified and removed. OTUs were taxonomically annotated by a BLAST search of the reference set of OTU sequences against public databases (GenBank and EMBL), with a similarity threshold of >95% for species-level identification. The final taxonomic classification was based on the closest BLAST match together with additional considerations, namely the geographical distribution of the candidate species and the diversity of closely related species (Deagle, Chiaradia, McInnes, & Jarman, 2010). When two or more taxa received the same score for a sequence, the sequence was assigned to a higher taxonomic level (e.g., genus or family).
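The tie-handling rule can be sketched in Python as follows. This is a simplified illustration, not the pipeline used in the study. The `Hit` record and the `assign_taxon` function are hypothetical, the "score" is assumed to be the BLAST bit score, and the geographical and related-species considerations are left to manual review. The text does not specify how a single best hit at or below the 95% threshold is handled, so the sketch returns `None` in that case.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SPECIES_IDENTITY_THRESHOLD = 95.0  # percent identity; species calls need more than this


@dataclass(frozen=True)
class Hit:
    """One BLAST hit for an OTU sequence (hypothetical record layout)."""
    family: str
    genus: str
    species: str
    identity: float   # percent identity
    bit_score: float  # assumed to be the score used to rank hits


def assign_taxon(hits: List[Hit]) -> Optional[Tuple[str, str]]:
    """Return (rank, name) for an OTU, or None if no call is made.

    A single top-scoring species above the identity threshold yields a
    species-level call. When several taxa share the top score, the OTU
    is moved up to the lowest rank they have in common (genus, then family).
    """
    if not hits:
        return None
    best = max(h.bit_score for h in hits)
    top = [h for h in hits if h.bit_score == best]

    if len({h.species for h in top}) == 1:
        if max(h.identity for h in top) > SPECIES_IDENTITY_THRESHOLD:
            return ("species", top[0].species)
        return None  # handling below the threshold is not specified in the text

    # Tie between two or more species: fall back to a shared higher rank.
    if len({h.genus for h in top}) == 1:
        return ("genus", top[0].genus)
    if len({h.family for h in top}) == 1:
        return ("family", top[0].family)
    return None


# Usage with placeholder taxon names (not real data):
tied = [
    Hit("Family_a", "Genus_a", "Genus_a sp1", 99.0, 500.0),
    Hit("Family_a", "Genus_a", "Genus_a sp2", 99.0, 500.0),
]
print(assign_taxon(tied))
```

With the two tied placeholder hits, the call resolves to genus level because both species share a genus.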