2.5 | Bioinformatics processing of sequencing data
The raw ITS2 gene sequencing reads were demultiplexed, quality-filtered
by fastp version 0.20.0 (Chen, Zhou, Chen, & Jia, 2018) and merged by
FLASH version 1.2.7 (Magoč & Salzberg, 2011) using the following
criteria: (i) The 300 bp reads were truncated at any site receiving an
average quality score of <20 over a 50 bp sliding window, and
the truncated reads shorter than 50 bp were discarded; reads containing
ambiguous characters were also discarded. (ii) Only read pairs with an
overlap longer than 10 bp were assembled according to their overlapping
sequence. The maximum mismatch ratio in the overlap region was 0.2. Reads
that could not be assembled were discarded. (iii) Samples were
distinguished according to their barcodes and primers. Exact barcode
matching was required, and up to two nucleotide mismatches were
permitted in primer matching.
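
As an illustration only, criteria (i) to (iii) can be sketched in Python. This is not the fastp or FLASH implementation. The function names (truncate_read, overlap_is_acceptable, primer_matches) are hypothetical, and several details are assumptions not stated in the text: the read is cut at the start of the first failing window, the ambiguity check is applied to the truncated read, the two overlap segments are assumed to be already aligned and of equal length, and primers are compared position by position without IUPAC degeneracy.

```python
from typing import List, Optional

WINDOW = 50                 # sliding-window width in bp (from the text)
MIN_QUAL = 20               # minimum mean quality within a window (from the text)
MIN_LEN = 50                # minimum read length after truncation (from the text)
MIN_OVERLAP = 10            # overlap must be longer than this (from the text)
MAX_MISMATCH_RATIO = 0.2    # maximum mismatch ratio in the overlap (from the text)
MAX_PRIMER_MISMATCHES = 2   # mismatches tolerated in the primer (from the text)


def truncate_read(seq: str, quals: List[int]) -> Optional[str]:
    """Criterion (i): truncate at the first window with mean quality < MIN_QUAL.

    Returns None when the read is discarded (too short, or contains N).
    Assumption: the cut is placed at the start of the failing window.
    """
    cut = len(seq)
    for start in range(0, max(len(seq) - WINDOW + 1, 0)):
        window = quals[start:start + WINDOW]
        if sum(window) / WINDOW < MIN_QUAL:
            cut = start
            break
    trimmed = seq[:cut]
    # Assumption: ambiguous bases are checked on the truncated read.
    if len(trimmed) < MIN_LEN or "N" in trimmed.upper():
        return None
    return trimmed


def overlap_is_acceptable(overlap_a: str, overlap_b: str) -> bool:
    """Criterion (ii): test the overlap length and mismatch ratio.

    Assumption: both segments are already aligned, with the reverse read
    reverse-complemented, so they can be compared base by base.
    """
    if len(overlap_a) != len(overlap_b):
        raise ValueError("overlap segments must have equal length")
    if len(overlap_a) <= MIN_OVERLAP:
        return False
    mismatches = sum(a != b for a, b in zip(overlap_a, overlap_b))
    return mismatches / len(overlap_a) <= MAX_MISMATCH_RATIO


def primer_matches(read: str, primer: str) -> bool:
    """Criterion (iii): allow up to two mismatches at the primer site.

    Assumption: the primer sits at the very start of the read.
    """
    prefix = read[:len(primer)]
    if len(prefix) < len(primer):
        return False
    mismatches = sum(a != b for a, b in zip(prefix, primer))
    return mismatches <= MAX_PRIMER_MISMATCHES
```

Under the same assumptions, exact barcode matching reduces to a plain prefix test such as read.startswith(barcode), applied before the primer check.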
Operational taxonomic units (OTUs) were clustered at a 97% similarity
cutoff using UPARSE version 7.1 (Edgar, 2013), and chimeric
sequences were identified and removed. The taxonomic identity of OTUs
was annotated using a BLAST search of the reference set of OTU sequences
against public databases (GenBank and EMBL), with a similarity threshold
of >95% for species-level
identification. Furthermore, final taxonomic classification was based on
the closest BLAST match as well as other considerations involving the
geographical locations of the species and the diversity of closely
related species (Deagle, Chiaradia, McInnes, & Jarman, 2010). Sequences
were assigned to a higher taxonomic level (e.g., genus or family) when
two or more taxa received the same score for a given sequence.
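
The tie rule described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the Hit record and the assign_taxon function are hypothetical, the taxon names in the usage note are placeholders, and two behaviours are assumptions not stated in the text, namely that ties are judged on the BLAST score of the top hits only and that a best hit at or below the 95% threshold falls back to genus level. The geographical and diversity considerations require expert judgement and are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SPECIES_IDENTITY = 95.0  # species-level threshold in percent (from the text)


@dataclass(frozen=True)
class Hit:
    """One BLAST hit for an OTU (hypothetical record layout)."""
    species: str
    genus: str
    family: str
    identity: float  # percent identity to the OTU sequence
    score: float     # BLAST score used to rank hits


def assign_taxon(hits: List[Hit]) -> Optional[Tuple[str, str]]:
    """Return (rank, name) for one OTU, or None if no rank is unambiguous."""
    if not hits:
        return None
    best = max(h.score for h in hits)
    top = [h for h in hits if h.score == best]
    for rank in ("species", "genus", "family"):
        names = {getattr(h, rank) for h in top}
        if len(names) != 1:
            continue  # tied taxa disagree at this rank, so move one rank up
        if rank == "species" and not all(
            h.identity > SPECIES_IDENTITY for h in top
        ):
            continue  # assumption: below the threshold, fall back to genus
        return rank, names.pop()
    return None
```

For example, two equally scored hits to the placeholder species "Aus bus" and "Aus cus" would yield ("genus", "Aus"), whereas a single hit to "Aus bus" at 99% identity would yield ("species", "Aus bus").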