2.3 Bioinformatics
Sequence reads were processed as described in detail in our previous study (Leese, Sander et al., 2021). Briefly, JAMP v0.67 (https://github.com/VascoElbrecht/JAMP; Elbrecht et al., 2018) was used with default settings to merge paired-end reads and, where needed, to generate the reverse complements of the sequences. Primer sequences were removed. To retain only reads of the expected fragment length, sequences deviating by more than 15 bp from that length were excluded from further analyses. Singletons and reads exceeding a maximum expected error of 0.5 were removed before the sequences were clustered into OTUs at ≥97 % similarity. To maximize the number of reads retained, the dereplicated sequences, including singletons, were then mapped to the generated OTU dataset at ≥97 % similarity. Only OTUs with a read abundance of at least 0.01 % in at least one sample were retained for further analyses. OTU centroid sequences were compared against the BOLD database for taxonomic annotation using BOLDigger 1.1.4 (Buchner & Leese, 2020). For further analyses, we considered only OTUs with a similarity of ≥90 % to a reference sequence in BOLD. OTUs with a similarity of ≥98 % were assigned to species level, ≥95 % to genus level and ≥90 % to family level. Replicates were merged by summing the reads of each OTU across the two replicates and dividing the sum by two. OTUs with conflicting taxonomic assignments were checked manually, taking into account whether the reference specimens had been identified by taxonomic experts. Further, the resulting taxa list was compared to the RMO database, which contains detailed information on morphologically identified taxa occurring in this area, and was additionally checked by three taxonomic experts to exclude terrestrial taxa and taxa that are impossible or unlikely to occur in the study area.
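For illustration only, the numeric thresholds described above can be sketched in a few lines of Python. This is not the JAMP or BOLDigger code. All function names, the dictionary-based OTU table layout and the Phred+33 quality encoding are assumptions made for this sketch, as is the reading of the 0.01 % threshold as a percentage of the total reads of a sample. The expected error of a read is computed with the standard definition, the sum of the per-base error probabilities 10^(-Q/10).

```python
from typing import Dict, List, Optional

# Thresholds taken from the text; the names are hypothetical.
MAX_EXPECTED_ERROR = 0.5
MIN_ABUNDANCE_PCT = 0.01
SPECIES_MIN, GENUS_MIN, FAMILY_MIN = 98.0, 95.0, 90.0


def expected_error(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities (assumes Phred+33 encoding)."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_quality(quality: str) -> bool:
    """Keep a read only if its expected error does not exceed 0.5."""
    return expected_error(quality) <= MAX_EXPECTED_ERROR


def filter_by_abundance(table: Dict[str, List[int]],
                        min_pct: float = MIN_ABUNDANCE_PCT) -> Dict[str, List[int]]:
    """Keep OTUs reaching min_pct of a sample's reads in at least one sample.

    `table` maps an OTU id to its read counts, one entry per sample.
    """
    if not table:
        return {}
    n_samples = len(next(iter(table.values())))
    totals = [sum(reads[i] for reads in table.values()) for i in range(n_samples)]
    kept = {}
    for otu, reads in table.items():
        if any(total > 0 and 100.0 * count / total >= min_pct
               for count, total in zip(reads, totals)):
            kept[otu] = reads
    return kept


def assign_level(similarity_pct: float) -> Optional[str]:
    """Map the similarity to a BOLD reference onto a taxonomic level."""
    if similarity_pct >= SPECIES_MIN:
        return "species"
    if similarity_pct >= GENUS_MIN:
        return "genus"
    if similarity_pct >= FAMILY_MIN:
        return "family"
    return None  # below 90 %: OTU not considered further


def merge_replicates(rep_a: Dict[str, int], rep_b: Dict[str, int]) -> Dict[str, float]:
    """Sum the reads of each OTU over two replicates and divide by two."""
    otus = sorted(set(rep_a) | set(rep_b))
    return {otu: (rep_a.get(otu, 0) + rep_b.get(otu, 0)) / 2 for otu in otus}
```

Under these assumptions, an OTU with 96.5 % similarity to its best reference would be assigned to genus level, and an OTU with 10 and 20 reads in the two replicates of a sample would enter downstream analyses with 15 reads.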