Bioinformatic processing
Demultiplexed reads were provided by the sequencing company (Admera Health Biopharma Services) and were checked for quality using MultiQC (Ewels, Magnusson, Lundin, & Käller, 2016). Forward and reverse primers were removed using Trimmomatic (Bolger, Lohse, & Usadel, 2014). Reads were further processed with the DADA2 pipeline (DADA2 package v1.12.1; Callahan et al., 2016) in R v3.6.2 (R Core Team, 2019) to obtain amplicon sequence variants (ASVs). Standard filtering parameters were used, except for the maximum number of expected errors allowed in a read (maxEE), which was set at 3. Reads were additionally trimmed to remove stretches with a quality score below 30, while retaining an overlap of at least 5 bp between forward and reverse reads. For each sample and each primer set, unique reads were determined (dereplication), paired reads were merged, and chimeras were removed. For taxonomic assignment, a custom-made reference database was compiled from in-house and BOLD COI sequences of macrobenthic species found during monitoring campaigns in the Belgian part of the North Sea over the last ten years. This reference database contained 346 Sanger COI sequences from 306 species. The newly generated COI sequences have been uploaded to BOLD and are part of a larger study to build a COI reference database for macrobenthos from the whole North Sea region (https://northsearegion.eu/geans/). Taxonomy was assigned with the naïve Bayesian classifier (Wang, Garrity, Tiedje, & Cole, 2007), with the minimum bootstrap confidence for assigning a taxonomic level set at 80 (minBoot = 80). Bar plots were created in R to visualize the number of reads, the number of ASVs and the percentage of ASVs with an assigned taxonomy for the ethanol and bulk samples of each primer set.
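The two DADA2 settings named above can be illustrated with a minimal Python sketch; the study itself used the R package, and this is not DADA2's code. It assumes the nominal definition of expected errors, EE = Σ 10^(−Q/10) over a read's Phred scores, with reads above maxEE discarded, and it approximates minBoot as cutting the assignment at the first rank whose bootstrap confidence falls below the threshold. The function names and the example labels and confidence values are hypothetical.

```python
from typing import List, Optional, Sequence


def expected_errors(quals: Sequence[int]) -> float:
    """Expected number of errors in a read: sum of 10^(-Q/10) over its Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_max_ee(quals: Sequence[int], max_ee: float = 3.0) -> bool:
    """Keep a read only if its expected errors do not exceed max_ee (maxEE = 3 here)."""
    return expected_errors(quals) <= max_ee


def truncate_by_bootstrap(
    labels: Sequence[str], boots: Sequence[float], min_boot: float = 80.0
) -> List[Optional[str]]:
    """Keep ranks from the top down while bootstrap confidence >= min_boot.

    Once one rank falls below the threshold, it and every deeper rank are
    reported as None (unassigned), mimicking a minBoot-style cut-off.
    """
    result: List[Optional[str]] = []
    confident = True
    for label, boot in zip(labels, boots):
        confident = confident and boot >= min_boot
        result.append(label if confident else None)
    return result
```

For example, a 150 bp read with every base at Q30 has about 0.15 expected errors and passes maxEE = 3, whereas 100 bases at Q10 give about 10 expected errors and the read is discarded.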
ASVs that were not assigned at the phylum level using our custom reference database were extracted, and taxonomic assignment was repeated as above using the MIDORI_UNIQUE_COI_MARINE_20180221 reference dataset (Machida et al., 2017), downloaded from http://genoweb.toulouse.inra.fr/frogs_databanks/assignation/, to verify that the lack of taxonomic assignment was not caused by the choice of reference database. Only a small fraction of the data was additionally assigned taxonomy with the MIDORI dataset (see Results). All further comparisons were therefore based on the taxonomic assignments obtained with our custom-made reference database, because smaller training datasets tailored to the taxa and geographic region of interest have been shown to yield better genus- and species-level assignments than the largest possible database (Macheriotou et al., 2019; Ritari, Salojärvi, Lahti, & de Vos, 2015). For primer set A, ASVs that remained unassigned after the MIDORI search were queried against the NCBI nt database with BLASTn to check whether they were of non-metazoan origin. Hits with a query coverage (qcov) > 50% and a percent identity (pident) > 90% were considered reliable.
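The fallback and BLAST filtering steps above can be sketched in Python as follows. This is a hypothetical illustration, not the authors' scripts: it assumes phylum-level results are held in a dictionary with None for unassigned ASVs, and that BLASTn was run with the custom tabular format `-outfmt "6 qseqid sseqid pident qcovs"`, reading the text's qcov as BLAST's per-subject query coverage (qcovs). Function names and the example IDs are placeholders.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple


def unassigned_at_phylum(phylum_by_asv: Dict[str, Optional[str]]) -> List[str]:
    """Return ASV ids without a phylum-level assignment (None), for re-classification."""
    return [asv for asv, phylum in phylum_by_asv.items() if phylum is None]


def reliable_hits(
    blast_tsv: str, min_qcov: float = 50.0, min_pident: float = 90.0
) -> List[Tuple[str, str, float, float]]:
    """Filter BLAST tabular output to hits with qcov > min_qcov and pident > min_pident.

    Assumes rows were produced with -outfmt "6 qseqid sseqid pident qcovs",
    i.e. four tab-separated columns in that order.
    """
    hits = []
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:  # skip blank lines
            continue
        qseqid, sseqid = row[0], row[1]
        pident, qcov = float(row[2]), float(row[3])
        # Strict inequalities, as in the thresholds stated in the text
        if qcov > min_qcov and pident > min_pident:
            hits.append((qseqid, sseqid, pident, qcov))
    return hits


# Hypothetical example input: ASV ids and subject ids are placeholders
example = "ASV2\tsubjA\t95.2\t88\nASV3\tsubjB\t99.0\t40\nASV3\tsubjC\t85.0\t100\n"
print(reliable_hits(example))  # [('ASV2', 'subjA', 95.2, 88.0)]
```

Keeping the thresholds as strict inequalities mirrors the stated criteria, so a hit at exactly 90% identity is not counted as reliable.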