Bioinformatic processing
The total number of sequences generated for each primer set ranged
between 2 786 114 and 3 845 487 (ESM Table 1). One sample (ZVLA_B)
received very low read numbers for all primer sets. Primer set B yielded
the highest number of raw reads, but almost half of the reads were lost
during the filtering step. For primer sets A and E, relatively few reads
were lost during filtering, denoising and chimera removal, resulting in
more than 2.5 million reads for further processing (ESM Table 1, ESM Fig 1).
Read numbers were comparable between bulk and ethanol samples for each
primer set except for primer set A, where approximately three times more
reads were obtained for the bulk samples compared to the ethanol samples
(ESM Fig 2A). Nevertheless, a comparable number of ASVs was obtained for
the ethanol and bulk samples with primer set A (average of 149 and 132,
respectively; ESM Table 1, ESM Fig 2B). For primer sets C and E, more
ASVs were found in the ethanol than in the bulk samples (ESM Fig 2B). The
total number of ASVs generated across the 24 samples varied substantially
between primer sets and was lowest for primer sets A and E (2139,
22151, 14813, 15211 and 5230 ASVs for primer sets A, B, C, D and E,
respectively) (ESM Fig 3).
The percentage of ASVs that were assigned taxonomy using our custom COI
reference database was low (22.6%, 12.1%, 11.7%, 4% and 10.9% for
primer sets A, B, C, D and E, respectively; ESM Table 2). However, for
primer set A, the 1655 unassigned ASVs represented only 13.4% of the
total number of non-chimeric reads generated for that primer set. This
percentage was considerably higher for the other primer sets and ranged
between 38.4 and 81.7% (ESM Table 2). Phylum level assignments were
comparable between the ethanol and bulk samples for primer sets B, C and
D, while more ASVs from the bulk samples were assigned to phyla compared
to the ethanol samples for primer sets A (bulk: 30%, ethanol: 20%) and
D (bulk: 20%, ethanol: 9%) (Fig 2). At the species level, taxonomic
assignment of the bulk samples was highest for primer set A (25%, 9%,
10%, 3% and 16% for primer sets A, B, C, D and E, respectively).
When using the COI Midori dataset for taxonomic assignment of the
unassigned sequences, only a small fraction were assigned at the phylum
level (8.6%, 0.6%, 1.1%, 5.7% and 5.8% for primer sets A, B, C, D
and E, respectively), and these assigned ASVs represented 5.3%, 0.5%,
0.8%, 6.0% and 20.6% of the total non-chimeric reads for primer sets
A, B, C, D and E, respectively. For primer sets A and E, most reads were
assigned to the cnidarian Obelia bidentata (1.3% and 5.1%,
respectively), which was found in all ethanol samples, in all bulk
samples of locations 120 and 330 and in very low abundance in one bulk
sample of location 840 (37 and 22 reads, respectively). A detailed list
of all species detected after taxonomic assignment with the Midori
dataset for each primer set is available in ESM Table 3. To investigate
whether the unassigned ASVs after Midori were of non-metazoan origin, a
blastn search was performed for primer set A against the NCBI nt database.
This resulted in only 83 of the 1471 unassigned ASVs receiving a
reliable assignment (query coverage >50%, identity >90%), representing
58 species. All species had low read
numbers, except for Limecola balthica, which was detected in the
bulk DNA of two replicates of the Limecola balthica community
(ZVL) with more than 10 000 reads (ESM Table 4). The non-metazoan taxa
were represented by three fungal, two bacterial, five Viridiplantae and
27 algal or diatom species, which together represented only 0.3% of
the total non-chimeric reads (ESM Table 4).
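
The filtering of blastn hits described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. It assumes a tab-separated blastn output produced with -outfmt "6 qseqid sseqid pident qcovs"; this column layout, the function name reliable_hits and the choice to keep the hit with the highest identity per ASV are all assumptions made for illustration. Only the two thresholds (query coverage >50, % identity >90) are taken from the text.

```python
import csv
import io

# Assumed column layout (not from the study): blastn run with
# -outfmt "6 qseqid sseqid pident qcovs"
COLUMNS = ["qseqid", "sseqid", "pident", "qcovs"]


def reliable_hits(handle, min_qcov=50.0, min_pident=90.0):
    """Return one reliable hit per query ASV from tabular blastn output.

    A hit is kept when query coverage and percent identity both exceed
    the thresholds. If several hits pass for the same ASV, the one with
    the highest percent identity is retained (an illustrative choice).
    """
    best = {}
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        pident = float(row["pident"])
        qcov = float(row["qcovs"])
        if qcov > min_qcov and pident > min_pident:
            previous = best.get(row["qseqid"])
            if previous is None or pident > previous[1]:
                best[row["qseqid"]] = (row["sseqid"], pident, qcov)
    return best


# Made-up example records, for demonstration only.
example = (
    "ASV1\tsubjA\t98.5\t100\n"
    "ASV1\tsubjB\t91.0\t80\n"
    "ASV2\tsubjC\t85.0\t100\n"
    "ASV3\tsubjD\t95.0\t40\n"
)
hits = reliable_hits(io.StringIO(example))
```

In this made-up example only ASV1 passes both thresholds; ASV2 fails on identity and ASV3 on query coverage.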