Figure and Table Legends
Figure 1. Workflow of the BOLD project demonstrating the
acquisition and fates of contaminant and non-contaminant COI barcoding sequences.
Figure 2. Cladogram of the maximum likelihood (ML) tree of
1,126 proteobacteria COI contaminants retrieved from a BOLD
project incorporating 184,585 arthropod specimens. The tree is based on
561 bp and is rooted with the free-living alphaproteobacterium Pelagibacter ubique. Parentheses indicate the number of BOLD
contaminants present in each group. Tips are labelled by BOLD processing
ID and host arthropod taxonomy. No colour = non-BOLD reference. The
Rickettsiales sequences of Anaplasma, Neorickettsia, Rickettsia and Wolbachia supergroups (A, B, E, F and H)
are included as references (Accession numbers: Table S1).
Figure 3. Cladogram of a maximum likelihood (ML) tree of 753 COI Rickettsia contaminants retrieved from a BOLD project
incorporating 184,585 arthropod specimens. The tree is based on 561 bp,
was inferred using the TVM+F+I+G4 model, and is rooted with the Rickettsia endosymbiont of Ichthyophthirius multifiliis (Candidatus Megaira).
Parentheses indicate the number of BOLD contaminants
present in Torix and non-Torix Rickettsia groups. Tips are
labelled by BOLD processing ID and host arthropod taxonomy. No
colour = non-BOLD reference sequence unless designated by a circle
(Dermaptera), star (Diplopoda) or triangle (Thysanoptera). The Rickettsia groups Spotted fever, Transitional, Belli, Typhus,
Rhyzobius and Torix are included as references (Accession numbers: Table
S1).
Figure 4. Phylogram of the maximum likelihood (ML) tree of 99 COI Rickettsia contaminants (prefix “BIOUG”) used for
further phylogenetic analysis and 53 non-BOLD reference profiles
(Accession numbers: Table S1). The tree is based on the concatenation of
4 loci (16S rRNA, 17KDa, gltA and COI) under
a partition model, with profiles containing at least 3 out of 4
loci included in the tree (2,834 bp total), and is rooted with the Rickettsia endosymbiont of Ichthyophthirius multifiliis (Candidatus Megaira).
Tips are labelled by host arthropod taxonomy.
Figure 5. 16S rRNA and gltA concatenated maximum
likelihood (ML) phylogram (1,834 bp total) including Rickettsia hosts from SRA (triangles) and targeted screens (stars). The TIM3+F+R2
(16S rRNA) and K3Pu+F+G4 (gltA) models were chosen as the best-fitting models.
The tree is rooted with Orientia tsutsugamushi. Accession numbers are given
in Table S1.
Figure 6. Phylogram of a maximum likelihood (ML) tree of COI Rickettsia contaminants (prefix “BIOUG”) giving a
host barcode and 43 non-BOLD reference profiles. The tree is based on 4
loci (16S rRNA, 17KDa, gltA and COI) under a
partition model, with profiles containing at least 2 out of 4 loci
included in the tree (2,781 bp total), and is rooted with the Rickettsia endosymbiont of Ichthyophthirius multifiliis (Candidatus Megaira).
The habitats and lifestyles of the hosts are given
to the right of the phylogeny. Accession numbers are given in Table S1.
Figure S1. Collection sites of the 753 COI Rickettsia contaminants retrieved from BOLD projects.
Figure S2. Phylogram of a maximum likelihood (ML) tree, based on 577 bp, of COI Rickettsia sequences found in the NCBI database and erroneously
identified as mtDNA barcodes. The HKY+F+G4 model was
chosen as the best-fitting model using ModelFinder with the Bayesian
information criterion (BIC) (Kalyaanamoorthy et al., 2017).
Table 1.1. Targeted Rickettsia screen of aquatic
invertebrates. A species was deemed positive through PCR and assigned
to a Rickettsia group after Sanger sequencing and phylogenetic
placement (Figure 5). All strains belong to the Torix group.
Table 1.2. Targeted Rickettsia screen of terrestrial
invertebrates. A species was deemed positive through PCR and assigned
to a Rickettsia group after Sanger sequencing and phylogenetic
placement (Figure 5). All strains belong to the Torix group except
†=Rhyzobius and ‡=Belli.
Table 2. Torix Rickettsia hosts known to date alongside
the screening method used. Bold entries indicate hosts identified in this study.
FISH=fluorescence in situ hybridisation; TEM=transmission electron
microscopy; SRA=sequence read archive.
Table S1. Accession numbers used for phylogenetic analyses
(Figures 2, 3, 4, 5 and 6). Accession numbers generated in this study
are marked in bold.
Table S2. Mitochondrial COI and bacterial gene primers
used for re-barcoding and multilocus phylogenetic analysis.
Table S3. List of SRA datasets analysed with phyloFlash and
Kraken2.
Table S4. BOLD contaminant datasets.
Table S5. Primer pairs involved in the unintended amplification
of 753 Rickettsia COI sequences from the BOLD project.
Table S6.1. Homology of Rickettsia groups and Wolbachia to the most common forward primer (C_LepFolF)
attributed to bacterial COI amplification from arthropod DNA
extracts.
Table S6.2. Homology of Rickettsia groups and Wolbachia to the most common reverse primer (C_LepFolR)
attributed to bacterial COI amplification from arthropod DNA
extracts.
Table S7. Re-barcoding status and nearest BLAST hit (NCBI) of
mtDNA COI arthropod DNA extracts accessed for further analysis,
along with the success of multilocus Rickettsia profiles with
allocated Rickettsia group (based on phylogenetic analysis) and
co-infection status.
Table S8. The barcoding success rate of taxa which gave at
least one inadvertent bacterial COI amplification (N=51,475
accessible specimens), with an adjusted Rickettsia prevalence
based on an estimated total number of arthropods to account for
inaccessible specimens (N=184,585).
Table S9. NCBI matches mistaken for true mtDNA barcodes and
their homology to Rickettsia COI (accessed 29 June 2020).