Figure and Table Legends
Figure 1. Workflow of the BOLD project demonstrating the acquisition and fates of contaminant and non-contaminant COI barcoding sequences.
Figure 2. Cladogram of the maximum likelihood (ML) tree of 1,126 proteobacteria COI contaminants retrieved from a BOLD project incorporating 184,585 arthropod specimens. The tree is based on 561 bp and is rooted with the free-living alphaproteobacterium Pelagibacter ubique. Parentheses indicate the number of BOLD contaminants present in each group. Tips are labelled by BOLD processing ID and host arthropod taxonomy. No colour=Non-BOLD reference. The Rickettsiales sequences of Anaplasma, Neorickettsia, Rickettsia and Wolbachia supergroups (A, B, E, F and H) are included as references (Accession numbers: Table S1).
Figure 3. Cladogram of a maximum likelihood (ML) tree of 753 COI Rickettsia contaminants retrieved from a BOLD project incorporating 184,585 arthropod specimens. The tree is based on 561 bp, was inferred using the TVM+F+I+G4 model and is rooted by the Rickettsia endosymbiont of Ichthyophthirius multifiliis (Candidatus Megaira). Parentheses indicate the number of BOLD contaminants present in Torix and non-Torix Rickettsia groups. Tips are labelled by BOLD processing ID and host arthropod taxonomy. No colour=Non-BOLD reference sequence unless designated by a circle (Dermaptera), star (Diplopoda) or triangle (Thysanoptera). The Rickettsia groups Spotted fever, Transitional, Belli, Typhus, Rhyzobius and Torix are included as references (Accession numbers: Table S1).
Figure 4. Phylogram of the maximum likelihood (ML) tree of 99 COI Rickettsia contaminants (prefix “BIOUG”) used for further phylogenetic analysis and 53 Non-BOLD reference profiles (Accession numbers: Table S1). The tree is based on the concatenation of 4 loci (16S rRNA, 17KDa, gltA and COI) under a partition model, with profiles containing at least 3 out of 4 sites included in the tree (2,834 bp total), and is rooted by the Rickettsia endosymbiont of Ichthyophthirius multifiliis (Candidatus Megaira). Tips are labelled by host arthropod taxonomy.
Figure 5. 16S rRNA and gltA concatenated maximum likelihood (ML) phylogram (1,834 bp total) including Rickettsia hosts from SRA (triangles) and targeted screens (stars). The TIM3+F+R2 (16S) and K3Pu+F+G4 (gltA) models were chosen as the best-fitting models. The tree is rooted with Orientia tsutsugamushi. Accession numbers are given in Table S1.
Figure 6. Phylogram of a maximum likelihood (ML) tree of COI Rickettsia contaminants (prefix “BIOUG”) giving a host barcode and 43 Non-BOLD reference profiles. The tree is based on 4 loci (16S rRNA, 17KDa, gltA and COI) under a partition model, with profiles containing at least 2 out of 4 sites included in the tree (2,781 bp total), and is rooted by the Rickettsia endosymbiont of Ichthyophthirius multifiliis (Candidatus Megaira). The habitats and lifestyles of the host are given to the right of the phylogeny. Accession numbers are given in Table S1.
Figure S1. Collection sites of the 753 COI Rickettsia contaminants retrieved from BOLD projects.
Figure S2. Phylogram of a maximum likelihood (ML) tree of COI Rickettsia sequences found in the NCBI database that were erroneously identified as mtDNA barcodes. The tree is based on 577 bp. The HKY+F+G4 model was chosen as the best-fitting model using ModelFinder with the Bayesian information criterion (BIC) (Kalyaanamoorthy et al., 2017).
Table 1.1. Targeted Rickettsia screen of aquatic invertebrates. A species was deemed positive through PCR and assigned to a Rickettsia group after Sanger sequencing and phylogenetic placement (Figure 5). All strains belong to the Torix group.
Table 1.2. Targeted Rickettsia screen of terrestrial invertebrates. A species was deemed positive through PCR and assigned to a Rickettsia group after Sanger sequencing and phylogenetic placement (Figure 5). All strains belong to the Torix group except †=Rhyzobius and ‡=Belli.
Table 2. Torix Rickettsia hosts known to date alongside screening method. Bold entries indicate hosts identified in this study. FISH=fluorescence in-situ hybridisation; TEM=transmission electron microscopy; SRA=sequence read archive.
Table S1. Accession numbers used for phylogenetic analyses (Figures 2, 3, 4, 5 and 6). Accession numbers generated in this study are marked in bold.
Table S2. Mitochondrial COI and bacterial gene primers used for re-barcoding and multilocus phylogenetic analysis.
Table S3. List of SRA datasets analysed with phyloFlash and Kraken2.
Table S4. BOLD contaminant datasets.
Table S5. Primer pairs involved in the unintended amplification of 753 Rickettsia COI sequences from the BOLD project.
Table S6.1. Homology of Rickettsia groups and Wolbachia to the most common forward primer (C_LepFolF) attributed to bacterial COI amplification from arthropod DNA extracts.
Table S6.2. Homology of Rickettsia groups and Wolbachia to the most common reverse primer (C_LepFolR) attributed to bacterial COI amplification from arthropod DNA extracts.
Table S7. Re-barcoding status and nearest BLAST hit (NCBI) of mtDNA COI arthropod DNA extracts accessed for further analysis, along with the success of multilocus Rickettsia profiles with allocated Rickettsia group (based on phylogenetic analysis) and co-infection status.
Table S8. The barcoding success rate of taxa which gave at least one inadvertent bacterial COI amplification (N=51,475 accessible specimens), with an adjusted Rickettsia prevalence based on an estimated total number of arthropods to account for inaccessible specimens (N=184,585).
Table S9. NCBI matches mistaken for true mtDNA barcodes and their homology to Rickettsia COI (Accessed 29th June 2020).