Barcoding success of host taxa
The available subset of attempted barcodes associated with the contaminants comprised 55,366 of the 184,585 arthropods originally used in the overall study. Three classes, Insecta (n=49,688), Arachnida (n=3,626) and Collembola (n=1,957), accounted for >99.8% of these specimens (Figure 1). Successful amplification and sequencing of COI was achieved in 43,246 (78.1%) of the genomic extracts, although success rates varied when assessed at the order level (Table S8). The likely explanation for this variation is taxon-specific sequence divergence at priming sites.
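The proportions quoted above follow directly from the reported counts. As a minimal sketch (the variable and function names are illustrative assumptions, not part of the original analysis), the arithmetic can be reproduced as follows:

```python
# Counts as reported in the text; names are illustrative only.
class_counts = {"Insecta": 49_688, "Arachnida": 3_626, "Collembola": 1_957}
subset_total = 55_366    # specimens in the available subset
project_total = 184_585  # arthropods used in the overall study
coi_success = 43_246     # specimens with a successful COI barcode


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Share of the subset made up by the three dominant classes.
class_share = percent(sum(class_counts.values()), subset_total)
# Overall COI barcoding success within the subset.
success_rate = percent(coi_success, subset_total)
# Fraction of the whole project that the accessible subset represents.
subset_share = percent(subset_total, project_total)

print(f"Three classes: {class_share:.2f}% of subset")
print(f"COI success:   {success_rate:.1f}% of subset")
print(f"Subset:        {subset_share:.1f}% of project")
```

The last quantity, the fraction of the project that was accessible, is not stated in the text but is implied by the two totals; it is the proportion against which any prevalence adjustment for inaccessible specimens would be made.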
The number of each taxonomic order giving at least one Rickettsia amplification was then calculated and adjusted based on the total number of specimens in the project to allow for a prevalence estimate. Overall, Hymenoptera, Diptera and Hemiptera were the three orders most likely to be associated with Rickettsia COI amplification (87.4%). Similarly, on assessment of a subsample from the project where the contaminants originated, a majority (77.7%) of the dataset was also accounted for by these three orders. After adjusting the prevalence to take into account the number of inaccessible specimens, Trichoptera (2.45%), Dermaptera (1.89%) and Psocodea (1.67%) were the most likely taxa to give an inadvertent Rickettsia amplification. Although Hemiptera and Diptera had a similar estimated prevalence (0.58% and 0.56%, respectively), Hemiptera were much more likely to fail to barcode (barcoding success of 67.2% vs 93.3%). Because a barcoding failure is necessary to amplify non-target bacterial COI, this indicates that the true dipteran prevalence is likely to be higher. Attempts to re-barcode 186 Rickettsia-containing DNA templates of interest from BOLD resulted in 90 successful arthropod host barcodes (Table S7).