Search of the Sequence Read Archive (SRA) and NCBI
Further insights into the balance of Rickettsia groups within
arthropod symbioses were obtained by searching for the presence of
Rickettsia in Illumina datasets associated with arthropod whole genome
sequence (WGS) projects in the SRA (60,409 records as of 20th May
2019). To reduce bias from over-represented laboratory model species
(e.g. Drosophila spp., Anopheles spp.), a single dataset
per species was examined; where multiple datasets existed for a
species, the one with the largest read count was retained. The resulting
dataset, representing 1,342 arthropod species (Table S3), was then
screened with phyloFlash (Gruber-Vodicka, Seah, & Pruesse, 2019),
which finds, extracts and identifies 16S rRNA sequences.
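The one-dataset-per-species rule can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the `SraRun` record, its field names and the placeholder accessions are hypothetical, and it assumes run metadata (species name and read count) has already been retrieved from the SRA.

```python
from typing import Dict, Iterable, NamedTuple


class SraRun(NamedTuple):
    """Hypothetical minimal record for one SRA dataset (field names are illustrative)."""
    accession: str
    species: str
    read_count: int


def one_run_per_species(runs: Iterable[SraRun]) -> Dict[str, SraRun]:
    """Keep, for each species, the run with the largest read count.

    Ties are resolved in favour of the run seen first.
    """
    best: Dict[str, SraRun] = {}
    for run in runs:
        current = best.get(run.species)
        if current is None or run.read_count > current.read_count:
            best[run.species] = run
    return best


if __name__ == "__main__":
    # Placeholder accessions and read counts, for illustration only.
    runs = [
        SraRun("RUN_A", "Drosophila melanogaster", 20_000_000),
        SraRun("RUN_B", "Drosophila melanogaster", 35_000_000),
        SraRun("RUN_C", "Ixodes scapularis", 12_000_000),
    ]
    for species, run in sorted(one_run_per_species(runs).items()):
        print(species, run.accession)
```

A single pass over the records with a dictionary keyed by species keeps memory proportional to the number of species rather than the number of runs.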
Reconstructed full-length 16S rRNA sequences affiliated to Rickettsia were
extracted and compared phylogenetically with sequences derived from the
targeted screen (see sections above) to assess group representation
within the genus.
The microbial composition of all SRA datasets in which phyloFlash
did not reconstruct a Rickettsia 16S rRNA was re-evaluated using Kraken2
(Wood, Lu, & Langmead, 2019), a k-mer based taxonomic classifier for
short DNA sequences. A cut-off of at least 40,000 reads assigned to
Rickettsia taxa was applied for reporting potential infections (a
theoretical genome coverage of ~1–4X, assuming an average genome size
of ~1.5 Mb).
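The reporting rule and the coverage estimate can be illustrated with the sketch below. It assumes, as the text does not specify, that "reads assigned to Rickettsia taxa" means the clade-level count on the genus row named Rickettsia in a Kraken2 report; the function and constant names are hypothetical. Coverage uses the usual estimate C = N × L / G (reads × read length / genome size); under that estimate, 40,000 reads on a 1.5 Mb genome would give ~1X at roughly 38 bp reads and ~4X at 150 bp reads, but the read lengths here are assumptions.

```python
from typing import Iterable, Tuple

MIN_RICKETTSIA_READS = 40_000   # reporting cut-off stated in the text
GENOME_SIZE_BP = 1_500_000      # ~1.5 Mb average genome size stated in the text


def rickettsia_clade_reads(report_lines: Iterable[str]) -> int:
    """Return the clade read count on the genus-level 'Rickettsia' row of a Kraken2 report.

    Assumption: rank code, taxid and the indented taxon name are the last three
    tab-separated columns, and column 2 is the clade count. Negative indices keep
    this working if extra minimizer columns precede the rank code.
    """
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rank_code, name = fields[-3].strip(), fields[-1].strip()
        if rank_code == "G" and name == "Rickettsia":
            return int(fields[1])  # reads (fragments) in the clade rooted at this taxon
    return 0


def theoretical_coverage(n_reads: int, read_length_bp: float,
                         genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Expected mean depth C = N * L / G."""
    return n_reads * read_length_bp / genome_size_bp


def report_infection(report_lines: Iterable[str]) -> Tuple[bool, int]:
    """Flag a dataset as a potential infection if the clade count meets the cut-off."""
    reads = rickettsia_clade_reads(report_lines)
    return reads >= MIN_RICKETTSIA_READS, reads


if __name__ == "__main__":
    # Depth implied by exactly 40,000 reads at two assumed read lengths.
    for read_length in (75, 150):
        cov = theoretical_coverage(MIN_RICKETTSIA_READS, read_length)
        print(f"{read_length} bp reads: ~{cov:.1f}X")
```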
We also searched NCBI for Rickettsia sequences deposited as
invertebrate COI barcodes. To this end, a BLAST search using Torix
Rickettsia COI sequences from previous studies (Ceccarelli et
al., 2016; Pilgrim et al., 2017) as queries was conducted on
29th June 2020. Sequences were initially considered to belong to
the Torix group if their sequence similarity exceeded 90%, and this
assignment was subsequently confirmed phylogenetically as described above.
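The initial >90% screen could be applied to BLAST+ tabular output roughly as follows. This is a sketch under stated assumptions: it uses percent identity (`pident`) as a proxy for the similarity criterion, assumes the default twelve-column `-outfmt 6` layout, and the function name and example subject IDs are hypothetical.

```python
from typing import Dict, Iterable, List

MIN_PERCENT_IDENTITY = 90.0  # ">90%" threshold stated in the text


def candidate_torix_hits(blast_lines: Iterable[str],
                         min_identity: float = MIN_PERCENT_IDENTITY) -> List[str]:
    """Return subject IDs whose best hit to any query exceeds min_identity.

    Assumes BLAST+ tabular output (-outfmt 6) with the default twelve columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best_identity: Dict[str, float] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        subject_id, pident = fields[1], float(fields[2])
        best_identity[subject_id] = max(pident, best_identity.get(subject_id, 0.0))
    # Strictly greater than the threshold, matching ">90%".
    return sorted(s for s, p in best_identity.items() if p > min_identity)
```

Sequences passing this filter are only candidates; as in the text, their Torix assignment would still need phylogenetic confirmation.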