Metagenomic Analysis of Microbes
To better understand the microbiome associated with N. riversi , we first examined the taxonomic assignment for the most abundant microbial reads using Autometa (Miller et al. 2019). To generate an input coverage table for Autometa, we used Minimap2 to calculate coverage for the long-read data against the CANU assembly. Then, we ran the Autometa pipeline using a contig length cutoff of 500 bp. While we ran this program with all reads (beetle and non-targeted reads), Autometa separates eukaryotic reads from bacterial reads during the clustering step. We then used taxonomic identification and clustering to identify four unique bins (Spiroplasma, Enterobacter, Acinetobacter, Rhodococcus ). Using BUSCO v4.14 and CheckM (Parks et al. 2015), we assessed the quality of the four bins. Specifically, we ran BUSCO in genome mode using the auto lineage selector (–auto-lineage-prok) and we used the lineage_wf CheckM command. From this analysis we determined that the contigs matching to Spiroplasma had high enough quality to continue with downstream functional analyses. The Spiroplasmasp. NR genome assembly has been made publicly available on NCBI (WGS Accession: CP065192-CP065193).
To understand the function of Spiroplasma in its host, we examined its phylogenetic placement relative to other Spiroplasmataxa, annotated its genome for functional elements, and also examined whether it was transcriptionally active at different life-stages in the RNAseq data. Previous studies have shown that Spiroplasma can play a role in host defense against nematodes and parasitoids through ribosome-inactivating proteins (RIPs, Ballinger & Perlman 2019), or conversely act as a reproductive parasite when the male-killingspaid gene is present (Harumoto & Lemaitre 2018). First, the phylogenetic relationship of Spiroplasma sp. NR was assessed using the 16S rRNA gene. A total of 38 Spiroplasma accessions were obtained from GenBank, and 1,286 bp of the 16S gene were aligned with MUSCLE v3.8.31 (Edgar 2004). An evolutionary tree was inferred using Maximum Likelihood in RAxML (Stamatakis 2014), with the GTRCAT substitution model and 1,000 bootstrap replicates to estimate support values. Next, we annotated the Spiroplasmagenome using prokka (Seemann 2014), with settings for the Bacteria and genetic code 4. To further functionally annotate the genome, we performed several annotations with 1) the KEGG Automatic Annotation Server, 2) antiSMASH v5.0 (Blin et al. 2019), 3) CRISPRfinder (Grissa et al. 2007), and 4) DIAMOND blastx against custom databases and the UniProt SwissProt and TrEMBL databases (accessed September 8, 2020; O’Donovan et al. 2002).We kept only the top 5% of hits (–top 5) that had an E value below 1e-05 for each query sequence. We searched for UniProt accession numbers corresponding to 156 RIPs and the Ankyrin-rich Spaid protein from S. poulsonii (Accession A0A2R6Y5P0). Due to ambiguous results, we also used 1) blastn and blastx against the spaid nucleotide and peptide sequences on GenBank (Accessions MG837001.1 and AWJ64280.1), as well as 2) metabolisHMM (McDaniel et al. 2019) searched using the Ankyrin repeat (PF00023) and OTU-like (PF02338) Pfam hmmer models. The resulting hits were compared to NCBI’s non-redundant nucleotide database using blastx. Finally, to examine ifSpiroplasma was transcriptionally active at different life-stages and in different sexes, we used the Kaiju webserver (Menzelet al. 2016) to assign taxonomic classifications to microbial reads, using default parameters. We recorded the number of reads mapping to domain Bacteria, class Mollicutes, and genus Spiroplasma . In addition, we used kallisto v0.43.1 (Bray et al. 2016) to pseudo-align all non-beetle reads against the Spiroplasma genome from this study.