4. Discussion
We have investigated whether documented differences in defence phenotypes between three closely related species that are attacked by the same parasitoid wasp species (G. pusilla >G. tenella >G. calmariensis ), are reflected in genomic signatures of positive selection among them. We found that the relative ranking of species by immune performance was predictive of the number of immune genes exhibiting signatures of positive selection. However, this pattern differed depending on the type of test that we used, as well as by the region of the genome analysed by a given test.
Our analysis of the coding region of genes with a modified MK-test based approach showed that higher numbers of immune genes (N = 7) were under positive selection in the species with the strongest capacity to encapsulate wasp eggs (G. pusilla ), compared with the species lacking the capacity to encapsulate wasp eggs (G. calmariensis, N= 1) or having an intermediate encapsulation capacity (G. tenella, N = 4). These findings suggest that evolutionary adaptations of host defences are not only a consequence of changes in regulatory genes but are also due to changes in protein-coding regions. Even though the identified genes have yet to be functionally characterized in these species, their importance in mediating defence against parasitoids was partly confirmed by previous comparative gene expression analyses (Yang et al., 2020). This results contrasts with the pattern from LASSI Plus, where an equal number of immune related genes were identified as outliers in the three species. While this pattern may in part be due to differences in the power of the two methods to detect selection at different time scales, it is also notable that most genes (4/6) identified as outliers in G. pusilla were also previously found as being differentially expressed following wasp attack whereas none of the genes were such for G. calmariensis .
Even though we could analyze only a subset of immune genes in theGalerucella genome (96 out of 166), the detected genes indicate the type of processes most likely under selection. Most importantly, the analysis suggested that the observed phenotypic differences in the capacity to encapsulate wasp eggs (Fors et al., 2014) are likely explained by genes involved in wasp egg detection and in the recruitment of hemocytes building the capsule (Yang et al., 2020), whereas genes involved in melanization or wound healing processes seem to be more conserved. A key observation from the previous gene expression analyses was that attack by A. parviclava increased expression of multiple genes in G. pusilla and almost none in G. calmariensis(Yang et al., 2020). This finding indicates that differences in defense phenotypes between these species can best be explained by genes coding for proteins involved during the early phases of the immune process. The current analysis identified several promising candidates, such assanta-maria , Corin , PGRP-LE , Tak1 andGrass , that have been suggested to be involved in either the recognition or signalling processes. In addition, two of these genes (Grass and PGRP-LE) were are among those identified as outliers in LASSI Plus when including flanking regions. The specific functions for these five genes have not been identified in Galerucella , but some studies exist from Drosophila . First, both santa-maria andCorin have been suggested to encode proteins with scavenger receptor or serine-type endopeptidase activities (Cao & Jiang, 2018), where Corin has a transmembrane domain that may function as a receptor to the Toll pathway (Irving et al., 2001). Second,PGRP-LE encodes an intracellular protein that binds to diaminopimelic acid-type peptidoglycans to activate the IMD/Relish pathway (Bosco-Drayon et al., 2012), but the function may primarily be anti-bacterial (Libert, Chao, Chu, & Pletcher, 2006). Third,Grass has been predicted to be involved in the Toll signalling pathway, which may be the most important signalling pathway in the immune response against wasp attack in Drosophila (Carton et al., 2008; Lemaitre & Hoffmann, 2007; Wertheim et al., 2005), whereasTak1 may be involved in the change between the JNK and IMD pathways. Among these genes, a particularly interesting candidate isCorin , which was under positive selection in both G. pusilla and G. tenella , the two species with relatively more efficient encapsulation responses. Even though G. tenella was not included in the gene expression analysis, we know from other studies that this species has an encapsulation capacity that is intermediate between the two other species (Fors et al., 2016). Previous work onDrosophila show upregulation of Corin following bacterial infections (Irving et al., 2001), but not following parasitoid attack (Salazar Jaramillo et al., 2017). Unfortunately, this gene was data deficient in our gene expression analysis.
These recognition and signaling genes are sufficient for explaining differences in defense phenotypes, but we also detected positive selection on genes involved in downstream regulation of hemocytes. First, cher was under positive selection in both G. tenella and G. pusilla , and has previously been shown to negatively regulate lamellocyte differentiation (Rus et al., 2006). Lamellocytes are key elements in the capsules killing parasitoid eggs ofGalerucella and are induced from precursors after attack byA. parviclava (Fors et al., 2016; Fors et al., 2014), and similarly induced by Drosophila following parasitoid attack (Kim-Jo, Gatti, & Poirié, 2019). Two other genes involved in lamellocyte regulation similarly have evidence for positive selection, one inG. pusilla (zfh1 ) and one on G. tenella(Cyt-b5 ). zfh1 is one element in a transcription factor cascade that plays a role as the switch between plasmatocyte and lamellocyte fate (Frandsen, Gunn, Muratoglu, Fossett, & Newfeld, 2008), and Cyt-b5 encodes a conserved hemoprotein that is required for hemocyte regulation (Kleinhesselink, Conway, Sholer, Huang, & Kimbrell, 2011). These two latter genes were also differentially expressed following wasp infection involving A. parviclava and G. pusilla (Yang et al., 2020).
When examining other gene functions identified as experiencing positive selection, we found that genes coding for imaginal disc pattern formation and wing disc pattern formation were under positive selection in all species. Imaginal discs are epithelial sacs found in insect larva that later develop into cuticular structures (e.g., head, wing, limbs, thorax) of adult insects, and the wing disc is among the largest imaginal discs in insects (Blair, 2009). These findings are not surprising as the beetle species are morphologically differentiated, both in size and colour. Other genes varied between species, but those under positive selection in G. calmariensis specifically involved genes coding for metabolic functions such as the metabolism of carbohydrate derivatives, oligosaccharides, amino sugars, sulphur compounds and catechol-containing compounds. These sets of positively selected gene functions indicate the importance of energy allocation and dietary transitions during evolution in this species. Moreover, pathways related to the nervous systems were also found to be positively selected in G. calmariensis . Positive selection on nervous system-related genes were previously documented in social insects such as bees and ants (Roux et al., 2014; Woodard et al., 2011) but have rarely been reported in beetles. These patterns may indicate changes either in the capacity to detect host plants or mates and may thus be involved in the species differentiation. In G. tenella , several pathways related to lipid metabolic processes were enriched in positively selected genes. Some of these processes, such as sterol metabolism, may be linked to needs to handle differences in plant chemistry from the different host plants (Rosaceae vs. L. salicaria ). Other lipids, such as sphingolipids, have been suggested to be involved in cell defences and could be interesting to study in relation to wasp attack. Finally, a unique pathway under positive selection in G. tenella was involved in the pigmentation metabolic process, which may potentially explain the different spotting and pigmentation patterns in the species.
With regards to our HDPRFMK tests, there are some methodological concerns warranting attention. First, we were only able to include 4,154 orthologous genes in the analysis, which may have reduced the power to detect important GO terms. Whether these removals have biased our conclusions is unclear, but by discarding genes of low coverage and removing bias due to recent gene birth death dynamics, this smaller set of genes provides a robust gene set for inferring selection dynamics and is sufficient for estimating genome wide selection dynamics. Second, a general problem in MK tests is their sensitivity to the demographic history (Eyre-Walker, 2002; McDonald & Kreitman, 1991). In our analysis, we approached this problem by using HDMKPRF, which attempts to correct for demography effects during the detection of selection patterns by accounting for changes in effective population size and other population genetic parameters. One potential concern, however, is that the proportion of substitutions driven by positive selection were highest in G. calmariensis even though this species had the smallest Ne . Third, an additional issue when examining selection patterns is slightly deleterious mutations. If such mutations are segregating in the population, the degree of positive selection will be underestimated. Removing deleterious mutations during selection analysis is therefore widely accepted, by setting a threshold for removing minor alleles (Bierne & Eyre-Walker, 2004; Fay, Wyckoff, & Wu, 2001; Fay et al., 2002). We only removed singleton polymorphisms, which increased the proportion of negatively selected sites but retained the relative ranking of α among species. Thus, while absolute α-estimates in our analysis are certainly underestimated, the relative ranking among species is likely correct.
Our investigation of signatures of positive selection from genome wide analysis in each of the three species failed to reveal an enrichment of immune genes biased towards the most defended species, similar to our predictions and what was observed in the HDPRFMK test. Because we assessed the overlap of the identified sweep regions with both gene bodies, as well as up to 10kb flanking regions as a proxy for gene regulatory regions, these findings are not because we limited our sweep detection to either regulatory or coding regions; we captured putative candidate sweep regions for both. Thus, we conclude that differences in the results from the two tests are more reflective of their inherent differences, which are diverse.
The HDPRFMK test could be considered to be much more conservative than LASSI Plus, as it requires the fixation of multiple amino acid mutations and hence multiple rounds of selective sweeps in order to result in significant departures from neutral expectations. However, because the test uses all such events since a last common ancestor was shared with the other species in the analysis, while conservative, this length of time provides many opportunities for selective sweeps, making it a powerful test, yet narrow in the underlying scenario it can detect. In contrast, LASSI Plus, like nearly all tests reliant upon outliers in the site frequency spectrum, is only able to detect positive selection events on a much more recent time horizon, likely a small fraction of the time since the last common ancestor. While limited in the time horizon such a test can query over, most substantial selective sweeps are likely to be detected and these are identifiable nearly anywhere in the genome; LASSI Plus should be able to detect one of the many selective sweeps that build up the amino acid fixations underlying HDPRFMK tests, as well as sweeps happening anywhere else. In sum, while the HDPRFMK test is narrowly focused upon selection acting on amino acid variation, it gains power by covering a deep time horizon. LASSI Plus, while able to detect positive selection nearly anywhere in the genome, can only see events in the relatively recent past.