Signatures of recurrent selection
We filtered the total VCF with annotations by SNPeff and retained only
non-synonymous (replacement) or synonymous (silent) SNPs. We then
compared these polymorphisms to the differences identified relative to D.
falleni and D. phalerata to polarize changes to specific
branches. Specifically, we sought to identify sites that are
polymorphic in our D. innubila populations or are substitutions
that fixed along the D. innubila branch of the phylogeny. We
used the counts of fixed and polymorphic silent and replacement sites
per gene to estimate McDonald-Kreitman-based statistics, specifically
direction of selection (DoS) (McDonald and Kreitman 1991;
Smith and Eyre-Walker 2002; Stoletzki and Eyre-Walker
2011). We also used these values in SnIPRE (Eilertson et
al. 2012), which reframes McDonald-Kreitman-based statistics as a
linear model, taking into account the total number of non-synonymous and
synonymous mutations occurring in user-defined categories to predict the
expected number of these substitutions and calculate a selection effect
from the observed and expected numbers of mutations. We calculated the SnIPRE
selection effect for each gene using the total number of mutations on
the chromosome of the focal gene. Using FlyBase gene ontologies
(Gramates et al. 2017), we sorted each gene into a
category of immune gene or classed it as a background gene, allowing a
gene to be classed in multiple immune categories. We fit a generalized
linear model (GLM) to identify functional categories with unusually high
estimates of adaptation, considering multiple covariates:
\begin{equation}
Statistic\ \sim\ Population+Gene\ group+\left(Gene\ group*Population\right)+Chromosome+Chromosome:Position\nonumber
\end{equation}
We then calculated the difference in each statistic between our focal
immune genes and a randomly sampled nearby (within 100 kbp) background
gene, finding the average of these differences for each immune category
over 10,000 replicates, following Chapman et al. (2019).
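
The per-gene statistic and the paired resampling described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the dictionary keys (`chrom`, `pos`, `stat`), and the choice to skip immune genes with no background gene inside the window are all assumptions made for the example. DoS is computed in its standard form, Dn/(Dn+Ds) - Pn/(Pn+Ps).

```python
import random
from statistics import mean


def direction_of_selection(dn, ds, pn, ps):
    """DoS = Dn/(Dn+Ds) - Pn/(Pn+Ps).

    dn, ds: fixed replacement and silent differences.
    pn, ps: polymorphic replacement and silent sites.
    Returns None when either denominator is zero (statistic undefined).
    """
    if dn + ds == 0 or pn + ps == 0:
        return None
    return dn / (dn + ds) - pn / (pn + ps)


def mean_paired_difference(immune, background, window=100_000,
                           replicates=10_000, seed=0):
    """Average (immune - nearby background) difference over many replicates.

    immune, background: lists of dicts with hypothetical keys
    'chrom', 'pos' and 'stat' (e.g. DoS or a SnIPRE selection effect).
    """
    rng = random.Random(seed)

    # For each immune gene, collect statistics of background genes on the
    # same chromosome and within `window` base pairs.
    neighbours = []
    for gene in immune:
        near = [b["stat"] for b in background
                if b["chrom"] == gene["chrom"]
                and abs(b["pos"] - gene["pos"]) <= window]
        if near:  # assumption: genes without a nearby background are skipped
            neighbours.append((gene["stat"], near))
    if not neighbours:
        raise ValueError("no immune gene has a background gene in the window")

    # Each replicate pairs every immune gene with one randomly drawn
    # nearby background gene and records the mean difference.
    replicate_means = []
    for _ in range(replicates):
        diffs = [stat - rng.choice(near) for stat, near in neighbours]
        replicate_means.append(mean(diffs))
    return mean(replicate_means)
```

In this sketch, `mean_paired_difference` would be called once per immune category and per statistic; a positive value indicates that genes in the category tend to score higher than their local background.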
To confirm these results, we also used AsymptoticMK (Haller and
Messer 2017) to calculate asymptotic α for each gene category. We
generated the non-synonymous and synonymous site frequency spectrum for
each gene category, which we then used in AsymptoticMK to calculate
asymptotic α and a 95% confidence interval. We then used a permutation
test to assess whether functional categories of interest showed a significant
difference in asymptotic α from the remaining categories.
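
A label-shuffling permutation test of this kind might look like the sketch below. The text does not specify which units were permuted or which test statistic was used, so the example makes explicit assumptions: it takes one α estimate per unit, compares a focal group with the rest using the absolute difference in means, and reports a two-sided p-value with an add-one correction. The function name and arguments are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(focal, rest, permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    focal, rest: sequences of alpha estimates (assumed one per unit).
    Returns the proportion of label shuffles whose absolute mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(focal) - mean(rest))
    pooled = list(focal) + list(rest)
    n_focal = len(focal)

    extreme = 0
    for _ in range(permutations):
        # Shuffle group labels by shuffling the pooled values and
        # re-splitting them at the original group size.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_focal]) - mean(pooled[n_focal:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (permutations + 1)
```

A small p-value here would suggest that the focal category differs from the remaining categories by more than expected under random assignment of category labels.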