Supplementary Results
We characterized the repetitive content across our samples using dnaPipeTE (Goubert et al. 2015) and called TE insertions per line using PoPoolationTE2 (Kofler et al. 2016). The reference D. innubila genome contains 154 different TE families along with various satellites and simple repeats, with resequenced individuals varying from 4.4% to 38.4% of reads matching repetitive sequences. Strains varied from 1913 to 7479 TE insertions per strain in the non-repetitive portion of the genome. As with nuclear polymorphism, we find little population structure by shared TE insertions, though strains do seem to disperse primarily by the number of insertions (Supplementary Figure 11B).
Similar to D. melanogaster (Charlesworth and Langley 1989; Charlesworth et al. 1997; Petrov et al. 2011; Kofler et al. 2012; Kofler et al. 2015), D. innubila harbors a significant excess of low frequency TE insertions compared to the SFS of synonymous variants (Supplementary Figure 11A, GLM Count ~ Frequency * SNP or TE, t-value = -16.401, p-value = 1.889e-60), with no difference in the insertion frequency spectra between populations (GLM Count ~ Frequency * TE order * Population, t-value = -0.341, p-value = 0.733). This implies in every population, TE insertions are on average mildly deleterious and removed via purifying selection.
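The comparison of frequency spectra described above can be illustrated with a minimal sketch. It bins per-site frequencies into a spectrum and compares the share of variants in the lowest frequency class. The function name, the bin count and the frequency values are hypothetical and chosen for illustration only. They are not data from this study, and the sketch is a descriptive summary, not the GLM used for the reported test.

```python
def frequency_spectrum(freqs, n_bins=5):
    """Bin frequencies in (0, 1] into n_bins equal-width classes.

    Returns the proportion of variants falling in each class.
    """
    counts = [0] * n_bins
    for f in freqs:
        # A frequency of exactly 1.0 is placed in the last class.
        idx = min(int(f * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(freqs)
    return [c / total for c in counts]


# Toy frequencies (illustrative only, not results from the study).
te_freqs = [0.05, 0.05, 0.1, 0.1, 0.15, 0.5, 0.9]
syn_freqs = [0.05, 0.2, 0.3, 0.45, 0.5, 0.7, 0.9, 0.95]

te_sfs = frequency_spectrum(te_freqs)
syn_sfs = frequency_spectrum(syn_freqs)

# Share of variants in the lowest frequency class (frequency < 0.2 here).
print("TE low-frequency share:", te_sfs[0])
print("Synonymous low-frequency share:", syn_sfs[0])
```

An excess of low-frequency TE insertions would appear as a larger first class in the TE spectrum than in the synonymous spectrum.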
Using dnaPipeTE (Goubert et al. 2015), we find a significantly higher density of RC & TIR elements compared to other repeat orders (Supplementary Figure 11C, t-value = 3.555, p-value = 3.745e-04), consistent with the reference genome (Hill et al. 2019). The density of repetitive content is also higher genome wide in the CH and PR populations compared to HU and SR (Supplementary Figure 11C & D, t-value = 2.856, p-value = 4.291e-03). This is in keeping with a more recent bottleneck in these populations reducing effective population size and the efficacy of selection, resulting in bursts of repeat activity with relaxed selection for removal of insertions. These changes are primarily driven by an expansion of simple repeats in the CH population (Supplementary Figure 11D, GLM t-value = 3.978, p-value = 7.31e-05) and an expansion of TIR elements in the PR population (Supplementary Figure 11D, GLM t-value = 3.914, p-value = 9.52e-05). Specifically, we see expansions of the satellite CASAT_HD (GLM t-value = 5.554, p-value = 8.832e-08) and the simple repeat sequences CAACAA, CTC and GTGT in the CH population when compared to all other populations (GLM t-value = 9.204, p-value = 2.555e-17). In the PR population we find significantly higher abundances of TE families closely related to Tetris_Dvir (GLM t-value = 13.641, p-value = 2.889e-32), Helitron-2N1_DVir (GLM t-value = 12.381, p-value = 2.789e-28) and Chapaev3-1_PM (GLM t-value = 11.472, p-value = 1.662e-24) compared to other populations. We do not find any evidence that particular TE orders are more abundant on any one chromosome in D. innubila (GLM t-value = 1.854, p-value = 0.633), though we do find that TEs are at significantly higher insertion densities in the inverted regions of Muller element A than in other regions of the genome (Wilcoxon Rank Sum Test W = 19763, p-value = 0.01488). This suggests the lack of recombination in the inverted region is allowing the accumulation of repetitive content on Muller element A.
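For readers unfamiliar with the rank sum statistic reported above, the following minimal sketch shows how W can be computed for two groups of insertion densities. It assumes the convention used by R's `wilcox.test`, where W is the rank sum of the first group minus its minimum possible value, and uses midranks for ties. The function name and the window densities are hypothetical and illustrative only. They are not values from this study.

```python
def rank_sum_w(x, y):
    """Wilcoxon rank sum statistic W for sample x against sample y.

    W = (sum of pooled ranks of x) - n_x * (n_x + 1) / 2,
    with tied values receiving the average of their ranks.
    """
    # Pool both samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # 1-based ranks i+1 .. j share their average rank.
        midrank = (i + 1 + j) / 2
        n_x_in_run = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += midrank * n_x_in_run
        i = j
    n_x = len(x)
    return rank_sum_x - n_x * (n_x + 1) / 2


# Toy insertion densities per window (illustrative only).
inverted = [5, 6, 7]
collinear = [1, 2, 3]
print(rank_sum_w(inverted, collinear))
```

W ranges from 0 to n_x * n_y. Values near the upper bound indicate that the first group tends to have larger values than the second.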
TE insertions are usually assumed to be at least mildly deleterious (Charlesworth and Langley 1989; Petrov et al. 2011). In D. innubila, TE density is lower in regions flanking genes or within genes compared to non-coding regions (GLM t-value = -6.538, p-value = 6.23e-11), consistent with the deleterious assumption. However, the frequency of TE insertions was significantly higher in exonic regions compared to introns and UTRs (Supplementary Figure 11A, GLM t-value = 4.040, p-value = 5.34e-05), across all populations. We may have observed this because these are wild-caught flies, which may have more recessive deleterious insertions segregating in the population than are seen in inbred samples. Overall, the repetitive content in Drosophila innubila appears to be mildly deleterious, with TE insertions shared between locations by migration. Despite this, there are some major differences in the repeat content of each population, possibly due to the stochastic effect of population bottlenecks.
The expansion of satellites and simple repeats in the CH population may have occurred due to a founder effect following the population bottleneck, where a majority of CH founders by chance had a higher proportion of particular satellites or simple repeats (Charlesworth et al. 2003), but this is unlikely given the gene flow between populations. Alternatively, the bottleneck could have fixed segregating recessive variation which limits the regulation of repetitive content in the genome, leading to its expansion. However, if this were the case and satellite expansion is even mildly deleterious, we would expect migratory rescue of repeat regulation machinery. A third possibility is that satellite expansion is associated with local evolutionary dynamics involved in either adaptation or genetic conflict (Garrido-Ramos 2017; Lower et al. 2018).
Supplementary Figure 1: A. Population size history of Drosophila innubila backwards in time for each population. B. Population size history on the Log10 scale of Drosophila innubila backwards in time for each population. C. Results of Structure software (Falush et al. 2003) for estimating population structure between locations for 100,000 sampled synonymous polymorphisms from all autosomes, with K = 3 (estimated optimal K value). Note that this plot summarizes all autosomes (excluding Muller B) and the X chromosome due to very little structure between locations for all chromosomes. D. Results of Structure software (Falush et al. 2003) for estimating population structure between locations for 16 polymorphisms on the autosomes, with K = 3 (estimated optimal K value).