Supplementary Results
We characterized the repetitive content across our samples using
dnaPipeTE (Goubert et al. 2015) and called TE insertions
per line using PopoolationTE2 (Kofler et al. 2016). The
reference D. innubila genome contains 154 different TE families
along with various satellites and simple repeats, with resequenced
individuals varying from 4.4% to 38.4% of reads matching repetitive
sequences. Strains varied from 1913 to 7479 TE insertions per strain in
the non-repetitive portion of the genome. As with nuclear polymorphism, we
find little population structure by shared TE insertions, though strains
do seem to disperse primarily by the number of insertions (Supplementary
Figure 11B).
Similar to D. melanogaster (Charlesworth and Langley
1989; Charlesworth et al. 1997; Petrov et
al. 2011; Kofler et al. 2012; Kofler et
al. 2015), D. innubila harbors a significant excess of low
frequency TE insertions compared to the SFS of synonymous variants
(Supplementary Figure 11A, GLM Count ~ Frequency * SNP
or TE, t-value = -16.401, p-value = 1.889e-60), with no difference in
the insertion frequency spectra between populations (GLM Count
~ Frequency * TE order * Population, t-value = -0.341,
p-value = 0.733). This implies that, in every population, TE insertions are
on average mildly deleterious and removed via purifying selection.
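The comparison of frequency spectra above can be illustrated with a minimal sketch. It is not the GLM reported here. It only tabulates the proportion of variants carried by each number of strains and compares the low-frequency fraction between classes. The function names, the 10% cutoff and the toy carrier counts are all hypothetical.

```python
from collections import Counter


def frequency_spectrum(carrier_counts, n_strains):
    """Proportion of variants carried by exactly k strains, for k = 1..n_strains."""
    counts = Counter(carrier_counts)
    total = sum(counts[k] for k in range(1, n_strains + 1))
    return [counts[k] / total for k in range(1, n_strains + 1)]


def low_frequency_fraction(spectrum, n_strains, cutoff=0.1):
    """Fraction of variants whose frequency k / n_strains is at or below cutoff."""
    return sum(p for k, p in enumerate(spectrum, start=1)
               if k / n_strains <= cutoff)


# Hypothetical toy data: number of strains (out of 10) carrying each variant.
n = 10
te_carriers = [1, 1, 1, 1, 1, 1, 2, 1, 3, 1]
snp_carriers = [1, 2, 3, 5, 1, 4, 7, 2, 6, 9]

te_sfs = frequency_spectrum(te_carriers, n)
snp_sfs = frequency_spectrum(snp_carriers, n)
te_low = low_frequency_fraction(te_sfs, n)
snp_low = low_frequency_fraction(snp_sfs, n)
```

In this toy example the TE spectrum is concentrated in singletons, which is the pattern expected when insertions are removed by purifying selection before they reach intermediate frequency.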
Using dnaPipeTE (Goubert et al. 2015), we find a
significantly higher density of RC & TIR elements compared to other
repeat orders (Supplementary Figure 11C, t-value = 3.555, p-value =
3.745e-04), consistent with the reference genome (Hill et
al. 2019). The density of repetitive content is also higher genome wide
in the CH and PR populations compared to HU and SR (Supplementary Figure
11C & D, t-value = 2.856, p-value = 4.291e-03). This is in keeping with
a more recent bottleneck for these populations reducing effective population
size and efficacy of selection, resulting in bursts of repeat activity
with relaxed selection for removal of insertions. These changes are
primarily driven by an expansion of simple repeats in the CH population
(Supplementary Figure 11D, GLM t-value = 3.978, p-value = 7.31e-05) and
an expansion of TIR elements in the PR population (Supplementary Figure
11D, GLM t-value = 3.914, p-value = 9.52e-05). Specifically, we see
expansions of the satellite CASAT_HD (GLM t-value = 5.554, p-value =
8.832e-08) and the simple repeat sequences CAACAA, CTC and GTGT in the
CH population when compared to all other populations (GLM t-value =
9.204, p-value = 2.555e-17). In the PR population we find significantly
higher abundances of TE families closely related to Tetris_Dvir (GLM
t-value = 13.641, p-value = 2.889e-32), Helitron-2N1_DVir (GLM t-value =
12.381, p-value = 2.789e-28)
and Chapaev3-1_PM (GLM t-value = 11.472, p-value = 1.662e-24)
compared to other populations. We do not find any evidence that
particular TE orders are more abundant on any one chromosome in D.
innubila (GLM t-value = 1.854, p-value = 0.633), though we do find TEs are
at significantly higher insertion densities in the inverted regions of
Muller element A than at other regions of the genome (Wilcoxon Rank Sum
Test W = 19763, p-value = 0.01488). This suggests the lack of
recombination in the inverted region is allowing the accumulation of
repetitive content on Muller element A.
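For readers unfamiliar with the rank-sum test used above, the following is a minimal sketch of the statistic, assuming a normal approximation with tie correction and no continuity correction. The function name and the toy densities are hypothetical, and the p-value will differ slightly from software that applies a continuity correction or an exact test. W is defined here as the rank sum of the first sample minus its minimum possible value, which is the convention in R's wilcox.test.

```python
import math


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test; returns (W, two-sided p) by normal approximation."""
    values = sorted(list(x) + list(y))
    ranks = {}      # value -> midrank (average rank among tied values)
    tie_term = 0    # sum of t^3 - t over tie groups, used in the variance
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        t = j - i
        ranks[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    n1, n2 = len(x), len(y)
    n = n1 + n2
    w = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean) / math.sqrt(var)  # undefined if every value is tied
    p = math.erfc(abs(z) / math.sqrt(2))
    return w, p


# Hypothetical toy data: TE insertions per window, inverted vs. other regions.
inverted = [5.1, 6.3, 7.2, 8.0]
other = [1.2, 2.4, 3.1, 4.5]
w, p = rank_sum_test(inverted, other)  # w is 16.0: every inverted window ranks higher
```

A rank-based test is a reasonable choice for insertion densities because window counts are typically skewed and do not satisfy the normality assumption of a t-test.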
TE insertions are usually assumed to be at least mildly deleterious
(Charlesworth and Langley 1989; Petrov et al. 2011). In D. innubila, TE
density is lower in regions flanking
genes or within genes compared to non-coding regions (GLM t-value =
-6.538, p-value = 6.23e-11), consistent with the deleterious assumption.
However, the frequency of TE insertions was significantly higher in
exonic regions compared to introns and UTRs (Supplementary Figure 11A,
GLM t-value = 4.040, p-value = 5.34e-05), across all populations, which
we may have observed as these are wild-caught flies and so may have more
recessive deleterious insertions segregating in the population than are
seen in inbred samples. Overall the repetitive content in Drosophila
innubila appears to be mildly deleterious, with TE
insertions shared between locations by migration. Despite this there are
some major differences in the repeat content of each population,
possibly due to the stochastic effect of population bottlenecks.
This may have occurred due to a founder effect following the population
bottleneck, where a majority of CH founders by chance had a higher
proportion of particular satellites or simple repeats
(Charlesworth et al. 2003), but this is unlikely given
the gene flow between populations. Alternatively, the bottleneck could
have fixed segregating recessive variation which limits the regulation
of repetitive content in the genome, leading to its expansion. However,
if this were the case and satellite expansion is even mildly deleterious,
we would expect migratory rescue of repeat regulation machinery. A third
possibility is that satellite expansion is associated with local
evolutionary dynamics either involved in adaptation or genetic conflict
(Garrido-Ramos 2017; Lower et al. 2018).
Supplementary Figure 1: A. Population size history of Drosophila innubila backwards in time for each population. B. Population size history on the Log10 scale of Drosophila innubila backwards in time for each population. C. Results of Structure software (Falush et al. 2003) for estimating population structure between locations for 100,000
sampled synonymous polymorphisms from all autosomes, with a K=3
(estimated optimal K value). Note that this plot summarizes all
autosomes (excluding Muller B) and the X chromosome due to very little
structure between locations for all chromosomes. D. Results of
Structure software (Falush et al. 2003) for estimating
population structure between locations for 16 polymorphisms on the
autosomes, with a K=3 (estimated optimal K value).