Population genetic summary statistics and structure
Using the full VCF file with SnpEff annotations, we created a
second VCF containing only synonymous polymorphisms using BCFtools
(Narasimhan et al. 2016). For each gene in each population, we calculated
per-base pairwise diversity, Watterson’s theta, Tajima’s D (Tajima 1989)
and FST (Weir and Cockerham 1984; each population versus all
other populations) across the genome
using VCFtools (Danecek et al. 2011) and the VCF
containing all variants. Using ANGSD (Korneliussen et al. 2014) to parse the
synonymous polymorphism VCF, we
generated unfolded synonymous site frequency spectra (SFS) for the D.
innubila autosomes in each population, using the D. falleni and D. phalerata genomes as outgroups to the D. innubila genome (Hill et al. 2019).
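To illustrate what polarizing against outgroups means here, the following is a minimal Python sketch, not the ANGSD procedure itself (ANGSD works from genotype likelihoods, whereas this sketch assumes hard allele calls). The function name `unfolded_sfs` and the input layout are hypothetical. It assumes a site is usable only when both outgroups agree on the ancestral allele and that allele is still segregating in the sample.

```python
from collections import Counter


def unfolded_sfs(sites, n_chrom):
    """Count derived-allele frequencies across sites.

    sites   : iterable of (sample_alleles, outgroup1_allele, outgroup2_allele),
              where sample_alleles is a string with one base per sampled chromosome
    n_chrom : number of sampled chromosomes
    Returns a list where entry i-1 is the number of sites at which the
    derived allele is carried by i chromosomes (i = 1 .. n_chrom - 1).
    """
    sfs = [0] * (n_chrom - 1)
    for alleles, out1, out2 in sites:
        if len(alleles) != n_chrom:
            continue  # assumption: drop sites with missing calls
        if out1 != out2:
            continue  # outgroups disagree, ancestral state is ambiguous
        counts = Counter(alleles)
        if len(counts) != 2 or out1 not in counts:
            continue  # keep biallelic sites where the ancestral allele segregates
        derived_count = n_chrom - counts[out1]
        sfs[derived_count - 1] += 1
    return sfs


# Toy input (made-up sites, four sampled chromosomes).
toy_sites = [
    ("AAAG", "A", "A"),  # derived G carried by 1 chromosome
    ("CCTT", "C", "C"),  # derived T carried by 2 chromosomes
    ("AAAG", "A", "G"),  # outgroups disagree, skipped
    ("GGGA", "A", "A"),  # derived G carried by 3 chromosomes
    ("AAAA", "A", "A"),  # monomorphic, skipped
]
toy_sfs = unfolded_sfs(toy_sites, 4)
```

Requiring agreement between the two outgroups is one simple way to reduce mis-polarization; other rules, such as parsimony over a tree or probabilistic ancestral-state inference, are equally plausible.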
We used the silent-site SFS of each population, together with previously
estimated Drosophila mutation rates (Schrider et al. 2013), as inputs to
StairwayPlot (Liu and Fu 2015) to estimate effective
population size backwards in time for each location.
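As a rough illustration of how the diversity statistics reported in this section relate to a site frequency spectrum, the sketch below computes pairwise diversity, Watterson’s theta and Tajima’s D from an unfolded SFS using the standard estimators (Tajima 1989). It is not the VCFtools implementation, and the function name `sfs_summary` and the `n_sites` argument (number of callable sites, used for per-base scaling) are assumptions made for the example.

```python
import math


def sfs_summary(sfs, n_sites=1):
    """Pairwise diversity, Watterson's theta and Tajima's D from an unfolded SFS.

    sfs     : list where entry i-1 is the number of sites with derived count i
    n_sites : number of callable sites, used to express thetas per base
    """
    n = len(sfs) + 1  # number of sampled chromosomes
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    seg = sum(sfs)  # number of segregating sites

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))

    theta_w = seg / a1
    # Mean number of pairwise differences: a site with derived count i
    # differs in i * (n - i) of the n * (n - 1) / 2 possible pairs.
    theta_pi = sum(i * (n - i) * c for i, c in enumerate(sfs, 1)) / (n * (n - 1) / 2)

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg + e2 * seg * (seg - 1)

    tajima_d = (theta_pi - theta_w) / math.sqrt(variance) if variance > 0 else float("nan")
    return {"pi": theta_pi / n_sites, "theta_w": theta_w / n_sites, "tajima_d": tajima_d}


# Toy spectra for four chromosomes (made-up counts).
neutral_like = sfs_summary([6, 3, 2])        # counts proportional to 1/i
singleton_excess = sfs_summary([10, 0, 0])   # only rare derived alleles
```

In the first toy spectrum the counts are proportional to 1/i, so both theta estimates coincide and Tajima’s D is approximately zero; the second spectrum has an excess of singletons and gives a negative D.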
We also estimated the extent of population structure across samples
using Structure (Falush et al. 2003), repeating the
population assignment for each chromosome separately using only silent
polymorphism, for between one and ten populations (k = 1-10; 100,000
burn-in iterations, 400,000 sampling iterations). Following
Frichot et al. (2014), we manually assessed which number
of subpopulations best fit the data for each D. innubila chromosome and for DiNV by minimizing entropy.
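The selection criterion can be sketched as follows. This is only an illustration of choosing the number of subpopulations with the lowest entropy; the assessment described above was done manually, and the function name `best_k`, the input layout and all numeric values below are hypothetical.

```python
from statistics import mean


def best_k(entropy_by_k):
    """Return the k with the lowest mean entropy, plus the per-k means.

    entropy_by_k : dict mapping k to a list of entropy values
                   (for example, one value per replicate run)
    """
    means = {k: mean(values) for k, values in entropy_by_k.items()}
    return min(means, key=means.get), means


# Made-up entropy values for a single chromosome, for illustration only.
example_entropy = {
    1: [0.80, 0.82],
    2: [0.61, 0.63],
    3: [0.66, 0.64],
}
chosen_k, mean_entropy = best_k(example_entropy)
```

With these placeholder values the lowest mean entropy is at k = 2. In practice the curve is often flat near its minimum, which is one reason to inspect it by eye rather than rely on the arg-min alone.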