Population genetic summary statistics and structure
Using the total VCF file annotated with SnpEff, we created a second VCF containing only synonymous polymorphisms using BCFtools (Narasimhan et al. 2016). For each gene in each population, we calculated per-base pairwise diversity, Watterson’s theta, Tajima’s D (Tajima 1989) and FST (Weir and Cockerham 1984; each population versus all other populations) across the genome using VCFtools (Danecek et al. 2011) and the VCF containing all variants. Using ANGSD to parse the synonymous polymorphism VCF (Korneliussen et al. 2014), we generated unfolded synonymous site frequency spectra (SFS) for the D. innubila autosomes for each population, using the D. falleni and D. phalerata genomes as outgroups to the D. innubila genome (Hill et al. 2019).
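To make the relationship between these statistics concrete, the sketch below computes pairwise diversity, Watterson’s theta and Tajima’s D from a single unfolded SFS, using the standard formulas from Tajima (1989). It is an illustration only and not the VCFtools or ANGSD implementation used here; the function name `sfs_summary` and the input layout (a list whose entry i-1 counts sites where the derived allele is observed i times) are assumptions made for the example.

```python
import math


def sfs_summary(sfs):
    """Summarise an unfolded SFS (hypothetical helper, for illustration).

    sfs[i - 1] is the number of segregating sites at which the derived
    allele is observed i times, for i = 1 .. n - 1, where n is the number
    of sampled chromosomes. Returned values are totals over all sites.
    """
    n = len(sfs) + 1                      # number of sampled chromosomes
    s = sum(sfs)                          # number of segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))

    # Pairwise diversity: mean number of differences between two chromosomes.
    pi = sum(c * 2.0 * i * (n - i) for i, c in enumerate(sfs, start=1))
    pi /= n * (n - 1)

    # Watterson's estimator of theta.
    theta_w = s / a1

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    var = e1 * s + e2 * s * (s - 1)

    # D is undefined when the variance term is zero (e.g. no variable sites).
    tajima_d = (pi - theta_w) / math.sqrt(var) if var > 0 else float("nan")
    return {"pi": pi, "theta_w": theta_w, "tajima_d": tajima_d}
```

Dividing `pi` and `theta_w` by the number of callable sites gives per-base values. An SFS with an excess of rare derived alleles relative to the neutral expectation yields a negative D.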
We used the silent SFS for each population, together with previously estimated Drosophila mutation rates (Schrider et al. 2013), as inputs to StairwayPlot (Liu and Fu 2015) to estimate the effective population size backwards in time for each location.
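The role of the mutation rate in this step can be illustrated with the standard diploid relation theta = 4 * Ne * mu, which is how a per-site theta estimate is rescaled into an effective population size. The sketch below is a minimal illustration under that assumption, not StairwayPlot's own code; the function name `theta_to_ne` is hypothetical, and any numbers passed to it are placeholders rather than values from this study.

```python
def theta_to_ne(theta_per_site, mu_per_site_per_generation):
    """Convert a per-site theta into a diploid effective population size.

    Assumes theta = 4 * Ne * mu (diploid autosomal loci). Both arguments
    must be expressed per site; mu is per generation.
    """
    if mu_per_site_per_generation <= 0:
        raise ValueError("mutation rate must be positive")
    return theta_per_site / (4.0 * mu_per_site_per_generation)
```

Because Ne scales inversely with the assumed mutation rate, uncertainty in the rate rescales the inferred sizes without altering the shape of the trajectory.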
We also estimated the extent of population structure across samples using Structure (Falush et al. 2003), repeating the population assignment for each chromosome separately using only silent polymorphisms, for between one and ten populations (k = 1-10; 100,000 burn-in iterations and 400,000 sampling iterations). Following Frichot et al. (2014), we manually assessed which number of subpopulations best fit the data for each D. innubila chromosome and for DiNV, choosing the value of k that minimized entropy.
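The selection criterion can be sketched as follows. The assessment described above was done manually; this example only shows the logic of choosing the k with the lowest mean entropy across replicate runs, assuming the per-k entropy scores have already been computed. The function name `pick_k` and the input layout are hypothetical.

```python
from statistics import mean


def pick_k(entropy_by_k):
    """Return the k with the lowest mean entropy, plus the per-k means.

    entropy_by_k maps each candidate number of subpopulations (k) to a
    list of entropy scores from replicate runs (hypothetical input layout).
    """
    means = {k: mean(scores) for k, scores in entropy_by_k.items()}
    best_k = min(means, key=means.get)
    return best_k, means
```

In practice the per-k means would still be inspected by eye, since a shallow minimum or a plateau across several values of k gives only weak support for any single value.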