Nucleotide polymorphisms across the population samples
For the 318 sequenced samples with reasonable coverage, we called SNPs using GATK (McKenna et al. 2010; DePristo et al. 2011), which generated a multi-strain VCF file. We then used BCFtools (Narasimhan et al. 2016) to remove sites with a GATK quality score (a composite PHRED score across the multiple samples at each site) lower than 950, and sites with missing calls (e.g. because of low quality or zero coverage) in more than 5% of individuals. This filtering left us with 4,522,699 SNPs and small indels across the 168 Mbp genome of D. innubila. We then removed SNPs found as a singleton in a single population (as possible errors), leaving us with 3,240,198 SNPs. We used the annotation of D. innubila and SnpEff (Cingolani et al. 2012) to classify SNPs as synonymous, non-synonymous, non-coding or another annotation. In parallel with the D. innubila population samples, we also mapped genomic information from the outgroup species D. falleni (SRA: SRR8651761) and D. phalerata (SRA: SRR8651760) to the D. innubila genome and called divergence using the GATK variant-calling pipeline, to identify derived polymorphisms and fixed differences in D. innubila.
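The site filters described above can be illustrated with a minimal Python sketch. This is not the BCFtools invocation used in the study; it only restates the filtering logic on plain VCF text lines. The function names (`keep_site`, `filter_vcf_lines`) are hypothetical, and the singleton rule is simplified under a stated assumption: a site is treated as a singleton when exactly one non-reference allele copy is observed across all samples, without reference to population labels.

```python
def parse_genotypes(fields):
    """Return the GT string of every sample column in a split VCF record."""
    gt_idx = fields[8].split(":").index("GT")
    return [sample.split(":")[gt_idx] for sample in fields[9:]]


def is_missing(gt):
    """Treat fully or partially missing calls (e.g. './.', './1') as missing."""
    return "." in gt


def alt_allele_count(gts):
    """Count non-reference allele copies across all called genotypes."""
    count = 0
    for gt in gts:
        for allele in gt.replace("|", "/").split("/"):
            if allele not in (".", "0"):
                count += 1
    return count


def keep_site(line, min_qual=950.0, max_missing=0.05):
    """Apply the three filters to one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    qual = fields[5]
    # Filter 1: composite site quality must reach the threshold.
    if qual == "." or float(qual) < min_qual:
        return False
    gts = parse_genotypes(fields)
    # Filter 2: at most 5% of individuals may lack a call.
    if sum(is_missing(g) for g in gts) / len(gts) > max_missing:
        return False
    # Filter 3 (assumption): drop sites with exactly one alternate allele copy.
    return alt_allele_count(gts) != 1


def filter_vcf_lines(lines, min_qual=950.0, max_missing=0.05):
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if line.startswith("#") or keep_site(line, min_qual, max_missing):
            yield line
```

In use, `filter_vcf_lines(open(path))` would stream a decompressed VCF and yield the retained lines, where `path` is a placeholder for the input file. A population-aware singleton filter would additionally need a mapping from sample to population, which is omitted here.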