Nucleotide polymorphisms across the population samples
For the 318 sequenced samples with reasonable coverage, we called SNPs
using GATK (McKenna et al. 2010; DePristo et al. 2011), which generated a multi-strain VCF file. We then
used BCFtools (Narasimhan et al. 2016) to remove sites
with a GATK quality score (a composite PHRED score for multiple samples
per site) lower than 950 and sites absent (e.g. sites of low quality, or
with 0 coverage) from over 5% of individuals. This filtering left us
with 4,522,699 SNPs and small indels across the 168 Mbp genome of D. innubila. We then removed SNPs found as a singleton in a
single population (as possible errors), leaving us with 3,240,198 SNPs.
We used the D. innubila annotation and SnpEff
(Cingolani et al. 2012) to identify SNPs as synonymous,
non-synonymous, non-coding or another annotation. Alongside the D. innubila population samples, we also mapped genomic
information from outgroup species D. falleni (SRA: SRR8651761)
and D. phalerata (SRA: SRR8651760) to the D. innubila genome and called divergence using the GATK variant calling pipeline
to identify derived polymorphisms and fixed differences in D. innubila.
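
The site-level filters described above can be illustrated with a minimal Python sketch. It is not the BCFtools command used in the study; the record layout (a dict with "qual" and "genotypes"), the function names, and the reading of "singleton" as a single alternate-allele copy across all samples are assumptions made for illustration only.

```python
# Illustrative sketch of the site filters; not the pipeline actually run.
# Assumed layout: each site is a dict with a composite quality score and
# one genotype per sample, given as the alternate-allele count (0, 1, 2)
# or None when the sample has no call at that site.

MIN_QUAL = 950.0              # sites below this quality score are removed
MAX_MISSING_FRACTION = 0.05   # sites missing in over 5% of samples are removed


def passes_site_filters(qual, genotypes):
    """Return True if the site meets the quality and missingness thresholds."""
    if qual < MIN_QUAL:
        return False
    missing = sum(1 for g in genotypes if g is None)
    return missing / len(genotypes) <= MAX_MISSING_FRACTION


def is_singleton(genotypes):
    """Return True if exactly one alternate-allele copy is observed.

    Assumption: "singleton" is read here as a single copy across all
    called samples.
    """
    alt_copies = sum(g for g in genotypes if g is not None)
    return alt_copies == 1


def filter_sites(sites):
    """Keep sites that pass the thresholds and are not singletons."""
    return [
        site for site in sites
        if passes_site_filters(site["qual"], site["genotypes"])
        and not is_singleton(site["genotypes"])
    ]
```

In this sketch a site with quality 1000, no missing calls and two alternate-allele copies is kept, while a site with the same quality but a single alternate-allele copy is dropped as a possible error.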