Inversions
We used Delly (Rausch et al. 2012) across all samples to generate a multi-sample VCF file identifying regions of the genome that are potentially duplicated, deleted or inverted relative to the reference genome. We then removed inversions found in fewer than 1% of individuals and with a GATK VCF quality score lower than 200. We also called inversions in the same samples using Pindel (Ye et al. 2009) and again removed low-quality inversion calls. Next, we manually filtered samples and merged inversions whose breakpoints lay within 1000 bp of each other at both ends and whose presence/absence across strains overlapped significantly (χ² test, p-value < 0.05). We also removed large inversions that were detected by only one of the two tools. Using the remaining filtered and merged inversions, we estimated the frequency of each inversion in the total population.
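As an illustration only, the merging criterion and the frequency estimate could be sketched as follows. This is a minimal sketch, not the pipeline actually used: the `Inversion` record, the function names, the choice of an uncorrected Pearson χ² test on a 2×2 presence/absence table, and the extra requirement that the association be positive are all assumptions made for the example. It also assumes that every carrier strain is a member of `strains`.

```python
import math
from dataclasses import dataclass


@dataclass
class Inversion:
    # Hypothetical record; field names are illustrative, not Delly or Pindel output.
    chrom: str
    start: int
    end: int
    carriers: frozenset  # strains in which the inversion was called


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (1 d.f., no continuity correction) on [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty margin: no association can be tested
    stat = n * (a * d - b * c) ** 2 / denom
    p = math.erfc(math.sqrt(stat / 2))  # chi-square survival function for 1 d.f.
    return stat, p


def same_inversion(x, y, strains, max_dist=1000, alpha=0.05):
    """True if both breakpoints agree within max_dist and carriers are associated."""
    if x.chrom != y.chrom:
        return False
    if abs(x.start - y.start) > max_dist or abs(x.end - y.end) > max_dist:
        return False
    both = len(x.carriers & y.carriers)
    only_x = len(x.carriers - y.carriers)
    only_y = len(y.carriers - x.carriers)
    neither = len(strains) - both - only_x - only_y
    _, p = chi2_2x2(both, only_x, only_y, neither)
    # Assumption: only a positive association counts as "overlapping".
    return p < alpha and both * neither > only_x * only_y


def frequency(inv, strains):
    """Fraction of strains carrying the inversion."""
    return len(inv.carriers) / len(strains)
```

Two calls would be merged when `same_inversion` returns `True`, and `frequency` would then be applied to each merged inversion over the full set of strains.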