Inversions
For each sample, we used Delly (Rausch et al. 2012) to
generate a multi-sample VCF file identifying regions of the genome
that are potentially duplicated, deleted or inverted relative to the
reference genome. We then removed inversions found in fewer
than 1% of individuals and with a GATK VCF quality score lower than
200. We also called inversions in the same samples using Pindel (Ye et al. 2009) and again removed low-quality inversion
calls. We next manually filtered samples and merged inversions whose
breakpoints lay within 1,000 bp of each other at both ends and whose
presence/absence across strains overlapped significantly
(χ² test, p-value < 0.05). We also
removed large inversions that were found with only one of
the two tools. Using the remaining filtered and merged inversions, we
estimated the frequency of each inversion within the total population.
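
The merging criterion and the frequency estimate can be illustrated with a minimal sketch. It is not the pipeline used here; several details are assumptions made for illustration: inversions are represented as hypothetical (chromosome, start, end) tuples, presence/absence is coded as one 0/1 call per strain, "within 1000bp" is read as an inclusive distance, the χ² test is a Pearson test on the 2x2 presence/absence table without continuity correction, only positive associations count as overlap, and frequency is the fraction of strains carrying the inversion. All function names are hypothetical.

```python
import math

def breakpoints_close(a, b, max_dist=1000):
    """a and b are (chrom, start, end) tuples. True if both calls are on the
    same chromosome and both breakpoints lie within max_dist bp (inclusive,
    an assumption) of each other."""
    return (a[0] == b[0]
            and abs(a[1] - b[1]) <= max_dist
            and abs(a[2] - b[2]) <= max_dist)

def chi2_presence_test(x, y):
    """Pearson chi-square test (1 d.f., no continuity correction) on the 2x2
    table built from two equal-length lists of 0/1 presence calls.
    Returns (p_value, positively_associated)."""
    n = len(x)
    n11 = sum(1 for i, j in zip(x, y) if i and j)
    n10 = sum(1 for i, j in zip(x, y) if i and not j)
    n01 = sum(1 for i, j in zip(x, y) if not i and j)
    n00 = n - n11 - n10 - n01
    rows = (n11 + n10, n01 + n00)
    cols = (n11 + n01, n10 + n00)
    if 0 in rows or 0 in cols:
        # One inversion is present in all or none of the strains:
        # the table is degenerate and carries no evidence of association.
        return 1.0, False
    observed = ((n11, n10), (n01, n00))
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = rows[r] * cols[c] / n
            chi2 += (observed[r][c] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Co-occurrence above its expectation means the calls overlap,
    # rather than being mutually exclusive.
    positive = n11 > rows[0] * cols[0] / n
    return p_value, positive

def should_merge(a, b, x, y, max_dist=1000, alpha=0.05):
    """Merge two calls if their breakpoints are close at both ends and their
    presence/absence across strains is significantly, positively associated."""
    if not breakpoints_close(a, b, max_dist):
        return False
    p_value, positive = chi2_presence_test(x, y)
    return positive and p_value < alpha

def inversion_frequency(x):
    """Fraction of strains carrying the inversion (zygosity is ignored)."""
    return sum(x) / len(x)
```

For example, `should_merge(("2L", 1000, 5000), ("2L", 1400, 5600), x, y)` would be called with `x` and `y` holding the per-strain calls from the two tools, in the same strain order. With few strains the χ² approximation is poor, so an exact test may be preferable in that setting.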