Tests for population structure and ancestry
We used K-means clustering and discriminant analysis of principal components (DAPC; Jombart, Devillard, & Balloux, 2010) to test for population structure in P. subaeruginosa across Australia and the Northern Hemisphere. In R (R Core Team, 2014), we imported an LD-corrected VCF file with vcfR (Knaus & Grünwald, 2017), assigned individuals to clusters by K-means and DAPC with adegenet (Jombart, 2008), and plotted the results with ggplot2.
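The K-means step can be illustrated with a minimal, standard-library Python sketch. It is not the adegenet implementation and does not perform DAPC. The toy genotype matrix (alternate-allele counts of 0, 1, or 2), the deterministic farthest-point initialisation, and the BIC-like score n·ln(WSS/n) + K·ln(n) are all assumptions made for this example.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two genotype vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centers(points, k):
    """Farthest-point initialisation, chosen here so the example is reproducible."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns cluster labels and the within-cluster sum of squares."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss

def bic_score(points, k):
    """Assumed score for comparing K: n*ln(WSS/n) + k*ln(n). Requires WSS > 0."""
    n = len(points)
    _, wss = kmeans(points, k)
    return n * math.log(wss / n) + k * math.log(n)

# Hypothetical data: 6 individuals x 4 SNPs, two visibly distinct groups.
toy_genotypes = [
    [0, 0, 2, 2], [0, 1, 2, 2], [1, 0, 2, 2],
    [2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 1, 0],
]

labels, wss = kmeans(toy_genotypes, 2)
print("labels for K=2:", labels)
for k in (1, 2):
    print("K =", k, "score =", round(bic_score(toy_genotypes, k), 3))
```

On real data the score would be inspected across a range of K values rather than read off a toy matrix this small, where adding clusters keeps lowering the score.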
We estimated pairwise relatedness with the --relatedness option in vcftools v0.1.17 (Danecek et al., 2011). This option computes the Ajk statistic of Yang et al. (2010), which measures the similarity of marker genotypes between two individuals that results from shared genetic ancestry. We plotted the pairwise relatedness values as a heat map using ggplot2 in R (R Core Team, 2014).
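To make the statistic concrete, the following Python sketch computes the Ajk matrix from the formulas in Yang et al. (2010): the mean over SNPs of (x_j - 2p)(x_k - 2p) / (2p(1 - p)) for two different individuals, and one plus the mean of (x_j² - (1 + 2p)x_j + 2p²) / (2p(1 - p)) for an individual with itself. It is an illustration only, not the vcftools code. The toy genotypes, the estimation of allele frequencies from the sample itself, and the skipping of monomorphic sites are assumptions.

```python
def allele_freqs(genotypes):
    """Alternate-allele frequency per SNP, estimated from the sample (diploid)."""
    n_ind = len(genotypes)
    return [sum(col) / (2 * n_ind) for col in zip(*genotypes)]

def ajk_matrix(genotypes):
    """Pairwise relatedness matrix following the Yang et al. (2010) formulas."""
    p = allele_freqs(genotypes)
    # Monomorphic sites would give a zero denominator, so they are skipped (assumption).
    sites = [i for i, freq in enumerate(p) if 0.0 < freq < 1.0]
    n_ind = len(genotypes)
    matrix = [[0.0] * n_ind for _ in range(n_ind)]
    for j in range(n_ind):
        for k in range(j, n_ind):
            total = 0.0
            for i in sites:
                xj, xk, freq = genotypes[j][i], genotypes[k][i], p[i]
                denom = 2 * freq * (1 - freq)
                if j == k:
                    total += (xj * xj - (1 + 2 * freq) * xj + 2 * freq * freq) / denom
                else:
                    total += (xj - 2 * freq) * (xk - 2 * freq) / denom
            value = total / len(sites)
            if j == k:
                value += 1.0  # self-relatedness is centred on one
            matrix[j][k] = matrix[k][j] = value
    return matrix

# Hypothetical data: 4 individuals x 3 SNPs, coded as alternate-allele counts.
toy_genotypes = [
    [0, 1, 2],
    [0, 1, 2],
    [2, 1, 0],
    [2, 1, 0],
]

relatedness = ajk_matrix(toy_genotypes)
for row in relatedness:
    print([round(v, 3) for v in row])
```

In this toy matrix, individuals with identical genotypes receive a positive value and individuals with opposite genotypes a negative one, which is the contrast a pairwise heat map displays.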