Population structure
Population structure analyses were performed separately for each species. To obtain a set of “neutral markers” for each species for these analyses, we first removed from each dataset the loci identified as candidates for adaptation by the pRDA. Next, we identified loci that deviated from Hardy-Weinberg Equilibrium (HWE) proportions for all samples combined using the function –hardy from VCFtools, and we removed loci with p < 0.05 from the dataset.
We first estimated isolation-by-distance (IBD) using the mantelfunction from the vegan R package (Oksanen et al. 2011) for both the neutral and adaptive datasets. We used the functiondist.genpop from the adegenet R package (Jombart & Ahmed 2011) to calculate Nei’s genetic distances (Nei 1972, 1978) between all pairs of populations, and plotted those genetic distances against pairwise Euclidean geographic distances (average geographic location of each population). We performed this for both NIDGS and SIDGS datasets and for neutral and adaptive loci separately.
We performed PCA to obtain a visual inspection of population structure without underlying models of evolution. We used the snpgdsPCAfunction from the SNPRelate R package (Zheng et al. 2012) to perform the PCA for the complete dataset (all loci), the neutral dataset, and the adaptive dataset of each species. We used theggplot function from the ggplot 2 R package to construct the plots (Kassambara 2018). For the neutral data, we also evaluated population structure for each species using an approach with an underlying Bayesian model to estimate the number of populations (K ), using the program STRUCTURE (Pritchard et al.2000) and the ParallelStructure R package (Besnier & Glover 2013). We applied an admixture model run for 1x106generations, and five replicates per K , with K = 1 toK = 13 in NIDGS and to K = 5 in SIDGS, which correspond to the total number of populations sampled for each species. We then determined the optimal number of populations (K ) according to the Evanno method and the rate of change in the likelihood values (Evannoet al. 2005).
We estimated interspecific population differentiation using the IDGS total dataset, and intraspecific population differentiation for NIDGS and SIDGS ‘neutral’ and ‘adaptive’ datasets, using the functionsgenet.dist and boot.ppfst from the R packagehierfstat to calculate genetic divergence (F ST) with 999 bootstraps between all geographic sites (Goudet 2005). We then built correlation matrices of pairwiseF ST between all populations using the functioncorrplot from the corrplot R package (Wei et al.2017). Using the same datasets, we determined genetic diversity by estimating observed and expected heterozygosity (H O and H E, respectively) per population. To estimate H O andH E, we used the function basic.stats from the hierfstat R package (Goudet 2005), averaging locus heterozygosity per population. We then performed a paired Wilcoxon test to identify significant differences between H Oand H E for both neutral and adaptive loci among populations of each species, between HE of both neutral and adaptive loci among populations of each species, and HE between neutral and adaptive datasets for each species, using the function t.test from the stats R package.