Outlier detection and association of allele frequency with environmental variables
To identify candidate loci potentially influenced by selection within the E. coccineum PG_dataset, two complementary approaches were used. First, we used a Principal Component Analysis (PCA) to generate a “model-free” null distribution of individual genetic distances to detect outlier loci. This analysis was conducted using the R packages PCAdapt v.4.0.3 (Luu et al., 2017) and qvalue v.2.16.0 (Storey et al., 2004) with default parameters and K = 10. The detection of outliers was confirmed by plotting histograms of the p-values and of the Mahalanobis (D²) test statistic, with a False Discovery Rate (FDR) set at 5%. Second, we implemented a Bayesian test to detect outlier loci using the BAYESCAN 2.1 software (Foll et al., 2008). All simulations were performed using the default parameters, with 20 pilot runs of 5,000 iterations followed by 50,000 sampling iterations, and an FDR of 5%. Only outlier loci detected by both PCAdapt and BAYESCAN were retained as candidate adaptive loci.
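The selection rule described above (a 5% FDR threshold followed by the intersection of the two outlier sets) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study, which relied on the R packages cited above. The sketch substitutes Benjamini-Hochberg adjusted p-values for the Storey q-values (the two coincide when the estimated proportion of null loci is 1), and the locus names, p-values, and BAYESCAN outlier set are hypothetical toy data.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values.

    Assumption: used here as a stand-in for Storey q-values; the two are
    equal only when the proportion of true nulls (pi0) is taken as 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def outliers_at_fdr(locus_ids, pvalues, fdr=0.05):
    """Return the set of loci whose adjusted p-value is at or below `fdr`."""
    q = bh_qvalues(pvalues)
    return {locus for locus, qv in zip(locus_ids, q) if qv <= fdr}


# Hypothetical toy data: five SNPs with PCA-based p-values, plus a set of
# loci already flagged by the Bayesian test at the same FDR.
loci = ["snp1", "snp2", "snp3", "snp4", "snp5"]
pcadapt_p = [0.001, 0.20, 0.004, 0.80, 0.02]
bayescan_outliers = {"snp1", "snp4", "snp5"}

pcadapt_outliers = outliers_at_fdr(loci, pcadapt_p, fdr=0.05)

# Candidate adaptive loci are those flagged by both methods.
candidates = pcadapt_outliers & bayescan_outliers
print(sorted(candidates))
```

Requiring agreement between the two methods is conservative: a locus flagged by only one test (snp3 or snp4 in the toy data) is excluded from the candidate set.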
To search for candidate adaptive loci with allelic frequencies potentially influenced by the environmental variation characterizing our study area (see Supplementary Table 1), a GradientForest (GF) analysis was implemented with the R package gradientForest (Ellis et al., 2012). GF provides a ranked list of the relative predictive power (R²) of all environmental variables, allowing the identification of those that best explain the observed genetic variation. The allele frequencies of candidate adaptive loci (i.e., those detected by both PCAdapt and BAYESCAN) were used as response variables. GF was fitted using 2,000 regression trees per SNP and a variable correlation threshold of 0.5 (Fitzpatrick and Keller, 2014).
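As an illustration of the response variable used in this analysis, the following minimal Python sketch computes the per-population frequency of one allele at a single SNP. It assumes diploid individuals, a biallelic locus, and genotypes coded as the number of copies of the alternate allele (0, 1, or 2, with None for missing calls). This coding, the function name, and the toy data are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def population_allele_frequencies(genotypes, populations):
    """Alternate-allele frequency per population for one biallelic SNP.

    Assumptions: diploid individuals; each genotype is the alternate-allele
    count (0, 1, 2) or None when the call is missing.
    """
    alt_copies = defaultdict(int)
    total_copies = defaultdict(int)
    for genotype, pop in zip(genotypes, populations):
        if genotype is None:
            continue  # skip missing calls rather than counting them as 0
        alt_copies[pop] += genotype
        total_copies[pop] += 2  # two allele copies per diploid individual
    return {pop: alt_copies[pop] / total_copies[pop] for pop in total_copies}


# Hypothetical toy data: eight individuals from two populations.
genotypes = [0, 1, 2, None, 2, 2, 1, 0]
populations = ["A", "A", "A", "A", "B", "B", "B", "B"]

freqs = population_allele_frequencies(genotypes, populations)
print(freqs)
```

Repeating this calculation for every candidate locus gives a populations-by-loci frequency matrix, which is the kind of response table a GF model takes alongside the environmental predictors.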