Outlier detection and association of allele frequency with
environmental variables
To identify candidate loci potentially influenced by selection within the E. coccineum PG_dataset, two complementary approaches were used.
First, we used a Principal Component Analysis (PCA) to generate a
“model-free” null distribution of individual genetic distances to
detect outlier loci. This analysis was conducted using the R packages
PCAdapt v.4.0.3 (Luu et al., 2017) and qvalue v.2.16.0 (Storey et al.,
2004) with default parameters and applying a search with K=10. The
detection of outliers was confirmed by plotting the histograms of p-values and the Mahalanobis (D²) test
statistic, with a False Discovery Rate (FDR) set at 5%. Second, we
implemented a Bayesian test to detect outlier loci using the BAYESCAN
2.1 software (Foll et al., 2008). All simulations were performed using
the default parameters with 20 pilot runs of 5,000 iterations followed
by 50,000 sampling iterations and using an FDR of 5%. Outlier loci
detected by both PCAdapt and BAYESCAN were selected as candidate
adaptive loci.
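
As an illustration of the selection rule described above, the following minimal sketch flags outliers at a 5% FDR and retains only the loci supported by both methods. It rests on several assumptions: the Benjamini-Hochberg adjustment is used here as a simple stand-in for the q-value procedure of Storey et al. (the two are related but not identical), and the locus names, p-values and BAYESCAN outlier list are hypothetical placeholders rather than results from this study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def outliers_at_fdr(pvalues_by_locus, fdr=0.05):
    """Return the set of loci whose adjusted p-value is at or below `fdr`."""
    loci = list(pvalues_by_locus)
    adjusted = benjamini_hochberg([pvalues_by_locus[locus] for locus in loci])
    return {locus for locus, q in zip(loci, adjusted) if q <= fdr}


# Hypothetical per-locus p-values, e.g. as exported from a PCA-based scan.
pcadapt_pvalues = {
    "snp1": 0.0001,
    "snp2": 0.004,
    "snp3": 0.02,
    "snp4": 0.20,
    "snp5": 0.65,
}
# Hypothetical set of loci flagged by the Bayesian scan at the same FDR.
bayescan_outliers = {"snp2", "snp3", "snp6"}

pcadapt_outliers = outliers_at_fdr(pcadapt_pvalues, fdr=0.05)
# Candidate adaptive loci: detected by both approaches.
candidate_loci = pcadapt_outliers & bayescan_outliers
print(sorted(candidate_loci))
```

Taking the intersection is a conservative choice: a locus flagged by only one method is discarded, which reduces false positives at the cost of possibly missing true outliers.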
To search for candidate adaptive loci with allelic frequency potentially
influenced by the environmental variation characterizing our study area
(see Supplementary table 1), a GradientForest (GF) analysis was
implemented in the R package gradientForest (Ellis et al., 2012). GF
provides a ranked list of the relative predictive power
(R²) of all environmental variables, allowing the
identification of those that best explain the observed genetic
variation. Allele frequency of candidate adaptive loci (i.e., detected
by both PCAdapt and BAYESCAN) was used as the response variable. GF was
fitted using 2,000 regression trees per SNP and a variable correlation
threshold of 0.5 (Fitzpatrick and Keller, 2014).
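
The response matrix for such an analysis is typically a table of per-population allele frequencies at each candidate locus. The sketch below shows one way this table could be derived, assuming diploid individuals and biallelic SNPs coded as the number of alternate alleles (0, 1 or 2, with None for missing calls). The individual names, population labels and genotypes are hypothetical, and the function is an illustration rather than the procedure used in this study.

```python
def population_allele_frequencies(genotypes, populations):
    """Compute alternate-allele frequencies per population and locus.

    genotypes: dict mapping individual -> list of alternate-allele counts
               (0, 1, 2, or None when the call is missing), one per locus.
    populations: dict mapping individual -> population label.
    Returns a dict mapping population -> list of frequencies per locus
    (None where every call in that population is missing).
    """
    n_loci = len(next(iter(genotypes.values())))
    allele_sum = {}
    allele_n = {}
    for individual, calls in genotypes.items():
        pop = populations[individual]
        sums = allele_sum.setdefault(pop, [0] * n_loci)
        counts = allele_n.setdefault(pop, [0] * n_loci)
        for j, call in enumerate(calls):
            if call is None:
                continue  # skip missing genotypes
            sums[j] += call
            counts[j] += 2  # two allele copies per diploid individual
    return {
        pop: [
            allele_sum[pop][j] / allele_n[pop][j] if allele_n[pop][j] else None
            for j in range(n_loci)
        ]
        for pop in allele_sum
    }


# Hypothetical genotypes at two candidate loci for four individuals.
genotypes = {
    "ind1": [0, 2],
    "ind2": [1, 2],
    "ind3": [2, None],
    "ind4": [2, 0],
}
populations = {"ind1": "popA", "ind2": "popA", "ind3": "popB", "ind4": "popB"}

frequencies = population_allele_frequencies(genotypes, populations)
print(frequencies)
```

Each row of the resulting table (one per population) would then be paired with the environmental values of the corresponding sampling site before model fitting.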