GBS data sets
To test for possible effects of assembly parameters and missing data on patterns of population genetic structure, distinct sets of parameters (-p: minimum number of populations in which a locus must be present to be included; -r: minimum proportion of individuals in a population that must possess a locus for it to be included) were examined before choosing optimal filtering options for further analysis. Eight combinations of -p and -r were tested (-r = 0.6 and 0.7; -p = 6, 7, 8 and 9). For each combination, the percentage of missing data, the percentage of heterozygosity and the number of SNPs were estimated (data not shown). The minimum minor allele frequency (min-maf) was set to 0.02 and the maximum observed heterozygosity (max-obs-het) to 0.5. The “Population Genetic data set” (i.e., PG_dataset) was built using -p 9 and -r 0.7.
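The eight-combination parameter scan can be illustrated with a minimal Python sketch that only assembles the corresponding command lines for the Stacks `populations` program and does not run them. The input directory, population map and output directory names are hypothetical placeholders, and the flag spellings follow the Stacks 2 conventions as we understand them; they should be checked against the manual of the installed version.

```python
from itertools import product

# Values tested in the text: -r in {0.6, 0.7} and -p in {6, 7, 8, 9}.
R_VALUES = (0.6, 0.7)
P_VALUES = (6, 7, 8, 9)


def build_populations_commands(stacks_dir="stacks_out", popmap="popmap.tsv"):
    """Return one argument list per (-r, -p) combination.

    stacks_dir and popmap are hypothetical paths used for illustration only.
    """
    commands = []
    for r, p in product(R_VALUES, P_VALUES):
        out_dir = f"populations_p{p}_r{r}"  # hypothetical output folder name
        commands.append([
            "populations",
            "-P", stacks_dir,         # directory holding the Stacks catalog
            "-M", popmap,             # population map file
            "-O", out_dir,            # where this run would write its output
            "-p", str(p),             # minimum number of populations per locus
            "-r", str(r),             # minimum proportion of individuals per population
            "--min-maf", "0.02",      # minimum minor allele frequency
            "--max-obs-het", "0.5",   # maximum observed heterozygosity
        ])
    return commands


commands = build_populations_commands()
for cmd in commands:
    print(" ".join(cmd))
```

Each printed line corresponds to one filtering run whose missing data, heterozygosity and SNP count could then be compared; the last line uses -p 9 and -r 0.7, the combination retained for the PG_dataset.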
Furthermore, a second data set was built to estimate the divergence time among the E. coccineum genetic groups using the Bayesian multispecies coalescent model (Stange et al., 2018). To reduce computation time, we selected a subset of sampling locations from the North (Nah and Cu), the Center (ChlN and ChlS) and the South (Coy and TP). A subset of 12 E. coccineum individuals (i.e., two per location) was then selected to maximize the number of SNPs available and minimize the amount of missing data. These 12 E. coccineum individuals were processed jointly with the samples of L. hirsuta, included as outgroup. A new SNP calling was conducted following the same parameters as those used for the PG_dataset in the Stacks v.2.2 pipeline (Catchen et al., 2013), generating the “divergence time data set” (i.e., DT_dataset). In this case, values for the -p and -r parameters were chosen to decrease the percentage of missing data: only loci present in all localities, including the outgroup (-p 7), and present in at least 80% of the individuals (-r 0.8) were kept for further analyses. Putative outlier loci were removed from the DT_dataset. For each data set, one SNP was randomly selected per locus to approximate a set of unlinked loci.
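The final thinning step, keeping one randomly chosen SNP per locus, can be sketched as follows. This is an illustration of the idea and not necessarily the procedure used here: the function name, the (locus, position) input format and the toy values are assumptions made for the example, and Stacks `populations` itself offers a `--write-random-snp` option for the same purpose.

```python
import random
from collections import defaultdict


def pick_one_snp_per_locus(snps, seed=0):
    """Keep one randomly chosen SNP position per locus.

    snps: iterable of (locus_id, position) pairs (hypothetical input format).
    seed: fixed seed so that the random selection is reproducible.
    """
    rng = random.Random(seed)
    by_locus = defaultdict(list)
    for locus_id, position in snps:
        by_locus[locus_id].append(position)
    # Draw exactly one position from the SNPs observed at each locus.
    return {locus: rng.choice(positions) for locus, positions in by_locus.items()}


# Toy example: three loci carrying two, one and three SNPs respectively.
toy_snps = [(1, 12), (1, 57), (2, 8), (3, 33), (3, 40), (3, 71)]
selected = pick_one_snp_per_locus(toy_snps)
print(selected)
```

Retaining a single SNP per locus removes the tight physical linkage among SNPs on the same short GBS fragment, although loci located close together in the genome may still be linked.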