GBS data sets
To test for possible effects of assembly parameters and missing data on
patterns of population genetic structure, distinct sets of parameters
were examined before choosing the optimal filtering options for further
analysis: -p, the minimum number of populations in which a locus must be
present for it to be included; and -r, the minimum proportion of
individuals in a population that must possess a particular locus for it
to be included. Eight combinations of -p and -r were tested (-r = 0.6
and 0.7; -p = 6, 7, 8 and 9). For each combination, the percentage of
missing data, the percentage of heterozygosity and the number of SNPs
were estimated (data not shown). The minimum minor allele frequency
(min-maf) was set to 0.02 and the maximum observed heterozygosity
(max-obs-het) to 0.5. The “Population Genetic data set” (i.e.,
PG_dataset) was built using -p 9 and -r 0.7.
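The filtering logic behind the eight -p/-r combinations and the two frequency filters can be sketched in Python. This is a simplified, hypothetical re-implementation under stated assumptions, not the code of the Stacks populations program: each locus is reduced to a single biallelic SNP, genotypes are encoded as (allele, allele) tuples with None for missing data, thresholds are treated as inclusive, and "% heterozygosity" is taken to mean the share of heterozygous genotypes among called genotypes. The function names (passes_p_r, maf_and_obs_het, summarize, parameter_grid) are illustrative only.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical encoding: a diploid genotype at one SNP, e.g. ("A", "G");
# None marks a missing genotype.
Genotype = Optional[Tuple[str, str]]


def passes_p_r(row: Sequence[Genotype], pops: Sequence[str], p: int, r: float) -> bool:
    """True if the SNP is genotyped in >= r of the individuals of >= p populations."""
    n_passing = 0
    for pop in set(pops):
        members = [g for g, q in zip(row, pops) if q == pop]
        called = sum(g is not None for g in members)
        if called / len(members) >= r:
            n_passing += 1
    return n_passing >= p


def maf_and_obs_het(row: Sequence[Genotype]) -> Tuple[float, float]:
    """Minor allele frequency and observed heterozygosity (assumes a biallelic SNP)."""
    called = [g for g in row if g is not None]
    if not called:
        return 0.0, 0.0
    counts: Dict[str, int] = {}
    for a, b in called:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    maf = min(counts.values()) / (2 * len(called)) if len(counts) > 1 else 0.0
    obs_het = sum(a != b for a, b in called) / len(called)
    return maf, obs_het


def summarize(snps: Sequence[Sequence[Genotype]], pops: Sequence[str], p: int, r: float,
              min_maf: float = 0.02, max_obs_het: float = 0.5) -> Dict[str, float]:
    """Apply the -p/-r, min-maf and max-obs-het filters, then report summary statistics."""
    kept: List[Sequence[Genotype]] = []
    for row in snps:
        if not passes_p_r(row, pops, p, r):
            continue
        maf, obs_het = maf_and_obs_het(row)
        if maf >= min_maf and obs_het <= max_obs_het:
            kept.append(row)
    cells = [g for row in kept for g in row]
    called = [g for g in cells if g is not None]
    return {
        "n_snps": len(kept),
        "pct_missing": 100 * (len(cells) - len(called)) / len(cells) if cells else 0.0,
        "pct_het": 100 * sum(a != b for a, b in called) / len(called) if called else 0.0,
    }


def parameter_grid(snps, pops, p_values=(6, 7, 8, 9), r_values=(0.6, 0.7)):
    """Summaries for every (-p, -r) combination: 4 x 2 = 8 runs with the defaults."""
    return {(p, r): summarize(snps, pops, p, r) for r, p in product(r_values, p_values)}
```

With the default arguments, parameter_grid evaluates the same eight combinations described above; comparing the returned n_snps, pct_missing and pct_het across keys mirrors the comparison used to pick -p 9 and -r 0.7.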
Furthermore, another data set was built to estimate the divergence times
among the E. coccineum genetic groups using the Bayesian
multispecies coalescent model (Stange et al., 2018). To reduce
computation time, we selected a subset of sampling locations from the
North (Nah and Cu), the Center (ChlN and ChlS) and the South (Coy and
TP). Then, a subset of 12 E. coccineum individuals (i.e., two per
location) was selected to maximize the number of available SNPs and
minimize the amount of missing data. These 12 E. coccineum individuals
were processed jointly with the samples of L. hirsuta, included as the
outgroup. A new SNP calling was conducted in the Stacks v.2.2 pipeline
(Catchen et al., 2013) with the same parameters as those used for the
PG_dataset, generating the
“divergence time data set” (i.e., DT_dataset). In this case, the values
of -p and -r were chosen to reduce the percentage of missing data: only
loci present in all localities, including the outgroup (-p 7), and
present in at least 80% of the individuals (-r 0.8) were kept for
further analyses. Putative outlier loci were removed from the
DT_dataset. For each data set, one SNP was randomly selected per locus
to approximate variation at unlinked loci.
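The DT_dataset locus filter and the one-random-SNP-per-locus step can be sketched the same way. This is a hypothetical Python illustration, not Stacks code: it assumes each of the six E. coccineum locations and the L. hirsuta outgroup is defined as a separate population (hence -p 7), encodes the presence of a locus in each individual as a boolean, and draws one SNP per locus with a seeded pseudo-random generator. The function names and the population label "Lhir" are illustrative. Under this assumption, -r 0.8 has a simple arithmetic consequence: with two individuals per location, a location passes only when both individuals carry the locus, because 1/2 = 0.5 < 0.8.

```python
import random
from typing import Dict, List, Sequence, Tuple


def keep_locus(present: Sequence[bool], pops: Sequence[str], p: int = 7, r: float = 0.8) -> bool:
    """Keep a locus carried by >= r of the individuals in at least p populations."""
    n_passing = 0
    for pop in set(pops):
        flags = [f for f, q in zip(present, pops) if q == pop]
        if sum(flags) / len(flags) >= r:
            n_passing += 1
    return n_passing >= p


def one_random_snp_per_locus(snps: Sequence[Tuple[str, int]], seed: int = 42) -> Dict[str, int]:
    """Map each locus ID to one SNP position drawn at random from that locus."""
    by_locus: Dict[str, List[int]] = {}
    for locus, pos in snps:
        by_locus.setdefault(locus, []).append(pos)
    rng = random.Random(seed)  # fixed seed so the subsampling is reproducible
    # Sorting loci and positions makes the draw independent of input order.
    return {locus: rng.choice(sorted(positions)) for locus, positions in sorted(by_locus.items())}
```

Fixing the seed makes the random SNP subsampling reproducible between runs; a different seed would select a different subset of SNPs from loci that carry more than one.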