2.1 | Population sampling and sequencing
A total of 268 individuals of R. flavipes were collected from 29 populations spanning the native range (USA) and introduced ranges in Europe (France and Germany), North America (Canada and the Bahamas) and South America (Chile and Uruguay) (Figure 1; detailed sampling information is provided in Table S1). Samples were stored in 96% ethanol at 4°C until DNA extraction. Total genomic DNA was extracted from each individual using a modified Gentra Puregene extraction method (Gentra Systems, Inc., Minneapolis, MN, USA). DNA quality and concentration were assessed by agarose gel electrophoresis and with a Qubit® 2.0 Fluorometer (Invitrogen, USA). Genomic DNA of suitable quality was used to construct ddRAD libraries. Libraries were prepared and sequenced at the Texas A&M AgriLife Genomics and Bioinformatics Service facility using the SphI and EcoRI restriction enzymes, following the protocol of Peterson et al. (2012). Each sample was identified using a unique indexed barcode. Samples were PCR-amplified with iProof™ High-Fidelity DNA Polymerase (Bio-Rad). PCR products were purified using AMPure XP beads (Beckman Coulter Inc.). Libraries were size-selected to a range of 300–500 bp using the BluePippin system (Sage Science Inc.) and sequenced on six flowcell lanes of an Illumina HiSeq 2500 (Illumina Inc., USA) to generate 150 bp paired-end reads.
Paired-end reads were quality-checked using FastQC v0.11.8 (Andrews 2010). Forward and reverse reads were demultiplexed by barcode, assigned to their respective samples and assembled using Stacks v.2.41 (Rochette et al. 2019). Reads were first aligned to the R. flavipes reference genome (Zhou et al. unpublished data) using the Burrows-Wheeler Aligner (Li and Durbin 2009). Aligned reads were then run through the reference-based pipeline of Stacks, which built and genotyped loci from the paired-end data and called SNPs using the population-wide data per locus. Only SNPs present in at least 70% of individuals in at least half of the populations were kept for downstream analyses. Furthermore, SNPs with a mean coverage lower than 5x or higher than 200x were removed using VCFtools v.0.1.15 (Danecek et al. 2011), to exclude unreliable SNP calls and highly repetitive regions, respectively. Loci with low-frequency alleles (minor allele frequency < 0.05) and highly heterozygous loci (heterozygosity > 0.7) were filtered out, as they likely represent sequencing errors and paralogs, respectively (Benestan et al. 2016). The filtered dataset was then converted into the input formats required by the different downstream software programs using PGDSpider v.2.1.1.5 (Lischer and Excoffier 2011).
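The filtering criteria described above can be illustrated with a minimal Python sketch applied to a single SNP. This is not the Stacks or VCFtools implementation, and it is not the code used in this study. The function name `passes_filters`, the genotype encoding (alternate-allele counts 0, 1 or 2, with `None` for a missing call), the reading of "half of the populations" as "at least half", and the choice to average depth over called individuals only are all assumptions made for illustration.

```python
from collections import defaultdict


def passes_filters(genotypes, depths, pops,
                   min_call_rate=0.7, min_pop_fraction=0.5,
                   min_mean_depth=5.0, max_mean_depth=200.0,
                   min_maf=0.05, max_obs_het=0.7):
    """Return True if one SNP passes all four filters (hypothetical helper).

    genotypes: per-individual alternate-allele count (0, 1, 2) or None if missing
    depths:    per-individual read depth at the SNP
    pops:      per-individual population label
    """
    # Group genotype calls by population.
    by_pop = defaultdict(list)
    for g, pop in zip(genotypes, pops):
        by_pop[pop].append(g)

    # Presence filter: the SNP must be called in >= 70% of individuals
    # in at least half of the populations (assumed interpretation).
    pops_ok = sum(
        1 for calls in by_pop.values()
        if sum(g is not None for g in calls) / len(calls) >= min_call_rate
    )
    if pops_ok < min_pop_fraction * len(by_pop):
        return False

    # Keep only individuals with a called genotype for the remaining filters.
    called = [(g, d) for g, d in zip(genotypes, depths) if g is not None]
    if not called:
        return False
    n = len(called)

    # Coverage filter: mean depth must lie within [5x, 200x].
    mean_depth = sum(d for _, d in called) / n
    if mean_depth < min_mean_depth or mean_depth > max_mean_depth:
        return False

    # Minor allele frequency filter: drop SNPs with MAF < 0.05.
    alt_freq = sum(g for g, _ in called) / (2 * n)
    if min(alt_freq, 1 - alt_freq) < min_maf:
        return False

    # Observed heterozygosity filter: drop SNPs with heterozygosity > 0.7.
    obs_het = sum(1 for g, _ in called if g == 1) / n
    return obs_het <= max_obs_het
```

For example, a SNP that is heterozygous in every called individual has an observed heterozygosity of 1.0 and is rejected as a putative paralog, even when its call rate, depth and allele frequency are all acceptable.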