2.1 | Population sampling and sequencing
A total of 268 individuals of R. flavipes were collected from 29
populations spanning the native range (USA) and introduced ranges in
Europe (France and Germany), North America (Canada and the Bahamas)
and South America (Chile and Uruguay) (Figure 1; detailed sampling
information is provided in Table S1). Samples were stored in 96%
ethanol at 4°C until DNA extraction. Total genomic DNA was extracted
from each individual using a modified Gentra Puregene extraction method
(Gentra Systems, Inc., Minneapolis, MN, USA). DNA quality and
concentration were assessed by agarose gel electrophoresis and a Qubit®
2.0 Fluorometer (Invitrogen, USA). Suitable genomic DNA was used to
construct ddRAD libraries. Libraries were prepared and sequenced at the
Texas A&M AgriLife Genomics and Bioinformatics Service facility using
SphI and EcoRI restriction enzymes following the protocol of
Peterson et al. (2012). Each sample was
identified using a unique indexed barcode. Samples were amplified
by PCR with iProof™ High-Fidelity DNA Polymerase (Bio-Rad). PCR
products were purified using AMPure XP beads (Beckman Coulter Inc.).
Libraries were size-selected to a range of 300–500 bp using the
BluePippin system (Sage Science Inc.). Libraries were sequenced on six
flowcell lanes using an Illumina HiSeq 2500 (Illumina Inc., USA) to
generate 150 bp paired‐end reads.
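
As an illustration of how the enzyme pair and size window determine which loci are sampled, the double digest can be sketched in silico. This is a minimal sketch and not part of the workflow described above: the function name `ddrad_fragments` and the input sequence are hypothetical, adapter length is ignored, and only fragments flanked by one cut from each enzyme are retained. The recognition sites are the standard ones for SphI (GCATG^C) and EcoRI (G^AATTC).

```python
import re

# Recognition site and top-strand cut offset within the site.
# Both sites are palindromic, so scanning one strand is sufficient.
ENZYMES = {"SphI": ("GCATGC", 5), "EcoRI": ("GAATTC", 1)}


def ddrad_fragments(seq, min_len=300, max_len=500):
    """Return (start, end, length) for fragments flanked by one cut from
    each enzyme whose length falls inside the size-selection window."""
    cuts = []
    for name, (site, offset) in ENZYMES.items():
        # A lookahead pattern also finds overlapping occurrences of a site.
        for m in re.finditer(f"(?={site})", seq):
            cuts.append((m.start() + offset, name))
    cuts.sort()

    fragments = []
    # After a complete digest, fragments lie between adjacent cut positions.
    for (pos_a, enz_a), (pos_b, enz_b) in zip(cuts, cuts[1:]):
        length = pos_b - pos_a
        if enz_a != enz_b and min_len <= length <= max_len:
            fragments.append((pos_a, pos_b, length))
    return fragments
```

Calling `ddrad_fragments(sequence)` on a contig string returns the candidate loci for that contig. Fragments bounded by two cuts from the same enzyme are skipped because, in a ddRAD library, they would not carry both adapter types.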
The paired-end reads were quality-checked using FastQC
v0.11.8 (Andrews 2010). Forward and
reverse reads were demultiplexed by barcode, assigned to their
respective samples and assembled using Stacks v.2.41
(Rochette et al. 2019). Reads were first
aligned to the R. flavipes reference genome (Zhou et al.
unpublished data) using the Burrows-Wheeler Aligner
(Li and Durbin 2009). Aligned reads were
then run through the reference-based pipeline of Stacks, which built and
genotyped loci from the paired-end data and called SNPs using the
population-wide data per locus. Only SNPs present in at least 70% of
individuals in half of the populations were kept for downstream
analyses. Furthermore, SNPs with mean coverage lower than 5x or higher
than 200x were removed using VCFtools v.0.1.15
(Danecek et al. 2011) to exclude
unreliable SNPs and highly repetitive regions. Low-frequency alleles
(< 0.05) and highly heterozygous loci (> 0.7)
were filtered out, as they likely represent sequencing errors and
paralogs, respectively
(Benestan et al. 2016). The dataset was
then converted into the input formats required by downstream
software using PGDSpider v.2.1.1.5
(Lischer and Excoffier 2011).
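
The filtering criteria described above can be summarised for a single SNP as follows. This is a minimal sketch and not the Stacks or VCFtools implementation: the function name `keep_snp` and the data layout are hypothetical, "half of the populations" is read as "at least half", and heterozygosity is taken to mean observed heterozygosity among called genotypes.

```python
from collections import defaultdict


def keep_snp(genotypes, depths, pops,
             min_ind=0.7, min_pop_frac=0.5,
             min_depth=5, max_depth=200,
             min_maf=0.05, max_het=0.7):
    """Return True if one SNP passes all filters.

    genotypes: alternate-allele count per individual (0, 1 or 2),
               or None when the genotype is missing
    depths:    read depth per individual at this SNP
    pops:      population label per individual
    """
    # Presence: called in >= 70% of individuals in >= half of the populations.
    total, called = defaultdict(int), defaultdict(int)
    for g, p in zip(genotypes, pops):
        total[p] += 1
        called[p] += g is not None
    pops_ok = sum(called[p] / total[p] >= min_ind for p in total)
    if pops_ok < min_pop_frac * len(total):
        return False

    # Coverage: mean depth across individuals must lie within [5x, 200x].
    mean_depth = sum(depths) / len(depths)
    if not (min_depth <= mean_depth <= max_depth):
        return False

    # Minor allele frequency and observed heterozygosity among called genotypes.
    observed = [g for g in genotypes if g is not None]
    alt_freq = sum(observed) / (2 * len(observed))
    maf = min(alt_freq, 1 - alt_freq)
    het = sum(g == 1 for g in observed) / len(observed)
    return maf >= min_maf and het <= max_het
```

Applied to every SNP in turn, the function yields the retained set. The presence check runs first so that SNPs with too few called genotypes are rejected before any frequencies are computed.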