2.3 Population mapping and statistics
We assessed the quality of the resequencing data with FastQC v0.11.5 (Andrews, 2017) before and after filtering, and retained only reads ≥50 bp long with a quality score >30 at both the start and the end of the read. All reads were mapped against the Galerucella calmariensis reference genome (Yang, Slotte, Dainat, & Hambäck, 2021) using NextGenMap v0.4.12 (Sedlazeck, Rescheneder, & von Haeseler, 2013). The reference assembly is 588 Mbp long and comprises 39,255 scaffolds and 40,031 predicted proteins; compared with the endopterygota_odb10 database (Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015), 91.3% and 85.1% of orthologs were complete in the genome and proteome, respectively (see Yang et al., 2021, for further details on the assembly). Mapping rates were similar across samples (85% to 95%). We filtered the resulting BAM files with Samtools v1.3.1 (Li et al., 2009) to retain alignments with a mapping quality ≥20 (-q 20).
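The read- and alignment-level thresholds above can be sketched as two small predicates. This is a minimal Python illustration, not the pipeline actually used: the text does not name the trimming tool, so reading "quality score >30 at both the start and the end of the read" as the Phred score of the first and last base call is an assumption, as is Phred+33 quality encoding. The names `Read`, `passes_read_filter` and `passes_mapq_filter` are hypothetical. The MAPQ predicate mirrors `samtools view -q 20`, which skips alignments with a MAPQ below 20.

```python
from dataclasses import dataclass

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ Phred+33 encoding


@dataclass
class Read:
    """Hypothetical minimal FASTQ record."""
    name: str
    seq: str
    qual: str  # FASTQ quality string, one character per base


def phred_scores(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - PHRED_OFFSET for c in qual]


def passes_read_filter(read: Read, min_len: int = 50, min_end_q: int = 30) -> bool:
    """Keep reads >= min_len bp whose first and last bases have Q > min_end_q.

    Interpreting the end-quality criterion as the first and last base call
    is an assumption made for this sketch.
    """
    if len(read.seq) < min_len:
        return False
    q = phred_scores(read.qual)
    return q[0] > min_end_q and q[-1] > min_end_q


def passes_mapq_filter(mapq: int, min_mapq: int = 20) -> bool:
    """Mirror `samtools view -q 20`, which drops alignments with MAPQ < 20."""
    return mapq >= min_mapq


if __name__ == "__main__":
    good = Read("r1", "A" * 60, "I" * 60)             # 'I' encodes Q40
    short = Read("r2", "A" * 40, "I" * 40)            # shorter than 50 bp
    bad_end = Read("r3", "A" * 60, "I" * 59 + "#")    # '#' encodes Q2
    print([passes_read_filter(r) for r in (good, short, bad_end)])  # [True, False, False]
    print(passes_mapq_filter(19), passes_mapq_filter(20))          # False True
```

Note that a base with exactly Q30 fails the end-quality check, matching the strict ">30" wording, whereas the MAPQ check is inclusive, matching the behaviour of `-q`.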
We then called SNPs across all samples using FreeBayes v0.9.21 (Garrison & Marth, 2012). For SNP filtering, we kept only bi-allelic sites with a minimum read depth of 5X, a quality score >30 and at most 20% missing data. To verify the absence of genetic structure among populations within each species, we performed a principal component analysis (PCA). We first pruned SNPs in linkage disequilibrium (--indep-pairwise 50 10 0.2, i.e., windows of 50 SNPs, a step of 10 SNPs and an r² threshold of 0.2) and then ran the PCA across all samples using Plink v1.9 (Purcell et al., 2007) (Supporting Information Figure S1). Genetic diversity (nucleotide diversity, π) was estimated for each species using pixy (Korunes & Samuk, 2021).
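The site filters and the diversity estimate can likewise be sketched in Python. This is a simplified illustration under stated assumptions, not FreeBayes, Plink or pixy themselves: the hypothetical `Site` record stands in for a parsed VCF line, and treating the 5X threshold as a per-genotype depth (shallower genotypes become missing before the 20% missing-data check) is an assumption, since the text does not say whether depth is applied per genotype or per site. π is computed in the spirit of pixy's estimator, as the total number of pairwise allele differences divided by the total number of pairwise comparisons among called haplotypes, summed over sites. Because invariant sites contribute comparisons but no differences, pixy expects an all-sites VCF that includes invariant sites, so the sketch computes π on sites that have not been restricted to bi-allelic SNPs.

```python
from dataclasses import dataclass
from typing import Optional

# Diploid genotype as a pair of allele indices (0 = REF); None = missing call.
Genotype = Optional[tuple[int, int]]


@dataclass
class Site:
    """Hypothetical stand-in for one parsed VCF record."""
    alleles: list[str]         # REF followed by any ALT alleles
    qual: float                # site QUAL
    genotypes: list[Genotype]  # one entry per sample
    depths: list[int]          # per-sample read depth (e.g. FORMAT/DP)


def mask_low_depth(site: Site, min_dp: int = 5) -> list[Genotype]:
    """Set genotypes supported by fewer than min_dp reads to missing (assumption)."""
    return [gt if dp >= min_dp else None
            for gt, dp in zip(site.genotypes, site.depths)]


def passes_snp_filter(site: Site, min_qual: float = 30.0,
                      max_missing: float = 0.2, min_dp: int = 5) -> bool:
    """Bi-allelic, QUAL > min_qual and at most max_missing missing genotypes."""
    if len(site.alleles) != 2:
        return False
    if site.qual <= min_qual:
        return False
    gts = mask_low_depth(site, min_dp)
    missing_fraction = sum(gt is None for gt in gts) / len(gts)
    return missing_fraction <= max_missing


def diffs_and_comparisons(gts: list[Genotype]) -> tuple[int, int]:
    """Pairwise allele differences and comparisons among called haplotypes.

    Assumes at most two alleles per site, so differences = k * (n - k).
    """
    alleles = [a for gt in gts if gt is not None for a in gt]
    n = len(alleles)
    k = sum(1 for a in alleles if a != 0)  # non-reference allele count
    return k * (n - k), n * (n - 1) // 2


def nucleotide_diversity(sites: list[Site], min_dp: int = 5) -> float:
    """Summed differences over summed comparisons, invariant sites included."""
    total_diffs = total_comps = 0
    for site in sites:
        d, c = diffs_and_comparisons(mask_low_depth(site, min_dp))
        total_diffs += d
        total_comps += c
    return total_diffs / total_comps if total_comps else float("nan")


if __name__ == "__main__":
    snp = Site(["A", "G"], 50.0, [(0, 0), (0, 1)], [10, 12])
    invariant = Site(["A"], 50.0, [(0, 0), (0, 0)], [10, 12])
    print(passes_snp_filter(snp), passes_snp_filter(invariant))  # True False
    print(nucleotide_diversity([snp, invariant]))                # 0.25
```

In the usage example, the variable site contributes 3 differences out of 6 comparisons and the invariant site 0 out of 6, giving π = 3/12 = 0.25; dropping the invariant site would inflate the estimate to 0.5, which is why invariant sites matter for this estimator.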