Microevolutionary analyses
Variant calling. Assessments of population genetic diversity
and population structure are based on ORF targets without
intronic/intergenic flanking regions, and UCEs with flanking regions.
Variant calling was performed by alignment of the raw reads against
custom-built consensus sequences for ORFs and UCEs with their respective
flanking regions using Bowtie2 on a sample-by-sample basis,
after which .sam files were converted to .bam using SAMTools,
modified with Picard v. 2.21.4 (available at
http://broadinstitute.github.io/picard), and merged across individuals
with SAMTools. These .bam files, one for ORFs, another for
UCEs, were used to call and annotate single nucleotide polymorphisms
(SNPs) with GATK v. 4.1.9.0 (McKenna et al., 2010) using
HaplotypeCaller. Individual GVCF files were merged per marker
type and subjected to joint genotyping to obtain a .vcf file with
information on all sites, both variant and invariant. From the .vcf file
for ORFs we extracted information on exonic sites using an
EXONERATE-generated .gff annotation file. The resulting .vcf files were
filtered with functions of VCFtools v. 0.1.16 (Danecek et al.,
2011) and BCFtools v. 1.12 (Danecek et al., 2021) depending on
downstream analyses as described below.
Nucleotide diversity and population differentiation. For
subsequent analyses of nucleotide diversity and population
differentiation, the .vcf files for ORFs and UCEs were individually
filtered to retain sites with a mean genotype depth of 8 that have been
successfully genotyped in 80% of the individuals. Because invariant
sites do not have quality scores when using condensed non-variant
blocks, we created individual .vcf files for variant and invariant
sites. Invariant sites were identified by setting the minor allele
frequency to zero (–max-maf), whereas variant sites have a minor
allele count ≥1 (–mac). Only variant sites with a minimum quality
score of 30 (-minQ) were retained. Subsequently, we indexed both .vcf
files with tabix of SAMtools and combined them with
BCFtools to estimate nucleotide diversity (π ) within
populations per ORF and UCE, the average, absolute nucleotide divergence
between population pairs (DXY ) and population
differentiation (FST ) using Pixy v.
1.0.4 (Korunes & Samuk, 2021). We also converted the filtered .vcf file
for ORFs into a multifasta file with random (unphased) assignment of
variants to the two alleles. Each multifasta file was aligned to its ORF
target, after which reading frames were manually verified in
SeaView. Gap-only sites in protein sequences were removed. The
verified multifasta files were combined to calculate synonymous and
non-synonymous nucleotide diversity (πS and
πN, respectively) with dNdSpiNpiS v. 1.0
(available at http://kimura.univ-montp2.fr/PopPhyl).
Genetic structure. Analyses of genetic structure were
undertaken for variant sites of ORFs only, which were obtained from the
total .vcf file by filtering for a minimum allele count of 3, a minor
allele frequency ≥0.05, a minimum quality score of 30 and a mean
genotype depth of 8 for at least 80% of the individuals. Furthermore,
indels and variants that were not bi-allelic were removed, as were sites
with a Hardy-Weinberg p -value<0.001. Finally, we
removed individuals for which >50% of the sites displayed
missing data. The resulting .vcf file was converted to a .bed file for
principal component analysis (PCA) using PLINK v. 1.90b6.18 (Purcell et
al., 2007), and to examine population structure with
fastSTRUCTURE (Raj et al., 2014) using K=1-6 with 30
independent runs per K. The underlying genetic structure and the
appropriate number of clusters were examined with the ΔK method (Evanno
et al., 2005) and others that are more robust when sampling is uneven
(Puechmaille, 2016), as implemented in StructureSelector (Y.-L.
Li & Liu, 2018).