2.3. SNPs calling and filtering
Raw sequence data from GBS and RADSeq were independently processed as
well as merged (COMBINED dataset). Thereafter, the three generated
datasets (GBS, RADSeq and COMBINED) were processed through the same
pipeline.
All barcoded raw reads were mapped to the P. lilfordi genome
(rPodLil1.1, available athttps://denovo.cnag.cat/podarcis(Gomez-Garrido et al., 2023) using the Burrows-Wheeler Aligner (BWA)
software (v2.1) to generate individual Binary Alignment Map (BAM) files.
The genome corresponds to a single female individual sampled in the Aire
population (the population was included in our dataset). Variant calling
per sample was performed with Genome Analysis Toolkit HaplotypeCaller
from GATK v. 3.7 (Van der Auwera GA & O’Connor BD., 2020) to produce
gVCF files (minimum-mapping-quality 20). gVCF files were finally
genotyped for variant calling (i.e., SNPs).
Prior to data filtering, SNP and sample statistics were explored using
VCFTools v. 0.1.16 (Danecek et al., 2011; De la Cruz and Raska, 2014).
SNPs were then filtered using the following settings: –max-alleles 2,
–remove-indels, –min-meanDP 10, –max-meanDP 50, –minQ 30,
–maf 0.05, –max-missing 0.75 (individual datasets) or 0.90
(combined).
To retrieve common loci between GBS and RADSeq datasets, the max-missing
parameter was set to 0.9, thus retaining loci shared by at least 90% of
the specimens and effectively removing all loci unique to either GBS or
RADSeq.