Assembly filtering and sequencing data pre-processing

Individual contigs in the reference genome (GenBank assembly accession: GCA_900499035.1) of S. pilchardus (Louro et al., 2018) matching mitochondrial DNA (unique assignment to mitochondrial DNA without nuclear genome flanking regions) were identified via BLAST (version 2.6.0+) (Camacho et al., 2008) using the mitochondrial genome (mtDNA) assembled by Machado et al. (2018) as a query. Matching contigs were removed from the assembly file and replaced by the mtDNA of Machado et al. (2018) to enable the extraction of individual mtDNA sequences from all individuals after mapping of resequencing data.
Regions of low complexity in the reference genome of S. pilchardus (Louro et al., 2018) were detected with GenMap (version v1.3.0) (Pockrandt et al., 2020) using a k-mer of 100bp. We calculated the normalized depth per scaffold by using the sequencing depth of scaffold 1 as a reference to identified potentially misassemblies (e.g. unmerged haploid scaffolds or collapsed repeats regions). Regions of mappability below 1 (meaning that more than one 100bp kmer matched the region, indicating duplications) or identified as repeats in all the other scaffolds, and sites with data missing in more than 25% of the individuals were excluded from all subsequent analyses.
Raw Illumina reads for all 108 samples were first processed with Trimmomatic (version 0.36) (Bolger et al., 2014) for removal of adapter sequences and trimming bases with quality <20 and discarded reads with length <80. Clean reads were mapped to the genome assembly using bwa-mem version: 0.7.17-r1188 (Li, 2013) and samtools version: 1.7 (Li et al. 2009) was used to retain reads with mapping quality >25. PCR duplicates were removed with Picard MarkDuplicates (version 1.95; http://picard.sourceforge.net) and only reads were both pairs were retained were considered for the local realignment around indels with GATK version 3.6-0-g89b7209 (DePristo et al., 2011) and further analyses. The mapping and base quality options -minQ 20 -minMapQ 30 were used in all subsequent analyses with ANGSD (Korneliussen et al., 2014).

Population structure

Beagle files with the nuclear genome positions of single nucleotide polymorphisms (SNPs) were produced by ANGSD (Korneliussen et al., 2014) using the following options: -GL 1 -doGlf 2 -minMaf 0.05 -C 50 -baq 2 -remove_bads 1 -uniqueOnly 1 -SNP_pval 1e-6. Linkage disequilibrium (LD) was estimated as r2 values for all SNP pairs minimum 500 kbp apart with ngsLD v1.1.1 (Fox et al., 2019) and a LD decay curve was plotted using the script provided by (Fox et al., 2019) using 0.05% of all estimated r2 values. This indicated that a distance threshold of 2,000 bp was adequate for linkage pruning. A total of 560,735 SNPs were obtained using all samples (n=108, Table S1), and 319,236 SNPs were found to be in putatively neutral regions of the genome (see below). Admixture proportions were estimated by running NGSadmix version 32 (Skotte eKt al., 2013) for K equal 2 and 3 with 300 seed values, ensuring convergence (convergence was not reached for K = 4 and above). A principal component analysis (PCA) using the same SNP set was obtained with PCAngsd version 0.1 (Meisner & Albrechtsen, 2018).
The mitochondrial genome (mtDNA) for each individual was obtained as a consensus sequence of the reads mapped to the mitochondrial DNA sequence included in the reference genome. The individual sequences were generated in ANGSD (Korneliussen et al., 2014) using the option -doFasta 2 for sites with a minimum sequencing depth of 10X (-setMinDepth 10). An haplotype network was designed using mitochondrial SNPs with minor allele frequency >30% (total of 26 SNPs) in POPART (Leigh & Bryant, 2015) with the Median Joining Network algorithm (Bandelt at al., 1999).