Microevolutionary analyses
For each ORF and UCE, we genotyped 48 diploid, dioecious individuals belonging to six populations from the Malawi Basin. Filtering for analyses of nucleotide diversity and population differentiation resulted in the retention of 1,097 ORFs with high-quality genotypes for 1,153,827 out of 1,675,638 sites (68.9%), of which 17,988 were polymorphic (1.1%; Ortiz-Sepulveda et al., 2022). Using identical filtering criteria, we retained 309 UCEs with high-quality genotypes for 111,492 out of 254,694 sites (43.8%), of which 3,148 were polymorphic (1.2%). Estimates of π average 0.00167±0.00316 and 0.00116±0.00206 for ORFs and UCEs, respectively, and they are very similar across the six studied populations (Fig. 6A, S5A), indicating an average of ~1,915 and ~130 nucleotide differences in pairwise haplotype comparisons, respectively. The average πS per population varies between 0.00220 and 0.00329, whereas πN varies between 0.00057 and 0.00092, resulting in an average πN/πS of 0.259 (Fig. S6). Including intronic/intergenic flanking regions of ORFs would further increase the number of variant sites that can be analysed. Pairwise DXY values average 0.00175±0.00008 for ORFs and 0.00127±0.00015 for UCEs, which is only 5% and 9% higher than the mean π, respectively (Fig. 6A, S5A), indicating limited overall net nucleotide divergence among populations. FST values average 0.060±0.019 for ORFs and 0.054±0.015 for UCEs, indicating moderate genetic differentiation. Substantial variation exists in FST values among ORFs (Fig. 6B) and UCEs (Fig. S5B). Per pairwise population comparison, between 60 and 230 ORFs and between 16 and 38 UCEs display FST values >0.15, of which 42.0% and 59.3%, respectively, also display elevated DXY values (i.e. DXY >0.002).
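The relationship between π, DXY and net divergence used in the paragraph above can be illustrated with a minimal Python sketch. It assumes phased, equal-length haplotypes without missing data; the function names and the toy sequences are hypothetical, and this is not the pipeline used to produce the reported estimates.

```python
from itertools import combinations, product


def per_site_diff(hap_a, hap_b):
    """Proportion of sites at which two equal-length haplotypes differ."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must have the same length")
    return sum(x != y for x, y in zip(hap_a, hap_b)) / len(hap_a)


def nucleotide_diversity(haps):
    """pi: mean per-site difference over all pairs within one population."""
    pairs = list(combinations(haps, 2))
    return sum(per_site_diff(a, b) for a, b in pairs) / len(pairs)


def dxy(haps_x, haps_y):
    """DXY: mean per-site difference over all between-population pairs."""
    pairs = list(product(haps_x, haps_y))
    return sum(per_site_diff(a, b) for a, b in pairs) / len(pairs)


def net_divergence(haps_x, haps_y):
    """Net divergence: DXY minus the mean within-population pi."""
    within = (nucleotide_diversity(haps_x) + nucleotide_diversity(haps_y)) / 2
    return dxy(haps_x, haps_y) - within


# Toy haplotypes (invented for illustration only).
pop_a = ["AAAAAAAAAA", "AAAAAAAAAT"]
pop_b = ["AAAAAAAATT", "AAAAAAAATT"]

pi_a = nucleotide_diversity(pop_a)
pi_b = nucleotide_diversity(pop_b)
d_ab = dxy(pop_a, pop_b)
d_net = net_divergence(pop_a, pop_b)
print(pi_a, pi_b, d_ab, d_net)
```

Applied to the reported means, DXY/π − 1 is roughly 0.05 for ORFs (0.00175/0.00167) and 0.09 for UCEs (0.00127/0.00116), which is the small excess of between-population over within-population diversity referred to above.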
Filtering ORF data to examine genetic structure resulted in the removal of two individuals (dna0469 and dna0416) and a final dataset of 2,161 SNPs (Ortiz-Sepulveda et al., 2022). PCA on this dataset indicated that PC1, PC2 and PC3 represent 11.7%, 10.9% and 8.7% of all variation in the dataset, respectively. The 95% convex hulls of populations overlap substantially within the northern and southern regions, but not between them (Fig. 7). The two regions are separated mainly along PC3. The Likoma Island population falls closer to populations of the northern region in PC1 vs. PC2, but closer to those of the south in PC1 vs. PC3. The population of the Shire River overlaps with one population from the south, but shows substantial differentiation from the other southern population. These results are highly congruent with those obtained with fastSTRUCTURE on the same dataset, which suggest K=4 to be the best scenario according to the ΔK method and most of the estimators of Puechmaille (2016). Some of the latter estimators suggested five clusters, but with specimen assignments that are almost identical to the K=4 solution (Fig. 7). Two of these four clusters correspond to sampling locations, i.e. Likoma Island and Shire River, whereas the other two coincide with a north-south separation in which one population from the south (MLW8-014) displays mixed assignments, including signatures from the northern and Shire River clusters. Interestingly, the Shire River cluster, although geographically in the far south, groups with the north in the K=3 scenario.
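For readers unfamiliar with the ΔK criterion mentioned above, the following Python sketch shows one common way to compute it: the absolute second-order difference of the mean log-likelihood across successive K values, divided by the standard deviation of the log-likelihood at K over replicate runs. How ΔK was computed for this dataset is not stated here, so the function name `delta_k`, the use of replicate runs and all log-likelihood values below are assumptions made purely for illustration.

```python
from statistics import mean, stdev


def delta_k(loglik_by_k):
    """Return {K: delta K} for every K whose neighbours K-1 and K+1 exist.

    loglik_by_k maps each K to a list of log-likelihoods from replicate
    runs (at least two replicates per K are needed for the standard
    deviation).
    """
    means = {k: mean(values) for k, values in loglik_by_k.items()}
    result = {}
    for k in sorted(loglik_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # second-order difference undefined at the ends
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        spread = stdev(loglik_by_k[k])
        if spread == 0:
            continue  # delta K undefined when replicates are identical
        result[k] = second_diff / spread
    return result


# Invented log-likelihoods for three replicate runs per K (not real data).
toy_runs = {
    2: [-1000.0, -1002.0, -998.0],
    3: [-900.0, -901.0, -899.0],
    4: [-800.0, -801.0, -799.0],
    5: [-795.0, -797.0, -793.0],
    6: [-792.0, -796.0, -788.0],
}

dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

In this toy example the log-likelihood rises steeply up to K=4 and then plateaus, so ΔK peaks at K=4. The criterion cannot evaluate the smallest or largest K tested, which is one reason to compare it with complementary estimators such as those of Puechmaille (2016).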