Statistical Analyses
After alignment, we discarded the monomorphic nucleotide positions and retained only the polymorphic ones, leaving 30 positions for EF1 and 33 positions for NaKA. The following analysis was done for each marker separately. For each of the 14 populations, we calculated the distribution of the four nucleotides (A, C, G and T) at each nucleotide position. Note that if $x_1$, $x_2$, $x_3$ and $x_4$ are the proportions of A, C, G and T at a position, then $x = (x_1, x_2, x_3, x_4)$ is a point in four-dimensional space whose coordinates sum to 1. The corresponding point $x^* = (\sqrt{x_1}, \sqrt{x_2}, \sqrt{x_3}, \sqrt{x_4})$ lies on the surface of the four-dimensional unit sphere, because the squares of its coordinates sum to 1. Next, for each position, we compared the nucleotide distributions of the 14 populations using a distance or similarity measure (see below), which yields 91 ($= 14 \times 13 / 2$) pairwise distances (or similarities) per position. These were averaged over all retained positions of the marker to obtain the final pairwise distances (or similarities) for that marker, arranged in a 14 × 14 symmetric distance (or similarity) matrix. We then added the two matrices (one for each marker) to obtain the overall distance (or similarity) between populations. This final matrix was used to construct population dendrograms and to perform a Principal Coordinates Analysis (PCoA).
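The per-marker procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the input layout (a dict mapping each population to its aligned sequences), the helper names `position_profile`, `marker_matrix` and `combined_matrix`, and the rule that a position is polymorphic when more than one nucleotide occurs in the pooled data are all hypothetical. Sequences are assumed to contain only A, C, G and T, and the squared Euclidean distance stands in for whichever measure is used.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"


def position_profile(column):
    """Proportions (x1, x2, x3, x4) of A, C, G, T among the bases at one position."""
    counts = [column.count(n) for n in NUCLEOTIDES]
    total = sum(counts)
    return tuple(c / total for c in counts)


def squared_euclidean(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def marker_matrix(alignments, distance=squared_euclidean):
    """Pairwise population distances for one marker, averaged over polymorphic positions.

    alignments: dict mapping population name -> list of aligned sequences (hypothetical layout).
    Returns a symmetric dict keyed by (pop_a, pop_b), with zeros on the diagonal.
    """
    pops = sorted(alignments)
    length = len(alignments[pops[0]][0])
    # Assumption: keep positions where more than one nucleotide occurs in the pooled sample.
    polymorphic = [
        k for k in range(length)
        if len({seq[k] for seqs in alignments.values() for seq in seqs}) > 1
    ]
    profiles = {
        p: [position_profile([seq[k] for seq in alignments[p]]) for k in polymorphic]
        for p in pops
    }
    matrix = {(p, p): 0.0 for p in pops}
    for a, b in combinations(pops, 2):
        total = sum(distance(x, y) for x, y in zip(profiles[a], profiles[b]))
        matrix[(a, b)] = matrix[(b, a)] = total / len(polymorphic)
    return matrix


def combined_matrix(markers, distance=squared_euclidean):
    """Sum the per-marker matrices (e.g. EF1 and NaKA) into one overall matrix."""
    combined = {}
    for alignments in markers:
        for pair, d in marker_matrix(alignments, distance).items():
            combined[pair] = combined.get(pair, 0.0) + d
    return combined


# Toy data: three populations, two short markers.
marker_1 = {"A": ["ACG", "ACG"], "B": ["ACG", "ATG"], "C": ["ATG", "ATG"]}
marker_2 = {"A": ["AA", "AA"], "B": ["AA", "AA"], "C": ["GA", "GA"]}
overall = combined_matrix([marker_1, marker_2])
print(overall[("A", "B")], overall[("A", "C")], overall[("B", "C")])
```

In the toy data each marker has a single polymorphic position, so the printed values are simply the sums of the two per-position squared Euclidean distances (0.5, 4.0 and 2.5). With 14 populations, `combinations` yields the 91 pairs mentioned above.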
We used three distance measures, namely the squared Euclidean distance, a modified squared chord distance and the Manhattan (or city-block) distance, and one similarity measure, a modified Morisita's similarity coefficient. If $x_1$, $x_2$, $x_3$ and $x_4$ are the proportions of A, C, G and T in population 1, and $y_1$, $y_2$, $y_3$ and $y_4$ are these proportions in population 2, then the squared Euclidean distance is $\sum_{i=1}^{4} (x_i - y_i)^2$; the modified squared chord distance is $\sum_{i=1}^{4} \left(\sqrt{x_i} - \sqrt{y_i}\right)^2$, which is the squared length of the chord connecting $x^*$ and $y^* = (\sqrt{y_1}, \sqrt{y_2}, \sqrt{y_3}, \sqrt{y_4})$ on the unit sphere; the Manhattan distance is $\sum_{i=1}^{4} |x_i - y_i|$; and the modified Morisita's similarity coefficient is $\dfrac{2 \sum_{i=1}^{4} x_i y_i}{\sum_{i=1}^{4} x_i^2 + \sum_{i=1}^{4} y_i^2}$.
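The four measures can be written directly from their definitions. The sketch below is a minimal Python illustration, assuming each population is summarized at a position as a 4-tuple of proportions in the order A, C, G, T; the function names are hypothetical, and the coefficient is computed in the Morisita-Horn form for proportions, $2\sum x_i y_i / (\sum x_i^2 + \sum y_i^2)$, which is our assumption about the "modified" variant. Because the squared coordinates of $x^*$ and $y^*$ each sum to 1, the squared chord distance also equals $2 - 2\sum_i \sqrt{x_i y_i}$.

```python
import math


def squared_euclidean(x, y):
    """Sum of (x_i - y_i)^2 over the four nucleotide proportions."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def squared_chord(x, y):
    """Sum of (sqrt(x_i) - sqrt(y_i))^2: squared chord length between x* and y*."""
    return sum((math.sqrt(xi) - math.sqrt(yi)) ** 2 for xi, yi in zip(x, y))


def manhattan(x, y):
    """Sum of |x_i - y_i| (city-block distance)."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


def modified_morisita(x, y):
    """Assumed Morisita-Horn form: 1 for identical profiles, 0 for profiles with no shared nucleotide."""
    numerator = 2 * sum(xi * yi for xi, yi in zip(x, y))
    denominator = sum(xi * xi for xi in x) + sum(yi * yi for yi in y)
    return numerator / denominator


x = (0.5, 0.5, 0.0, 0.0)  # toy proportions of A, C, G, T in population 1
y = (0.5, 0.0, 0.5, 0.0)  # toy proportions of A, C, G, T in population 2
for measure in (squared_euclidean, squared_chord, manhattan, modified_morisita):
    print(measure.__name__, round(measure(x, y), 6))
```

For these two profiles the sketch prints 0.5, 1.0, 1.0 and 0.5 for the four measures, respectively.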
We considered three amalgamation procedures, UPGMA (unweighted pair group method with arithmetic mean), minimum variance (Ward's method) and furthest neighbor (complete-linkage clustering), as well as PCoA, all performed with the MVSP software (Kovach Computation Services 2013). This yields ten different unconstrained trees: all combinations of the four measures with the three procedures, except that minimum variance is applicable only to the squared Euclidean and the squared chord distances, which excludes two of the twelve combinations.
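The three amalgamation rules differ only in how the distance from an existing cluster to a newly merged cluster is updated. The following Python sketch is not MVSP's implementation; it uses the standard Lance-Williams updates for UPGMA, furthest neighbor and Ward's minimum variance, and also enumerates the ten measure/procedure combinations. The function names and the list-of-lists matrix input are hypothetical. The sketch expects distances, so a similarity such as the modified Morisita coefficient would first have to be converted (for example to $1 - S$, an assumption not stated in the text). Ward merge heights are on the scale of the input squared distances and may be scaled differently from MVSP's output. PCoA is not covered here.

```python
from itertools import product

MEASURES = ["squared Euclidean", "squared chord", "Manhattan", "modified Morisita"]
METHODS = ["UPGMA", "minimum variance", "furthest neighbor"]


def valid_combinations():
    """Measure/method pairs; minimum variance is kept only for the two squared distances."""
    return [
        (measure, method)
        for measure, method in product(MEASURES, METHODS)
        if method != "minimum variance" or measure in ("squared Euclidean", "squared chord")
    ]


def cluster(labels, dist, method):
    """Agglomerative clustering of a symmetric distance matrix via Lance-Williams updates.

    dist: list of lists of pairwise distances. Returns merges as (members_a, members_b, height).
    """
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    members = {i: (label,) for i, label in enumerate(labels)}
    size = {i: 1 for i in members}
    d = {(i, j): dist[i][j] for i in members for j in members if i < j}
    next_id = len(labels)
    merges = []
    while len(members) > 1:
        # Merge the closest pair of current clusters.
        (i, j), height = min(d.items(), key=lambda item: item[1])
        new = next_id
        next_id += 1
        for k in members:
            if k in (i, j):
                continue
            dki = d[tuple(sorted((k, i)))]
            dkj = d[tuple(sorted((k, j)))]
            ni, nj, nk = size[i], size[j], size[k]
            if method == "UPGMA":
                d[(k, new)] = (ni * dki + nj * dkj) / (ni + nj)
            elif method == "furthest neighbor":
                d[(k, new)] = max(dki, dkj)
            else:
                # Ward's update; meaningful only for squared Euclidean-type distances.
                d[(k, new)] = ((ni + nk) * dki + (nj + nk) * dkj - nk * height) / (ni + nj + nk)
        # Drop every distance that involves one of the two merged clusters.
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        merges.append((members[i], members[j], height))
        members[new] = members.pop(i) + members.pop(j)
        size[new] = size[i] + size[j]
    return merges


labels = ["A", "B", "C", "D"]  # toy populations
dist = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
print(len(valid_combinations()))
for method in ("UPGMA", "furthest neighbor"):
    print(method, cluster(labels, dist, method))
```

On this toy matrix both rules join A with B and C with D first; they differ only in the height of the final merge (4.5 for UPGMA, the average of the four between-group distances, versus 6 for furthest neighbor, their maximum).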
For each population, we calculated, separately for each marker, the mean number of different alleles per position, and then averaged over the two markers to obtain the overall mean number of different alleles per position in that population. In addition, for each population we calculated the mean expected heterozygosity of a marker, defined as $H = \frac{1}{N} \sum_{k=1}^{N} \left(1 - \sum_{i=1}^{4} x_{ik}^2\right)$, where $x_{1k}$, $x_{2k}$, $x_{3k}$ and $x_{4k}$ are the proportions of A, C, G and T at position $k$ ($k = 1, 2, \dots, N$, where $N$ is the number of positions in the marker). We then averaged this measure over the two markers to obtain the expected heterozygosity of the population. Similarly, for each population we calculated the percentage of polymorphic positions in each marker and averaged the two percentages to obtain the polymorphism measure of the population.
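The three within-population measures can be sketched as follows, as a minimal Python illustration and not the authors' code. It assumes each marker is given as a list of per-position proportion tuples (A, C, G, T) for one population, that an allele counts as present when its proportion is greater than zero, and that a position is polymorphic within the population when at least two nucleotides are present; the function names and toy data are hypothetical.

```python
def marker_diversity(profiles):
    """Diversity measures for one marker in one population.

    profiles: list of (x1, x2, x3, x4) proportions of A, C, G, T, one tuple per position.
    Returns (mean alleles per position, mean expected heterozygosity, % polymorphic positions).
    """
    n = len(profiles)
    # Number of nucleotides observed at each position (proportion > 0).
    alleles = [sum(1 for x in p if x > 0) for p in profiles]
    mean_alleles = sum(alleles) / n
    # H = (1/N) * sum over positions k of (1 - sum_i x_ik^2)
    heterozygosity = sum(1 - sum(x * x for x in p) for p in profiles) / n
    percent_polymorphic = 100 * sum(1 for a in alleles if a > 1) / n
    return mean_alleles, heterozygosity, percent_polymorphic


def population_diversity(marker_profiles):
    """Average each measure over the markers (e.g. EF1 and NaKA)."""
    per_marker = [marker_diversity(p) for p in marker_profiles]
    return tuple(sum(values) / len(values) for values in zip(*per_marker))


ef1 = [(1.0, 0.0, 0.0, 0.0), (0.5, 0.5, 0.0, 0.0)]  # toy proportions, two positions
naka = [(0.25, 0.25, 0.25, 0.25)]                     # toy proportions, one position
print(population_diversity([ef1, naka]))
```

Averaging the per-marker values, rather than pooling all positions, gives each marker equal weight regardless of how many positions it contributes.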
We call an allele present in population X at a frequency of at least 1% an exclusive allele of X if it is absent from the other population or populations to which X is compared.
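The definition translates into a simple set operation. This Python sketch is illustrative only: the allele keys as (position, nucleotide) pairs and the function name are hypothetical, "absent" is taken to mean a frequency of zero in every comparison population, and the 1% threshold is treated as inclusive.

```python
def exclusive_alleles(freqs_x, others, threshold=0.01):
    """Alleles of population X with frequency >= threshold that are absent from all comparison populations.

    freqs_x: dict mapping an allele key -> its frequency in X.
    others: list of such dicts, one per comparison population.
    """
    seen_elsewhere = set()
    for freqs in others:
        seen_elsewhere.update(allele for allele, f in freqs.items() if f > 0)
    return {allele for allele, f in freqs_x.items()
            if f >= threshold and allele not in seen_elsewhere}


# Hypothetical allele keys: (position, nucleotide); toy frequencies.
pop_x = {(5, "A"): 0.5, (5, "G"): 0.3, (5, "T"): 0.195, (5, "C"): 0.005}
pop_y = {(5, "A"): 1.0}
pop_z = {(5, "A"): 0.9, (5, "T"): 0.1}
print(exclusive_alleles(pop_x, [pop_y]))         # compared with Y only
print(exclusive_alleles(pop_x, [pop_y, pop_z]))  # compared with Y and Z
```

Because exclusivity is relative to the comparison set, the same allele (here T at position 5) can be exclusive to X when compared with Y alone but not when Z is added, while C is never counted because its frequency in X falls below 1%.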
Statistical tests were carried out using IBM SPSS Statistics 26. All p-values are two-tailed.