Analysis of allele-frequency clines
We conducted three statistical tests for the occurrence of
allele-frequency clines, which are explained in detail in the following
paragraphs. First, we estimated the significance of the regression of
average allele frequency across loci against latitude. Second, we tested
whether the regressions (clines) could be attributed to selection.
Finally, we tested whether selection resulted in increased allele
frequencies across the entire latitudinal range. In these tests, we
excluded HiP, because the male-deleterious alleles there have previously
been shown to be under strong negative selection (van Hooft et al.,
2019). In all tests, we pooled the frequencies of
male-deleterious-trait-associated alleles at each locus (Table S1). We
refer to the pooled alleles as DE and SAE alleles. Allele-frequency
clines were based on plotting the DE and SAE allele frequencies as a
function of latitude. The results of the three tests were not
meaningfully influenced by sample-size weights, potential errors in
allele frequency correction (see next paragraph), or by allele size
standardization (Text S2).
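Pooling, as described above, amounts to summing the frequencies of the selected alleles at each locus and then averaging the pooled values over the loci genotyped in a population. The sketch below illustrates this under stated assumptions: the dictionary layout, the locus labels `L1` to `L3` and all allele sizes and frequencies are hypothetical, and the actual allele lists are those in Table S1.

```python
from statistics import mean


def pooled_frequency(allele_freqs, pooled_alleles):
    """Frequency of the pooled (DE or SAE) allele at one locus.

    allele_freqs: dict mapping allele size (repeat number) -> relative frequency
    pooled_alleles: set of allele sizes pooled at this locus (hypothetical input)
    """
    return sum(f for allele, f in allele_freqs.items() if allele in pooled_alleles)


def average_across_loci(pop_freqs, pooled_by_locus):
    """Average pooled frequency over the loci actually genotyped in one population."""
    return mean(
        pooled_frequency(pop_freqs[locus], alleles)
        for locus, alleles in pooled_by_locus.items()
        if locus in pop_freqs  # loci not genotyped in this population are skipped
    )


# Hypothetical population genotyped at two of three loci.
pop = {"L1": {180: 0.2, 182: 0.5, 188: 0.3}, "L2": {77: 0.6, 81: 0.4}}
pools = {"L1": {180, 188}, "L2": {81}, "L3": {150}}
print(average_across_loci(pop, pools))
```

Skipping untyped loci is what makes the populations' averages depend on different locus subsets, which motivates the correction described next.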
In the first test (regression of average allele frequency against
latitude), we corrected the average allele frequencies across
microsatellites for the fact that populations were analysed with
different microsatellite subsets, using KNP frequencies as the
standard for this correction. Specifically, we first calculated the
correction factor for the \(i\)th population in
terms of the relative frequencies of the alleles found in population \(i\)
compared with the full microsatellite set of 17 alleles found
in both the northern (NK) and southern (SK) Kruger populations. If we
use the notation \(\bar{f}^{\text{NK}}\) and \(\bar{f}^{\text{SK}}\) to represent the average
frequencies of the 17 alleles found in northern and southern KNP, and
\(\bar{f}_{i}^{\text{NK}}\) and \(\bar{f}_{i}^{\text{SK}}\) to represent the
averages in northern and southern KNP over only those alleles found in
population \(i\) (some subset of the 17 alleles found in KNP), then
our corrected average frequency \(\bar{f}_{i}^{\text{cor}}\) for population \(i\), in terms of the
observed average frequency \(\bar{f}_{i}^{\text{obs}}\) of the subset of
alleles in population \(i\), is given by:
\(\bar{f}_{i}^{\text{cor}} = \bar{f}_{i}^{\text{obs}}\,\dfrac{\bar{f}^{\text{NK}}+\bar{f}^{\text{SK}}}{\bar{f}_{i}^{\text{NK}}+\bar{f}_{i}^{\text{SK}}}\) (1)
This correction assumes that the ratio between the average frequency over
the full allele set and that over the subset is similar in all populations.
The ratios were estimated for all loci combined, and for DE and SAE loci
separately. The frequency correction had only a minor effect, as the
ratios varied by no more than a factor of 1.1 (range: 0.92–1.09), except
for the Caprivi Strip population, for which data from only two loci were
available.
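Eq. (1) can be written as a short function, which also exposes the correction ratio whose range (0.92–1.09) is reported above. This is a minimal sketch assuming that the NK and SK frequencies are stored in dictionaries keyed by allele; the two-allele example is hypothetical and stands in for the 17 KNP alleles.

```python
from statistics import mean


def correction_ratio(nk, sk, present):
    """(f_NK + f_SK) / (f_i_NK + f_i_SK) from Eq. (1).

    nk, sk: dict allele -> frequency in northern / southern KNP (all KNP alleles)
    present: alleles found in population i (a subset of the KNP alleles)
    """
    full = mean(nk.values()) + mean(sk.values())
    subset = mean(nk[a] for a in present) + mean(sk[a] for a in present)
    return full / subset


def corrected_average(f_obs, nk, sk, present):
    """Corrected average frequency f_i^cor of population i, Eq. (1)."""
    return f_obs * correction_ratio(nk, sk, present)


# Hypothetical KNP frequencies for two alleles; population i only has allele "A".
nk = {"A": 0.2, "B": 0.4}
sk = {"A": 0.1, "B": 0.3}
print(correction_ratio(nk, sk, ["A"]), corrected_average(0.3, nk, sk, ["A"]))
```

A ratio close to 1 means that the subset of alleles typed in population \(i\) is, in KNP, about as frequent as the full set, so the correction barely changes \(\bar{f}_{i}^{\text{obs}}\).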
The large variation in the number of genotyped individuals and the number of
genotyped loci between populations resulted in significant
heteroskedasticity in our regression models (modified Breusch-Pagan
test, Text S3) (Wooldridge, 2013). Heteroskedasticity can be corrected
by weighting each population by the number of genotyped
individuals multiplied by the average number of genotyped loci per
individual, because the standard deviations of the error terms are
expected to scale inversely with the square root of the 'within-group sample
size'. However, we took a slightly different approach, weighting the
regressions (of average allele frequency across loci against latitude)
by the sum, over loci, of the square roots of the numbers of genotyped
individuals in each population (i.e., by \(\sum_{l=1}^{17}\sqrt{g_{l}}\), where
\(g_{l}\) is the number of genotyped individuals at locus \(l\), with
\(l = 1,\ldots,17\)). We preferred this to the square root of the sum because it
gives more weight to the number of genotyped loci than to the number of
genotyped individuals. In this way, relatively low weight was given to the Caprivi Strip
population despite its large sample size (two genotyped loci, 134
genotyped individuals). We also took square roots per locus because, otherwise,
the adjusted \(R^{2}\) values were strongly
positively biased by the relatively large sample sizes of the two KNP
subpopulations. Besides significance in the Breusch-Pagan test, the
effect of heteroskedasticity was also evident from the increased
adjusted \(R^{2}\) values relative to the unweighted
regressions. The regressions were conducted with SPSS 23.
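The weighting scheme and the heteroskedasticity check can be illustrated outside SPSS. The sketch below computes the per-population weight \(\sum_{l}\sqrt{g_{l}}\), fits a weighted least-squares line, and runs Koenker's studentized form of the Breusch-Pagan test with one auxiliary regressor `z`. The choice of `z` (for example, the weights themselves) is an assumption, and the authors' "modified" test (Text S3) may differ from this standard variant; all function names are hypothetical.

```python
import math


def population_weight(genotyped_per_locus):
    """Regression weight of one population: sum over loci of sqrt(g_l).

    genotyped_per_locus: g_l for l = 1..17; untyped loci contribute 0.
    """
    return sum(math.sqrt(g) for g in genotyped_per_locus)


def wls_fit(x, y, w):
    """Weighted least squares y = a + b*x, minimising sum(w * residual**2)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ym - b * xm, b


def koenker_bp(x, y, z):
    """Koenker's studentized Breusch-Pagan LM test with one auxiliary regressor z.

    Fits OLS y ~ x, regresses the squared residuals on z and returns
    (LM, p), where LM = n * R^2 is referred to a chi-square with 1 df.
    """
    n = len(x)
    ones = [1.0] * n
    a, b = wls_fit(x, y, ones)
    e2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    c, d = wls_fit(z, e2, ones)
    m = sum(e2) / n
    ss_tot = sum((v - m) ** 2 for v in e2)
    ss_res = sum((v - c - d * zi) ** 2 for zi, v in zip(z, e2))
    lm = max(0.0, n * (1.0 - ss_res / ss_tot))  # guard against -0 rounding
    # Upper tail of chi-square(1): P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    return lm, math.erfc(math.sqrt(lm / 2.0))
```

A weighted fit of corrected average frequency on latitude would then be `wls_fit(latitudes, frequencies, [population_weight(g) for g in genotypes])`, with all inputs being the user's own data.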
As a control, we also estimated the allele-frequency clines of the
remaining alleles from southern KNP observed ≥15 times that were not
associated with male-deleterious traits, again after pooling the frequencies
of individual alleles. We based these control clines on the alleles that
were closest in size (number of short tandem repeats) to the non-pooled
male-deleterious-trait-associated alleles. Further, where possible, we
selected the same number of non-pooled alleles per locus as for the
male-deleterious-trait-associated alleles (Table S1). Only at three loci
was the number of selected control alleles smaller than the number of
comparison alleles (BM3517 and TGLA227: one instead of two
alleles; BM1824: two instead of four alleles).
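The selection of control alleles can be sketched as a nearest-in-size search per locus. This is a minimal illustration assuming that "closest in size" means the smallest absolute difference in repeat number to any male-deleterious-trait-associated allele at the same locus, with ties broken by the smaller allele (both assumptions); the allele sizes used below are hypothetical.

```python
def select_controls(associated, candidates, k):
    """Pick up to k control alleles closest in size to the associated alleles.

    associated: repeat sizes of the male-deleterious-trait-associated alleles
    candidates: repeat sizes of the remaining alleles observed >= 15 times in SK
    k: number of controls wanted (normally len(associated))
    """
    def distance(allele):
        return min(abs(allele - s) for s in associated)

    ranked = sorted(candidates, key=lambda a: (distance(a), a))
    # Fewer than k alleles are returned when too few candidates exist,
    # as at BM3517, TGLA227 and BM1824.
    return ranked[:k]


print(select_controls([10, 12], [4, 9, 13, 20], 2))
```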
In constructing the allele-frequency clines, we used an updated
definition of DE and SAE alleles based on a recent genetic study in
HiP (i.e., pooling all alleles with a relatively high frequency in
BTB-positive males with low body condition) (van Hooft et al., 2019),
rather than the definition applied for the allele-frequency clines in KNP (van
Hooft et al., 2014). With the new definition, too, a significant cline
was observed in KNP for DE and SAE alleles combined and for DE alleles
specifically, but not for SAE alleles (DE loci: adjusted \(R^{2}\) = 0.23, P = 0.0042; SAE loci:
adjusted \(R^{2}\) = 0.04, P = 0.15; all loci:
adjusted \(R^{2}\) = 0.23, P = 0.0040;
regression of average allele frequency against latitude, weighted by the square root of the number of sampled
individuals per herd; Figures S2 and S3). However, the Spearman
correlation between average SAE allele frequency and latitude, when
weighted by the square root of sample size, was close to significance
(ρ = −0.36, P = 0.054).
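A weighted Spearman correlation can be computed in several ways. One common approach, sketched below as an assumption rather than the procedure used here, ranks both variables (ties receiving their average rank) and then computes a Pearson correlation on the ranks with weights equal to the square root of sample size. SPSS may treat weights differently (for example, as frequency weights that also affect the ranks), and the sketch does not compute a P value; all inputs are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def weighted_pearson(x, y, w):
    """Pearson correlation with observation weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)


def weighted_spearman(x, y, sample_sizes):
    """Spearman rho as a sqrt(sample size)-weighted Pearson r on ranks."""
    w = [math.sqrt(n) for n in sample_sizes]
    return weighted_pearson(average_ranks(x), average_ranks(y), w)
```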
In the second test (attribution of allele-frequency clines to
selection), we estimated the significance of the average per-locus
Pearson correlation between allele frequency and latitude, with the
individual per-locus correlations weighted by the square-root of the
number of genotypes per population. In the third test (increased allele
frequencies across the whole latitudinal range), we estimated whether
average frequencies per locus were significantly higher for the DE and
SAE alleles than for the control alleles, with the per-locus frequencies
weighted by the square-root of the number of genotypes per population.
In both tests, randomized pooled frequencies were obtained by replacing
the non-pooled male-deleterious-trait-associated alleles (as mentioned
before, DE and SAE alleles consisted of pools of
male-deleterious-trait-associated alleles) with a random selection of
non-pooled alleles (73 alleles observed ≥15 times in southern KNP,
consisting of 33 male-deleterious-trait-associated alleles and 40
remaining alleles). We estimated significance as the probability that a
random selection of non-pooled alleles resulted in a stronger average
Pearson correlation per locus (Test 2) or a higher average frequency per
locus (Test 3) than the male-deleterious-trait-associated alleles. We
applied 100,000 randomizations, implemented in Excel 2016.