Analysis of linkage disequilibrium
Four microsatellite pairs showed significant LD in both KNP and HiP (van Hooft et al., 2019; van Hooft et al., 2014). Three of these occur close on chromosome 1 in cattle and its homologous chromosome in African buffalo (whole genome sequence data of cattle and contig sequence data of buffalo at NCBI website): BM1824 -CSMM019 (0.5 Mb in buffalo), CSSM01 9-BM3205 (5.4 Mb), BM1824 -BM3205 (5.9 Mb) (Ihara et al., 2004). The fourth pair consists of microsatellites ILSTS026 and INRA006 , which are probably ~18 Mb apart on the same chromosome in buffalo. This chromosome corresponds to a fusing of chromosomes 2 and 3 in cattle, with the former harbouring ILSTS026 and the latterINRA006 , in both cases ~9.5 Mb from the end of the left-side of their p arms (Text S4) (Gallagher & Womack, 1992; Goudet, 1995, 2001; Ihara et al., 2004; Lane-deGraaf et al., 2015; Liu & Muse, 2005; Weng, Saatchi, Schnabel, Taylor, & Garrick, 2014). Two additional microsatellite pairs are at close chromosomal distance, although they did not show significant LD in KNP or HiP:INRA006 -TGLA263 (27.5 Mb in buffalo, chromosome 3 in cattle) and TGLA057 -INRA128 (28.3 Mb, chromosome 1) (Ihara et al., 2004). The chromosomal distances at the aforementioned locus pairs correspond to recombination rates of 0.6-31%/generation based on a recombination rate of 1.1 cM/Mb at chromosomes 1 to 3 in cattle (Mouresan et al., 2019; Weng et al., 2014).
In this study, we conducted two statistical tests for LD, with LD estimated per population per locus pair, assuming two alleles per locus: the DE allele or SAE allele on the one hand and the wild-type allele (the pool of all original microsatellite alleles not associated with a male-deleterious trait, including the rare alleles) on the other. In these tests, we did not include the BM1824 -BM3205 pair, because this specific combination of microsatellites was only genotyped in KNP and HiP and because their LD would not be independent of theBM1824 -CSMM019 and CSMM019 -BM3205 locus pairs. Thus, the two LD tests were conducted with the following five locus pairs: BM1824 -CSMM019 (0.5 Mb),CSSM01 9-BM3205 (5.4 Mb), ILSTS026 -INRA006(~18 Mb), INRA006 -TGLA263 (27.5 Mb) andTGLA057 -INRA128 (28.3 Mb).
We measured LD by the Pearson correlation coefficient for a 2x2 table,r LD, for bi-allelic locus pairs. When the linkage phase is known, r LD can be derived from the observed and expected haplotype frequencies (χ 2value, four possible haplotypes) and the total number of gametes or chromosomes analysed (k ), using Equation 2 below (Charlesworth & Charlesworth, 2010). A negative sign is added tor LD when most pairs of focal alleles are in repulsion. However, when the linkage phase is unknown, as in our case,r LD can be derived from maximum-likelihood estimates of the haplotype frequencies (Schaid, 2004; Weir, 1990). In our case, positive r LD values indicate that most pairs of male-deleterious alleles are in coupling (i.e., DE-DE, SAE-SAE or DE-SAE).
\(r_{\text{LD}}=\ \sqrt{\frac{\chi^{2}}{k}}\) (2)
We estimated r LD with Genalex, following Weir’s method (Weir, 1990). Genalex provides three estimates ofr LD: 1) r LD based on estimated frequencies of double heterozygotes in coupling assuming Hardy-Weinberg equilibrium, 2) r LD based on estimated frequencies of double heterozygotes in repulsion assuming Hardy-Weinberg equilibrium, 3) r LD based on estimated frequencies of both types of double heterozygotes without assuming Hardy-Weinberg equilibrium. The first two estimates should be identical, but the maximum-likelihood procedure may result in deviations, especially with small sample sizes. We used ther LD estimates that assume HWE (i.e., estimates 1 and 2). We only included cases where estimates 1 and 2 were identical and of the same sign as estimate 3 (67 out of 78 cases), except for two cases where estimates 1 and 2 were of opposite sign but very close to zero (|r LD| =0.0002) while estimate 3 was zero. For these two cases we assumedr LD=0.
We pooled the samples from the neighbouring populations in Mana Pools NP-Nyakasanga (~55 km distance) and in Limpopo NP-Manguana (~200 km distance) to increase sample size, after checking that in the latter case this pooling, considering the large geographic distance, did not bias estimates ofr LD (paired sample t -test on 30 locus pairs, comparing weighted average r LD of the two separate populations with r LD of the pooled population: pooled population: average r LD = -0.010, two separation populations: average r LD = 0.002; P = 0.52; Pearson r = 0.91). It was not possible to estimate r LD for all populations and locus pairs because of small sample sizes. However, in some cases we were able to get an estimate of r LD by a leave-one-out approach (CSMM019 -BM3205 in Queen Elizabeth NP (both sectors) and Amboseli NP, BM1824 -CSMM019 in Limpopo NP-Manguana, INRA006 -TGLA263 in Mana Pools NP-Nyakasanga). Hereby, we calculated average r LD for all possible population samples with one individual excluded, with multiple estimates per population always being of the same sign. We excluded HiP because of earlier observed negative selection (van Hooft et al., 2019), the two central African populations because of small sample size, and the admixed population from Save Valley Conservancy because admixing may bias LD estimates (Pfaff et al., 2001).
In the first LD test, we hypothesized that pairs of male-deleterious-trait-associated alleles (i.e.; DE-DE, SAE-SAE and DE-SAE allele pairs) are coupled more often than expected with random association, resulting in positive r LD values. We conducted a χ 2 test to determine whether the total fraction of locus pairs (across loci and across populations) with a positive r LD value significantly deviated from 0.5, considering that positive and negative r LDvalues are equally likely in the absence of selection (Slatkin, 2008). LD estimates of the two closest linked locus pairs,BM1824 -CSMM019 and CSMM019 -BM3205 , for populations that are part of a larger metapopulation may not be independent of each other. To err on the conservative side, we used the average r LD value per metapopulation across both locus pairs (KNP was the only metapopulation in which both locus pairs were analysed). With respect to these locus pairs, the following four groups of populations were considered to be part a metapopulation: 1) northern KNP - southern KNP, 2) Malilangwe Wildlife Res. - Gonarezhou NP - Sengwe Safari Area - Crooks Corner, 3) Victoria Falls NP - Chobe NP - Okavango Delta, 4) Amboseli NP - Tsavo NP, 5) Queen Elizabeth NP Mw - Queen Elizabeth NP Is. We assumed independence ofr LD estimates for the other three physically linked locus pairs, even when part of the same metapopulation, because of their relatively large interlocus distances. For these locus pairs, the expected recombination rate (~20-31% per generation) likely exceeds the migration rate between nearby populations.
As a control, we estimated the total fraction of locus pairs with positive r LDacross all populations for: 1) the five closely linked locus pairs with genotypes randomized per locus per population, and 2) locus pairs with interlocus distance >45 Mb (seven locus pairs) or that occur on different chromosomes (116 locus pairs). Both fractions are expected to be close to 0.5 (free recombination). In the first control we conducted 30 randomizations of the total dataset, resulting in 1541 randomizedr LD values. The r LD values in the second control were estimated with GDA, instead of Genalex, because it can better handle large data sets with missing data (Labate, 2000). At the five closely linked locus pairs, Genalex and GDA always provided identical estimates.
In the second LD test, we estimated the significance of regression ofr LD for each population against the natural logarithm of the variable l =latitude (67 data points) over the range [-24.75°, 3.85°], i.e., we fitted the parameters a andb in the equation:
\(r_{\text{LD}}=\operatorname{aln}\left(l-\ l_{0}\right)+b\) (3)
using a general linear mixed model (GLMM) approach, specifically the ’lme4’ (version 1.1.21), ’lmerTest’ (version 3.1.0) and ’performance’ (version 0.4.4.1) packages in R (Bates, Machler, Bolker, & Walker, 2015; Kuznetsova, Brockhoff, & Christensen, 2017). We applied log-transformation to account for the obvious non-linearity in the data, using l 0 = -27. Locus pair (five levels), population (25 levels, including three singletons) and metapopulation (57 levels, including 52 singletons) were included as random factors. GLMMs were not weighted by sample size as this only had a minor effect on parameter estimates (< 0.5% difference). Goodness-of-fit was estimated with marginal R 2; the proportion of variability explained by the fixed effects (Nakagawa & Schielzeth, 2013).