To help account for the uncertainty in ancestry inference, we sampled chromosome paintings 10 times for each individual. We used a binomial likelihood to test whether the ancestry at each SNP along the genome differs from expectations. Specifically, for each recipient haplotype, we inferred the expected proportion of ancestry from each non-self ancestry region from its genome-wide ancestry paintings. At each position, we fitted a deviation parameter, \(\beta\), that best explains the sampled ancestries at that SNP across all individuals. \(\beta\) can be thought of as the increase (or decrease) in expected ancestry proportions across individuals, on the logit scale, that best explains the proportions observed at a SNP compared to the genome-wide average. To generate a \(P\) value, we compared the likelihood ratio test statistic to a \(\chi^{2}\) distribution. At each SNP in the genome, this test gives a \(P\) value and an estimate of \(\beta\) for each of the 7 non-self ancestries tested.
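As a sketch of one way to write the test described above, under notation that we introduce here purely for illustration: let \(p_{ia}\) be the genome-wide expected proportion of ancestry \(a\) in individual \(i\), let \(n_{i}\) be the number of sampled paintings for that individual, and let \(x_{isa}\) be the number of those paintings assigned to ancestry \(a\) at SNP \(s\). The deviation parameter then shifts the expected proportion on the logit scale, and the likelihood is a product of binomial terms over the \(N\) individuals:
\[
\mathrm{logit}(q_{isa}) = \mathrm{logit}(p_{ia}) + \beta_{sa}, \qquad
L(\beta_{sa}) \propto \prod_{i=1}^{N} q_{isa}^{\,x_{isa}} \left(1-q_{isa}\right)^{n_{i}-x_{isa}} .
\]
The likelihood ratio statistic compares the fitted deviation \(\hat{\beta}_{sa}\) with the null of no deviation, \(\beta_{sa}=0\):
\[
\Lambda_{sa} = 2\left[\log L(\hat{\beta}_{sa}) - \log L(0)\right].
\]
Because a single parameter is fitted per SNP and ancestry, it would be natural to compare \(\Lambda_{sa}\) to a \(\chi^{2}\) distribution with one degree of freedom; the degrees of freedom and the exact indexing (per individual rather than per haplotype) are assumptions of this sketch rather than details stated in the text.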
Modelling ancestry in this way has the advantage that it adjusts for an individual’s specific ancestry proportions. The likelihood is a product across individuals, which assumes that individuals are independent given the underlying ancestry proportions. Quantile-quantile (Q-Q) plots of the observed test statistics typically show high levels of inflation [Fig. \ref{452754}]. There are at least three explanations for this inflation. Firstly, due to shared ancestral histories among haplotypes from the same population, the paintings are unlikely to be independent between samples in a population, so the assumption in the likelihood that the samples are independent is likely to be false. Moreover, the demographic process that has given rise to the ancestry paintings is more complex than the model, leading to local variations in paintings that are not captured by the null. Secondly, the chromosome painting model explicitly uses correlations in ancestry locally in the genome (i.e. linkage disequilibrium), meaning that tests are not independent along the genome, as shown by the autocorrelation plots in Figure \ref{452754}. Finally, adaptive introgression may have a weak effect genome-wide. We discuss below alternative approaches that model the variance (as well as the expectation) in ancestry proportions across the genome. However, these limitations mean that it is difficult to estimate a genome-wide threshold for significance in the traditional sense (i.e. for defining false positive/negative rates), so we interpret \(P\) values as providing a relative ranking rather than an absolute measure of significance.

Ancestry deviations in the Fula identify known targets of selection

We illustrate our approach with an analysis of genomes from the Fula ethnic group from The Gambia. The Fula (also known as Fulani) are historically nomadic pastoralists spread across West Africa who, in our previous analysis, were inferred to have experienced an admixture event involving a largely Eurasian source (0.19 admixture proportion) mixing with a West African source around 1,800 years ago (239 CE (95\% CI = 199 BCE--486 CE)). (Admixture events across all analysis populations inferred previously \citep{Busby2016AdmixtureAfrica} are summarised in Figure \ref{fig:admOverview}.) This event introduced a substantial amount of Eurasian ancestry into the Fula, which is relatively easy to identify given that Eurasian populations split and drifted from African populations over 50 kya \citep{Mallick2016ThePopulations}. Eurasian and African haplotypes are thus easier to differentiate from each other, making our approach relatively well powered to detect deviations in ancestry proportions.
In the Fula, the region of the genome with the lowest proportion of Eurasian ancestry (\(-\log_{10}P=10\), \(\beta=-4\) across all Fula individuals) is on chromosome 1 and contains the Duffy antigen receptor gene, DARC [Fig. \ref{fig:fig2}a], whilst the region with the highest level of Eurasian ancestry (\(-\log_{10}P=15\), \(\beta=2\)) is on chromosome 2 and contains the LCT and MCM6 genes [Fig. \ref{fig:fig2}b]. Polymorphisms in DARC form the basis of the Duffy blood group system, and the Duffy null mutation, which provides resistance to Plasmodium vivax malaria \citep{Miller1976TheBlacks}, is almost fixed across Africa and absent outside it \citep{Howes2011TheGroup}. Our observation of less Eurasian ancestry than expected at this locus suggests that African haplotypes have been beneficial here since the admixture event that brought the Eurasian ancestry into the Fula; that is, within the last 2,000 years. Similar effects at this locus have been observed before: based on an analysis of allele frequencies, selection following admixture between Austronesians and sub-Saharan Bantus is likely to have driven frequencies of the Duffy null mutation above those expected from admixture proportions in the Malagasy of Madagascar \citep{Hodgson2014NaturalMadagascar}, and a recent study of Sahel populations showed an excess of African ancestry at this locus in Sudanese Arabs and Nubians \citep{Triska2015ExtensiveBelt}.
Mutations in an intron of MCM6, which acts as a regulatory element for LCT, lead to the lactose persistence phenotype \citep{Ingram2008LactosePersistence} (the ability to digest milk into adulthood), and represent one of the clearest signals of recent natural selection in the genome. Lactose persistence has evolved at least twice independently \citep{Bersaglieri2004GeneticGene}, in Europe and in East Africa \citep{Ranciaro2014GeneticAfrica}. Encouragingly, our analysis identifies the Eurasian haplotype at this locus, despite the SNP set not containing the ‘European’ lactase variant [13910 C\(>\)T polymorphism, rs4988235; Fig. \ref{fig:suppfulalct}], and suggests that the European haplotype entered the Fula as a result of gene flow from Europe approximately 2 kya. In both this case and that of DARC described above, our ancestry-based analysis provides new insight into the potential origins of selected mutations.