Introduction
The source of adaptive genetic variation remains an important question in evolutionary biology \citep{Hedrick2013AdaptiveVariation}. In the canonical view of evolution, natural selection changes the frequency of a mutation that was either present before the onset of selection (so-called standing variation) or arose after the selective pressure began \citep{Hermisson2005SoftVariation}. A third potential source of novel genetic variation on which natural selection can act is gene-flow resulting from an inter- or intra-species admixture event. Examples of such adaptive introgression include the exchange of mimicry adaptations amongst Heliconius butterfly species \citep{Consortium2012ButterflySpecies}, the transfer of insecticide resistance genes between sibling Anopheles mosquito species \citep{Clarkson2014AdaptiveIsolation}, and the spread of pesticide resistance across populations of house mice \citep{Staubach2012GenomeMusculus}.
In the human lineage, recent genomic studies have shown that our demographic history is complex, involving population merges as well as splits \citep{Green2010AGenome,Patterson2012AncientHistory,Prufer2014TheMountains,Hellenthal2014AHistory,Fu2015AnAncestor}. These merges, which are usually referred to as admixture or introgression events, open the door to the transfer of advantageous alleles. For example, we now know that trysts between archaic Homo sapiens and Neanderthals over 50 thousand years ago (kya) transferred beneficial alleles from Neanderthals into the human lineage \citep{Green2010AGenome,Patterson2012AncientHistory,Prufer2014TheMountains,Fu2015AnAncestor,Abi-Rached2011TheHumans,Gittelman2016ArchaicEnvironments}. While genomic regions with elevated archaic ancestry have been observed, it is also possible to identify regions that are devoid of archaic ancestry, where archaic alleles are thought to have been at a disadvantage \citep{Sankararaman2014TheHumans,Racimo2015EvidenceHumans}.
Despite the prevalence of more recent admixture events over the past 50 thousand years of human history, relatively few instances of adaptation from these events have been found. Notable known examples include the exchange of high altitude adaptations between the ancestors of Sherpa and Tibetans \citep{Jeong2014AdmixtureTibet} and the spread of the Plasmodium vivax malaria-protecting Duffy null mutation in Madagascar as a result of gene-flow from mainland Bantu-speaking Africans \citep{Hodgson2014NaturalMadagascar}.
Within Africa, multiple studies using different techniques have shown admixture to be a common theme in the recent history of the continent \citep{Pagani2012EthiopianPool,Schlebusch2012GenomicHistory,Pickrell2012TheAfrica,Pickrell2014AncientAfrica,Pickrell2014TowardDNA,Busby2016AdmixtureAfrica,Patin2017DispersalsAmerica,Carlton_2017,Amer_2017,Skoglund_2017}. As a result, the genomes of individuals from these populations contain segments which derive from multiple ancestries. Inferring whether gene-flow has assisted adaptation requires inferring how these ancestries change locally along the genome. Although a number of strategies exist for local ancestry inference \citep{Price2009SensitivePopulations,Baran2012FastPopulations,Brisbin2012PCAdmix:Populations,Maples2013RFMix:Inference}, most are designed to distinguish at best continental-level ancestry from a small number of reference populations.
The approach we develop here for inferring whether gene-flow has contributed more or less ancestry than expected at a locus involves sampling from a Hidden Markov Model to identify the likely donor haplotype from a large set of reference genomes. We use this chromosome painting approach to assign a donor ancestry label (which can be at individual, population, region, or continental level) at each locus in the genome to all recipient individuals. Repeating this sampling incorporates uncertainty in the ancestry assignment, and from the resulting samples we infer the ancestry proportions contributed by different donor groups across the genome.
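To make the averaging step concrete, one way to write it is sketched here; the notation is introduced purely for illustration and is an assumption rather than a description of the full model. Suppose a population contains $N$ recipient haplotypes, each painted $S$ times, and let $a^{(s)}_{ij} \in \{1,\dots,K\}$ denote the donor-group label sampled for haplotype $i$ at SNP $j$ in sample $s$, where $K$ is the number of donor groups. The local proportion of ancestry $k$ at SNP $j$ could then be estimated as
\[
\hat{p}_{jk} = \frac{1}{NS}\sum_{i=1}^{N}\sum_{s=1}^{S}\mathbf{1}\left[a^{(s)}_{ij}=k\right],
\]
so that $\sum_{k=1}^{K}\hat{p}_{jk}=1$ at every SNP. Averaging over the $S$ samples, rather than taking a single most likely label, is what carries the uncertainty in the assignment through to the estimated proportions.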
We use this analysis to ask a simple question: across individuals in these admixed populations, are there regions of the genome where ancestry deviates significantly from genome-wide expectations? We construct a statistical model to test for significant deviations and interpret deviations in local ancestry as possibly resulting from natural selection, which can act either by increasing the frequency of the introgressed haplotype (positive selection) or by preventing it from replacing an established haplotype (negative selection). We highlight specific examples where ancestry deviations align with known targets of selection, describe patterns of selection across the data, and discuss the challenges in using approaches of this kind for detecting adaptive gene-flow.
Results and Discussion
Inferring local ancestry
We used a published dataset containing computationally phased haplotypes from 3,283 individuals from 60 worldwide populations, typed at 328,176 high-quality genome-wide SNPs [Fig. \ref{fig:supppopsmap}] \citep{Busby2016AdmixtureAfrica}. We grouped individuals from these populations into 8 separate ancestry regions, based on genetic and ethno-linguistic similarity, as described previously \citep{Busby2016AdmixtureAfrica} [Fig. \ref{fig:fig1}a]. We painted chromosomes from these populations using ChromoPainter, excluding closely related populations from the set of donors; here we define closely related as individuals from within the recipient's own ancestry region. Therefore, for each haplotype, there are seven non-local ancestries (or colours of paint) from which it can copy.
Identifying changes in local ancestry
Our approach for inferring ancestry deviations is based on the idea that ancestry tracts shared between two populations will reveal a signal of admixture \citep{Hellenthal2014AHistory}, but will be randomly distributed amongst the genomes of individuals within those populations \citep{Baird2006FishersAdmixture} [Fig. \ref{fig:fig1}b]. So, whilst individuals will have mosaic (i.e. block-like) ancestry [Fig. \ref{fig:fig1}c], when summed across all individuals in a population, ancestry proportions at specific loci (SNPs) will resemble genome-wide proportions in expectation [Fig. \ref{fig:fig1}d]. In this framework, we expect ancestry proportions at a SNP to match genome-wide proportions, with some variation which we can model. A significant deviation away from expected ancestry proportions suggests that natural selection may have contributed to the change. We note that this will only ever allow for indirect inference of selection. This is nevertheless similar in spirit to other commonly used selection identification methods, such as the integrated haplotype score (iHS) \citep{Sabeti2002DetectingStructure,Voight2006AGenome}, which scans the genomes of a population for haplotypes that are longer than expected given their frequency, and which are then (indirectly) inferred to have swept as a result of selection.
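The expectation described above can be written compactly. The symbols here are assumptions made for illustration and do not specify the statistical model itself. Write $\hat{p}_{jk}$ for the proportion of a population's haplotypes assigned to donor ancestry $k$ at SNP $j$, and $\bar{p}_{k} = \frac{1}{L}\sum_{j=1}^{L}\hat{p}_{jk}$ for the genome-wide average across $L$ SNPs. In the absence of selection,
\[
\mathrm{E}\left[\hat{p}_{jk}\right] = \bar{p}_{k}, \qquad d_{jk} = \hat{p}_{jk} - \bar{p}_{k},
\]
and a locus is of interest when $|d_{jk}|$ exceeds the variation expected by chance. Because the proportions sum to one over ancestries at every SNP, $\sum_{k} d_{jk} = 0$: an excess of one ancestry at a locus necessarily appears as a deficit spread across the others, which is worth bearing in mind when attributing a deviation to a particular donor group.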