Identifying alleles important to acidic adaptation, and
quantifying their frequencies in marine stickleback
To identify alleles important to the adaptation of stickleback to acidic
habitats, we performed genome-wide differentiation mapping between the
acidic and basic sample pools. That is, we scanned the poolSeq SNPs for
positions exhibiting extremely high global differentiation between
stickleback from acidic versus basic lakes. We did not simply define
genetic variation important for acidic adaptation as SNPs highly
differentiated between acidic and marine fish, because this would
mostly have uncovered genetic variation important to
marine-freshwater divergence in general. Such variation is abundant in
North Uist stickleback (Figure S3 in Haenel et al. 2019; see also Jones
et al. 2012b; Roesti et al. 2014; Bassham et al. 2018; Fang et al. 2020;
Terekhanova et al. 2019). Our focus, however, was specifically on
genetic variation for which gene flow into marine fish must be rare and
geographically restricted. Acidic-basic differentiation was expressed as
the absolute allele frequency difference (AFD). Positions qualified as
high-differentiation SNPs if they showed an AFD of at least 0.85,
were autosomal, and were physically separated by at least 100 kb to
ensure independence (tight linkage disequilibrium typically decays over
much shorter distances in stickleback, e.g., Roesti et al. 2015). With
these criteria, we obtained a panel of 50 ‘adaptive SNPs’, that is,
positions at which one allele appears strongly and consistently
selectively favored in acidic habitats. As a basis for comparison, we
analogously selected a panel of 500 ‘baseline SNPs’ from the same genome
scan. These latter polymorphisms were also required to be separated by
at least 100 kb, but to exhibit minimal differentiation (AFD within
0.1% of the genome-wide median) between the acidic and the basic pool.
The latter criterion ensured that these SNPs did not tag genome regions
(consistently) involved in acidic adaptation. At each of the adaptive
SNPs, we then defined the nucleotide predominant in the acidic pool as
the ‘acidic allele’, and determined and graphed the frequency of these
alleles in all six marine sample pools. An analogous analysis was
performed for the baseline SNPs, here defining the acidic allele as the
one relatively more common in the acidic than the basic pool. Our
prediction was that if genetic variation at the adaptive SNPs in marine
stickleback reflects gene flow-selection balance, the frequency of the
acidic alleles at these markers (but not at the baseline SNPs) should be
elevated in marine stickleback sampled on North Uist. As a resource, we
additionally compiled all genes located within a 100 kb window centered
at each adaptive SNP.
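
The selection of adaptive SNPs and the polarization of alleles described above can be illustrated with a minimal Python sketch. This is not the code used in the study: the `Snp` record, the function names, the greedy strongest-first rule for enforcing the 100 kb spacing, and the label `chrXIX` for the sex chromosome are assumptions made for illustration, and SNPs are treated as biallelic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    chrom: str
    pos: int            # position in bp
    freq_acidic: float  # frequency of one tracked allele in the acidic pool
    freq_basic: float   # frequency of the same allele in the basic pool

    @property
    def afd(self) -> float:
        """Absolute allele frequency difference between the two pools."""
        return abs(self.freq_acidic - self.freq_basic)


def select_adaptive(snps, min_afd=0.85, min_gap=100_000,
                    excluded_chroms=frozenset({"chrXIX"})):
    """Keep autosomal SNPs with AFD >= min_afd that lie at least min_gap bp apart.

    Assumption: spacing is enforced greedily, visiting the most strongly
    differentiated SNPs first. The source does not state the exact rule.
    """
    candidates = sorted(
        (s for s in snps
         if s.afd >= min_afd and s.chrom not in excluded_chroms),
        key=lambda s: s.afd,
        reverse=True,
    )
    kept = []
    for s in candidates:
        # Accept the SNP only if it is far enough from every SNP already kept
        if all(s.chrom != k.chrom or abs(s.pos - k.pos) >= min_gap
               for k in kept):
            kept.append(s)
    return sorted(kept, key=lambda s: (s.chrom, s.pos))


def acidic_allele_freq(snp, tracked_freq_in_pool):
    """Frequency of the acidic allele in another pool (e.g., a marine pool).

    The acidic allele is the one more common in the acidic than in the basic
    pool. If that is the tracked allele, its frequency is returned as is;
    otherwise the complement is returned (biallelic SNP assumed).
    """
    if snp.freq_acidic > snp.freq_basic:
        return tracked_freq_in_pool
    return 1.0 - tracked_freq_in_pool
```

In this sketch, a SNP at which the tracked allele occurs at frequency 0.05 in the acidic pool and 0.93 in the basic pool has an AFD of 0.88; the acidic allele is then the alternative allele, so a tracked-allele frequency of 0.2 in a marine pool corresponds to an acidic allele frequency of 0.8.
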
For three exemplary adaptive SNPs, we further visualized the diversity
and distribution of surrounding haplotypes among our samples based on
haplotype networks. The markers chosen included the adaptive SNP
exhibiting the strongest acidic-basic differentiation in the present
study (AFD = 0.96), the adaptive SNP tagging the genome region showing
the strongest acidic-basic differentiation in a previous investigation
(Figure 3A in Haenel et al. 2019), and the adaptive SNP located on a
known inversion polymorphism (Jones et al. 2012b; Roesti et al. 2015;
Haenel et al. 2019). Using the raw nucleotide counts derived from
indSeq, we performed individual diploid genotyping for all nucleotide
positions exhibiting a read depth of 10x or greater across a 5 kb window
centered on the adaptive SNPs, considering positions heterozygous if
their MAF was greater than 0.1. Individuals with >25%
missing genotypes were omitted. Based on the remaining data, positions
qualified as informative SNPs if they displayed ≤40% missing
genotypes and a MAF of at least 0.05. The resulting genotype matrices
were subjected to phasing with fastPHASE v1.4.8 (Scheet & Stephens
2006; settings provided in Supplementary codes). Haplotype genealogies
were then constructed with RAxML v8 (Stamatakis 2014) and visualized as
haplotype networks in FITCHI (Matschiner 2016) (settings provided in
Supplementary codes).
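
The genotyping and filtering thresholds described above can be illustrated with a minimal Python sketch. This is not the code used in the study: the function names and data layout are hypothetical, the within-individual MAF is assumed to be the read share of the second most common nucleotide, and the across-individual MAF is assumed to be computed over called alleles among retained individuals. Phasing and genealogy reconstruction (fastPHASE, RAxML, FITCHI) are not covered.

```python
from collections import Counter


def call_genotype(counts, min_depth=10, het_maf=0.1):
    """Call a diploid genotype from nucleotide read counts at one position.

    counts: dict mapping nucleotide -> read count for one individual.
    Returns a tuple of two alleles, or None if read depth is below min_depth.
    """
    depth = sum(counts.values())
    if depth < min_depth:
        return None
    # Rank nucleotides by read count (ties broken alphabetically)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    major = ranked[0][0]
    minor, minor_count = ranked[1] if len(ranked) > 1 else (major, 0)
    if minor_count / depth > het_maf:
        return tuple(sorted((major, minor)))  # heterozygous
    return (major, major)                     # homozygous


def filter_matrix(genotypes, max_ind_missing=0.25,
                  max_site_missing=0.40, min_maf=0.05):
    """Drop individuals with too many missing calls, then find informative SNPs.

    genotypes: dict mapping individual -> list of genotype calls (one per
    position, None if missing). Returns (retained individuals, indices of
    informative positions).
    """
    n_sites = len(next(iter(genotypes.values())))
    kept = {ind: calls for ind, calls in genotypes.items()
            if sum(c is None for c in calls) / n_sites <= max_ind_missing}
    informative = []
    for i in range(n_sites):
        calls = [g[i] for g in kept.values()]
        called = [c for c in calls if c is not None]
        if not calls or (len(calls) - len(called)) / len(calls) > max_site_missing:
            continue
        alleles = Counter(a for c in called for a in c)
        if len(alleles) < 2:
            continue  # monomorphic among retained individuals
        maf = sorted(alleles.values())[-2] / sum(alleles.values())
        if maf >= min_maf:
            informative.append(i)
    return kept, informative
```

For example, under these assumptions an individual with 8 reads of A and 4 reads of G at a position (depth 12, minor allele share 0.33) is called heterozygous, whereas 12 reads of A and 1 read of G (minor allele share 0.08) yields a homozygous A call.
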