Introduction
Local adaptation occurs when individuals from a given population exhibit higher fitness in their local environment than in other environments (Kaweki & Ebert, 2004). Because habitats are spatially and temporally variable, local environmental conditions determine which traits may be favoured by selection (Hoban et al., 2016), ultimately leading to divergent selection at the phenotypic and genotypic levels, resulting in local adaptation (Kaweki & Ebert, 2004). However, rapid environmental change generated by global climate change and other anthropogenic effects directly impact local environments and the locally adapted individuals inhabiting those changing environments (Atkins & Travis, 2010; Valladares et al., 2014). Consequently, environmental change may decouple locally adapted allele frequencies from the current environmental conditions. Thus, assessing the genetic signatures of local adaptation in natural populations is critical for quantifying the scope of effects of changing environments on locally adapted populations (Aitken & Whitlock, 2013; Canosa et al., 2020; Lancaster et al., 2022).
Molecular genetics allows the incorporation of genetic evidence into the conservation and management of individuals, populations, and species across diverse taxonomic groups (Kirk & Freeland, 2011). Neutral molecular genetic markers are widely used to quantify genetic diversity, gene flow, and genetic differentiation among populations (Ouborg et al., 2010; Zimmerman et al., 2019); however, such data cannot inform conservation managers about potentially locally adapted or functional genetic variation. For example, the characterization of variation at functional loci (i.e., the genes that code for specific proteins) among populations provides insight into adaptive divergence (Luikart et al., 2003; Beaumont & Balding, 2004). Divergence in functional genotypes is expected to evolve relatively rapidly in response to natural selection, contrary to evolution by genetic drift alone (Kawecki & Ebert; 2004). The ideal analysis is thus a combination of rapidly evolving neutral genetic markers (e.g., microsatellite DNA, mtDNA sequence variation) and genetic polymorphisms linked to known-function genes (e.g., single nucleotide polymorphisms, or SNPs) that would characterize both gene flow and potential patterns of local adaptation.
Birds vary widely in their migratory patterns; from short-distance movements within seasons, to long-distance migrations covering substantial portions of the globe (Sekercioglu, 2007; Rolland et al., 2014). While such migratory life histories make them interesting candidate species for local adaptation analyses, there is limited information on patterns of genetic divergence that underlie the process of local adaptation (Kawecki & Ebert, 2004). Curiously, even though migratory birds are highly impacted by environmental changes (Both et al., 2006; Jonzén et al., 2006; Visser et al., 2015), little is understood about their genetic diversity or adaptive capacity, especially regarding the extent to which genomic variation is shaped by local environmental factors (Bay et al., 2021). Kuhn et al. (2013) used microsatellite, mitochondrial DNA, and a Clock gene marker in extant and historical populations of the pied flycatcher (Ficedula hypoleuca ), a long-distance migratory passerine, to test for potential effects of global climate change on their genetic structure. They showed stabilizing selection at the functional marker and suggested local adaptation had a greater effect on population genetic structure than recent climate change. Lehtonen et al. (2012) SNP-genotyped the same species across 17 sites across their breeding range and showed two (follistatin and SWS1 opsin ) of fourteen candidate genes involved in plumage colouration exhibited adaptive divergence – one of the few published studies of migratory passerines that quantified genetic diversity and differentiation using SNPs. To our knowledge, there has only been one published study of selection at genetic marker loci in an Arctic-breeding passerine; Tigano et al. (2017) showed that adaptation to migratory routes, or some other non-breeding ground-based environmental factor, drove the pattern of differentiation at genome-wide SNP markers in thick-billed murres (Uria lomvia ). Patterns of population differentiation in migratory bird species in general, and more specifically, in Arctic migratory birds, have been vastly understudied, despite the importance of population connectivity for their conservation in rapidly changing environments (Gu et al., 2021; Macdonald et al., 2012).
Snow buntings (Plectrophenax nivalis) are small, arctic-breeding passerines with a circumpolar distribution (Montgomerie & Lyon, 2020). Despite this species’ global distribution, there are life-history differences among populations (e.g., migratory versus non-migratory; see Table 1). Most snow bunting populations migrate seasonally between high Arctic breeding grounds and temperate wintering grounds (Macdonald et al., 2012; Snell et al., 2018); however, some populations are non-migratory (e.g., Aleutian and Pribilof islands - Alaska, USA, and a high-altitude Scottish population; Montgomerie & Lyon, 2020). While globally abundant, census data suggest North American snow bunting populations have undergone substantial declines, with a reduction of 64% over the past four to five decades (Butcher & Niven, 2007). However, conservation efforts are hampered by logistic and data limitations, including a lack of information on the population structure and selection pressures on the birds.
To address population structure and functional divergence consistent with local adaptation, we assessed global population structure among six geographically widespread breeding snow bunting populations. We first used microsatellite (presumed neutral) and transcriptome-derived SNP markers at known-function loci (potentially functional) to determine genetic divergence and hence assess population reproductive isolation. We then investigated population genetic divergence at functional marker loci, controlling for the effects of genetic drift using the neutral microsatellite marker data. As a largely migratory species, snow buntings are expected to have widely dispersed breeding populations across the globe (Montgomerie & Lyon, 2020), although current limited data suggest generally consistent migratory patterns (Lyngs, 2003; Macdonald et al., 2012; Snell et al., 2018; Montgomerie & Lyon, 2020). Hence, we predict some degree of reproductive isolation among the six breeding populations based on the expectation of consistent and separate migration routes; however, we recognized that the lack of explicit migratory data may result in unexpected gene flow and hence connectivity among some populations. We also predicted strong local selection pressures at the breeding grounds to result in patterns of local adaptation that would contribute to genetic differentiation at functional, candidate-gene loci under selection to improve local reproductive fitness. Specifically, we hypothesized that snow buntings are adapted to the local conditions on their breeding grounds, because selection pressures are strongest during the breeding period due to the high energetic demands of breeding, a short breeding season, and a correlation between local and regional climate and reproductive success (Falconer et al., 2008, Fossøy et al., 2014, Hoset et al., 2014). Although we expect patterns of local adaptation at some SNP marker loci, we predicted that the majority of our chosen functional gene SNPs would have neutral divergence patterns (consistent with genetic drift) with a few key SNPs exhibiting divergent and stabilizing selection. In this study we apply powerful population genetic approaches that can be used in future studies to facilitate the effective conservation and management of migratory species with the goal of facilitating the preservation of biodiversity.
Methods
This project included the development and application of two types of molecular genetic markers: neutral microsatellite markers and functional gene locus SNP markers. It thus involved two types of samples: RNA samples for de-novo transcriptome assembly for SNP marker development, and DNA samples collected across the global breeding range of snow buntings for genotype data for the population genetic analyses. The population genetic study involved genotyping all samples at both microsatellite and SNP locus markers to determine population genetic divergence and patterns of functional divergence.
Development of microsatellite markers
To develop snow bunting-specific microsatellite markers, multiple heterospecific primers were screened, and primers were chosen for strong amplification and high polymorphism on test samples (specifically, Mitivik Island DNA were used as a high-quality benchmark DNA for primer optimization). Some primer sequences were modified using the species-specific sequence information from an unrelated High Throughput sequencing project.
DNA sample collection and extraction
A large-scale collaborative effort collected snow bunting tissue from populations across a wide geographic range, resulting in a total of 221 samples from six populations for DNA extraction (Figure 1, Table 2). All bird handling and sample collection was conducted under appropriate animal care permits (see Table 2). With the exception of the samples from Utqiagvik (Barrow), AK, USA, which were DNA extracted using a QIAamp DNA Mini Kit (Qiagen Inc., Toronto, ON, Canada), all samples were extracted using a solid phase reversible immobilization (SPRI) bead extraction originally optimized for bird cloacal and oral swabs (Vo & Jedlicka, 2014). Briefly, tissue was incubated in a solution containing lysis buffer, protein precipitation solution and zirconia-silica beads, followed by two rounds of homogenization and extraction of DNA from the supernatant using SPRI beads. Rather than using 200uL of lysis buffer for tissue digestion as per the original protocol, our samples (e.g., small piece of dry blood spot on filter paper for Alert and Mitivik Island samples, dried pellet containing approximately 10mg of packed red blood cells for Svalbard samples, and a grain-of-rice-sized skin tissue sample from Aleutian Islands and Pribilof Islands) were digested in 200uL of digestion buffer (100mM NaCl, 50mM Tris-HCl pH 8.0, 10mM EDTA, 0.5% SDS) and 10uL of 20mg/mL proteinase K overnight at room temperature on a nutator. We did not include zirconia-silica beads for the homogenization step as we had soft tissue samples not requiring cellular disruption. The resulting genomic DNA was suspended in 50uL TE buffer and stored at -80°C until use.
RNA sample collection, extraction and sequencing
Sixteen snow buntings were chosen at random for RNASeq from a pool of individuals housed at the avian facility of Université du Québec à Rimouski (UQAR), QC, Canada. These individuals were captured near Rimouski, QC, Canada as wintering birds. All individuals used in the current study were humanely euthanized via cervical dislocation for a separate sequencing project (approved by Animal Care Committee at UQAR (CPA-61-15-163 and CPA-68-17-186)), their whole brain was collected and immediately preserved in a high concentration salt buffer (Final concentrations: 70g ammonium sulfate/100mL, 25mM sodium citrate, 20mM EDTA, pH 5.2), stored at -20°C, and transferred to -80°C until RNA extraction. The sampling of the 16 individuals was equally spaced out from early March to the end of April 2018 to maximize transcriptome diversity in the brain tissue samples.
Total RNA was extracted from brain tissue using TRIzol Reagents (Life Technologies, Mississauga, ON, Canada) according to the manufacturer’s protocol. The RNA pellet was resuspended in Nuclease-Free Water (Thermo Fisher Scientific, Mississauga, ON, Canada) and RNA quality was assessed using the Eukaryotic RNA 6000 Nano assay on a 2100 Bioanalyzer (Agilent Technologies Canada Inc., Mississauga, ON, Canada). We ensured that all samples had RIN > 8.5 and a 28S/18S rRNA ratio > 0.8 when preparing the RNA for sequencing for all sixteen birds. Final RNA aliquots were sent to the Genome Quebec Innovation Centre (McGill University, Montreal, QC, Canada) for 100bp paired-end sequencing in two lanes of an Illumina HiSeq4000 sequencer (Illumina Inc., San Diego, CA, USA).
RNA sequence analyses
Following sequencing, rRNA sequence reads were removed from the total raw sequence reads using SortMeRNA v2.1 (Kopyloca et al., 2012). Non-rRNA reads were then quality filtered using the default sliding window algorithm in Trimmomatic v0.38 (Bolger et al., 2014) to remove low-quality and adapter sequences. A de-novo transcriptome was assembled using fourteen out of sixteen samples using the default parameters with Trinity v2.8.4 (Hass et al., 2013) which includedin-silico normalization for all reads. In the absence of a reference genome, and to ease the computational load for downstream data processing, the final reference transcriptome was assembled with only the longest isoform per transcript. Cleaned RNA sequence reads from all sixteen individuals were mapped to the final reference transcriptome using Burrow’s Wheeler Alignment (BWA) v0.7.12 (Li & Durbin, 2009) (Supplementary Material S1). Additionally, we assigned Read Group tags to all samples as unique sample IDs for each file. Resulting SAM files were converted to BAM files and sorted using SAMtools v1.3 (Li et al., 2009). We then removed PCR duplicates using Picard Tools (http://broadinstitute.github.io/picard), the final BAM files were merged, and low-quality mapping and supplemental alignments were removed with SAMtools v1.3 (Li et al., 2009).
SNP characterization and SNP marker development
The mapping information for all reads from the de-novo assembled reference transcriptome was used for nucleotide variant discovery using the Broad Institute’s Genome Analysis Tools Kit (GATK) pipeline (DePristo et al., 2011; Van der Auwera et al., 2013) to characterize and develop functional gene locus SNPs. We performed quality recalibration, indel realignment and variant discovery on filtered-merged combined sequences, post-alignment, using GATK v4.1.7.0 (McKenna et al., 2010). Furthermore, we applied hard filtering parameters recommended for RNASeq experiments to detect variants (DePristo et al., 2011; Van der Auwera et al., 2013).
We used GeneMarkS-T (Besemer et al., 2001) to characterize open reading frames in our reference transcriptome and used SNPEff (Cingolani et al., 2012) to annotate variants and characterize them as missense, synonymous, upstream or downstream variants. We used the Trinotate pipeline (Bryant et al., 2017) to annotate all genes in our reference transcriptome and used LEMONS software (Levin et al., 2015) to predict intron splice junctions. It was important for us to identify the exon/intron boundaries to ensure that the SNP primers did not span introns since our goal was to use these primers to amplify genomic DNA.
By combining the SNPs (i.e., missense, synonymous, upstream or downstream variants) with gene annotation and predicted splice junction information, we were able to identify 11,378 useable SNPs (see Supplementary Material S2). From those, we selected 192 SNP loci representing genes expected to show local selection effects among our six populations. Broadly, seven gene function categories (energetics, lipid metabolism, immune response, stress response, nervous system development, reproduction and cell-housekeeping processes) were selecteda priori based on their relevance and importance for the our study species (justifications for gene categories are shown in Supplementary Material S3, gene function categories for selected loci shown in Supplementary Material S4). We designed SNP primers to amplify a 100bp-150bp region surrounding the SNP of interest for the 192 loci using default settings with Primer3 v4.1.0 (Untergasser et al., 2012). Forward and reverse universal adapters (ACCTGCCTGCC & ACGCCACCGAGC, respectively) were added to the 5’ end of the designed primers to allow for the addition of sequencing adapters and sample-specific barcodes for High Throughput Sequencing (HTS). All primers were tested in 12.5uL reactions containing 20mM Tris-HCl pH 8.0, 10mM KCl, 10mM (NH4)2SO4, 2mM MgSO4, 0.1% Triton X‐100, 0.1mg/mL bovine serum albumin (BSA), 200 µM of each dNTP, 200nM of forward and reverse primers, 0.5U of Taq polymerase (Bio Basic Canada Inc., Markham, ON, Canada), and 0.5uL of genomic DNA. The PCR cycling conditions were: 2 min at 95°C; 20s at 95°C, 20s at 58°C, 30s at 72°C (32 cycles); and 2 mins at 72°C. Of the 192 primer sets, 72 either did not amplify genomic DNA, yielded non-specific amplification or produced an amplicon larger than 350bp: all of these were discarded from subsequent analyses. Details for the remaining 117 SNP primers are provided in Supplementary Material S4 in Supplementary Data.
Microsatellite and SNP marker genotyping
Microsatellite DNA marker data were used to assess population genetic structure (population connectivity), and they were used as the neutral marker controls for assessing divergence at the SNP loci. Briefly, all DNA samples were amplified at nine microsatellite loci with three PCR reactions: i) a first round of 20-cycle multiplex PCR (all primers combined) for preamplification of the DNA (this was done due to the small amount of DNA recovered from some samples) followed by ii) a second round of 30-cycle PCR with individual microsatellite primers, and iii) a final round of 5-cycle PCR to add fluorescent tags for fluorescence-based capillary electrophoresis. For each individual, we conducted the multiplex PCR in a 5uL reaction mixture containing 2.5uL of 2x Multiplex PCR Master mix (Qiagen Inc., Toronto, ON, Canada), 0.5uL of primer pool (10x primer mix containing 2uM each of all 9 primer pairs), and 1.0uL each of RNase-Free Water and template DNA. The amplification conditions were: 5 min at 95°C followed by 20 cycles of 30s at 95°C, 90s at 57°C, 30s at 72°C; and ending with 30 mins at 60°C. We diluted the PCR products 20-fold by adding 95uL of ddH20. For the second round PCR, we amplified 2-4uL of the diluted multiplexed PCR product in a single-PCR reaction of 25uL which contained 2.5uL of 10x Taq buffer (20mM Tris-HCl pH 8.0, 10mM KCl, 10mM (NH4)2SO4; Bio Basic Canada Inc., Markham, ON, Canada), 200uM each of dNTP, MgSO4 (2uM), forward and reverse primers (2uM each), and 0.5U of Taq Polymerase (Bio Basic Canada Inc., Markham, ON, Canada). Thermocycling conditions were 95°C for 2 min; followed by 30 cycles of 95°C for 20s, locus-specific annealing temperature for 20s (56°C for CAM17, Lox8, Indigo29, SNBU682, and SNBU705; 58°C for Cuu28, POCC6, Ecit2, and CAM17), and 72°C for 30s, ending with 72°C for 2 min. For the final round of PCR, we used a PCR-based labelling technique where products from 1-4 loci were labelled with different dyes (6FAM, VIC, PET and NED; PCR conditions were identical to that of the second round of PCR with the exception of 5 cycles instead of 30) and combined with Hi-Di formamide (Applied Biosystems, Foster City, CA, USA) and a GeneScan LIZ600 size standard (Applied Biosystems, Foster City, CA, USA) for separation on a SeqStudio Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). Each sample was genotyped using GeneMapper software v3.5 and verified by eye.
We genotyped all individuals at the selected SNP loci using HTS. The HTS library preparation was completed using two rounds of PCR; multiplex followed by barcoding (ligation) PCR. We first amplified the 117 SNP loci using five separate multiplex PCRs for each sample (bird). Each multiplex PCR included 17-25 primer pairs (SNP locus groups shown in Supplementary Material S4). Multiplex PCR used the Qiagen Multiplex PCR Plus Kit (Qiagen Inc., Toronto, ON, Canada). For each multiplex group, we first made 10x primer pools containing all primers within that group at equimolar concentration of 0.2uM. Each 7uL multiplex reaction contained 3.5uL Multiplex PCR Plus Master mix, 0.7uL of the 10x primer pool, 1.3uL ddH2O, and 1.5uL genomic DNA. The amplification conditions were: 5 min at 95°C followed by 28 cycles of 30s at 95°C, 90s at 58°C and 30s at 72°C followed by 10 mins at 68°C. We diluted the multiplexed PCR product 10-fold with ddH2O. Next, PCR products from each of the five multiplex reactions were pooled for each individual and cleaned using Sera-Mag Speed Beads (Cytiva, Mississauga, ON, Canada) to remove unincorporated dNTPs, primers, primer dimers and PCR buffers. We then ligated individual barcode sequences and HTS adaptor sequences to the PCR products in a second (ligation) short-cycle PCR. The 20uL PCR reaction included: 10x Taq buffer (20mM Tris-HCl pH 8.0, 10mM KCl, 10mM (NH4)2SO4; Bio Basic Canada Inc., Markham, ON, Canada), 2mM MgSO4, 0.1mg/mL bovine serum albumin (BSA), 200uM of each dNTP, 200nM of forward and reverse primers, 0.5 U of Taq polymerase (Bio Basic Canada Inc., Markham, ON, Canada), and 10uL of pooled and cleaned multiplex PCR product. The PCR conditions for the ligation PCR were: 94°C for 2 min, followed by 6 cycles of 94°C for 30s, 60°C for 30s and 72°C for 60s, followed by 72°C for 5 min. This second PCR ligated a “barcode” sequence that allowed us to identify each sample for allocating sequence data to specific individuals post-sequencing. The barcoded products were pooled and gel-extracted using the GenCatch Gel Extraction Kit (Epoch Life Science Inc., Sugar Land, TX, USA) as per manufacturer’s instructions. Purified pooled product was analyzed on an Agilent 2100 Bioanalyzer using a High Sensitivity chip (Agilent Technologies Canada Inc., Mississauga, ON, Canada) to verify the size and concentration of the library amplicons. Finally, the library was diluted to approximately 60pM and sequenced using Ion PGM Hi-Q chemistry in an Ion Chef System (Thermo Fisher Scientific Inc., Streetsville, ON, Canada). Specifically, the library was sequenced using an Ion 318 Chip Kit with an Ion PGM Sequencing 400 Kit (Thermo Fisher Scientific, Mississauga, ON).
SNP HTS Bioinformatics
After HTS sequencing of the pooled SNP PCR amplicons, we used the FASTX Toolkit (Gordon & Hannon, 2010) and its Barcode Splitter script to demultiplex the sequences. We then trimmed off the sequencing adapters and barcodes from all reads using CUTADAPT v1.11 (Martin, 2011) and subsequently mapped the resulting PCR-amplified sequences to our reference transcriptome using BWA v0.7.12 (Li & Durbin, 2009) to identify the genes containing the amplified SNP regions. To genotype all individuals at target SNP loci, we used FreeBayes (Garrison & Marth, 2012), a Bayesian genetic variant detector. Since FreeBayes detects many other variants such as small multi-nucleotide polymorphisms (MNPs), insertions and deletions (indels), composite insertions, and substitutions, we discarded such variants using VCFtools (Danecek et al., 2011). Next, we refined the VCF file through a filtration step which excluded the SNP locus markers that were called in less than 30% of individuals (16 out of 117 SNPs) and the individuals that were missing more than 10% of their genotypes (2 out of 221 individuals). Lastly, we only kept one SNP per amplicon (i.e., the original SNP used to design the primers for that amplicon) for further analyses to avoid any bias resulting from including multiple (linked) SNPs per amplicon. As a result, we had 101 SNP genotypes across 219 individuals for further analyses.
Population genetic analyses
Testing for temporal effects
Because we had individuals collected across multiple years for most of our study populations, we first tested for temporal effects (i.e., a year effect) on allele frequencies. We conducted separate Fisher’s exact tests of allele frequency variation for the microsatellite marker data for multi-year samples from Alert, Svalbard, Utqiagvik and Mitivik Island using the genepop package (Rousset, 2008) in R v1.2.5 (R Core Team 2016). Since p -values ranged from 0.08-0.50 for each population, we concluded that there were no temporal effects, hence we combined samples from multiple years for the Alert, Svalbard, Utqiagvik, and Mitivik Island populations.
Testing for Alaskan island population neutral divergence
The island populations of Alaska (Attu, Adak, and Pribilof islands) are geographically clustered (Figure 1), allowing dispersal among the islands possibly resulting in a single metapopulation. We thus tested these three populations for neutral population divergence. Based on the results of Fisher’s test, we combined Attu and Adak Island samples from 1999 forming the population ‘Aleutian Islands’ since there were no significant differences in neutral allele frequencies (p = 0.14). We retained Pribilof Islands individuals as a separate population for further analyses as it had a significantly different (p< 0.00001) neutral allele frequency distribution from the Aleutian Islands samples. These two Alaskan island populations combined with the other four populations, resulted in a total of six populations for downstream analyses (Table 2).
Population genetic divergence
We assessed population differentiation using both neutral microsatellite and functional SNP markers using pairwise Fisher’s exact test of allele frequency variation in the genepop package (Rousset, 2008) in R. We also estimated pairwise FST for both marker types using GENODIVE (version 3.0) (Miermans, 2020). We corrected allp -values for multiple comparisons using the sequential Bonferroni procedure (Rice, 1989) where necessary.
Neighbour-joining cluster analyses
To visually assess the pattern of population genetic divergence for the two marker types (microsatellite and SNP loci), we performed unrooted neighbour-joining (NJ) cluster analyses with Cavalli-Sforza and Edward’s (1967) chord distance (Dc ) using the ‘ape’ package (Paradis & Schliep, 2019) in R. Chord distance was used as it is expected to provide better tree topology estimation for closely related populations, although it may compromise branch length estimation (Angers & Bernatchez, 1998). The percent support for branches was estimated using bootstrapping, with replacement, among loci using 10,000 permutations in the ‘poppr’ package (Kamvar et al., 2014) in R.
Selection signatures at candidate loci
To detect a signature of selection at functional SNP loci, it is important to separate the effects of genetic drift from selection. For this purpose, we used the microsatellite markers to estimate the neutral allelic distribution; it is expected that both functional SNP loci and microsatellites undergo change due to genetic drift and gene flow, but only SNP loci are expected to be under selection due to potential local habitat-specific environmental conditions.