Introduction
Local adaptation occurs when individuals from a given population exhibit
higher fitness in their local environment than in other environments
(Kaweki & Ebert, 2004). Because habitats are spatially and temporally
variable, local environmental conditions determine which traits may be
favoured by selection (Hoban et al., 2016), ultimately leading to
divergent selection at the phenotypic and genotypic levels, resulting in
local adaptation (Kaweki & Ebert, 2004). However, rapid environmental
change generated by global climate change and other anthropogenic
effects directly impact local environments and the locally adapted
individuals inhabiting those changing environments (Atkins & Travis,
2010; Valladares et al., 2014). Consequently, environmental change may
decouple locally adapted allele frequencies from the current
environmental conditions. Thus, assessing the genetic signatures of
local adaptation in natural populations is critical for quantifying the
scope of effects of changing environments on locally adapted populations
(Aitken & Whitlock, 2013; Canosa et al., 2020; Lancaster et al., 2022).
Molecular genetics allows the incorporation of genetic evidence into the
conservation and management of individuals, populations, and species
across diverse taxonomic groups (Kirk & Freeland, 2011). Neutral
molecular genetic markers are widely used to quantify genetic diversity,
gene flow, and genetic differentiation among populations (Ouborg et al.,
2010; Zimmerman et al., 2019); however, such data cannot inform
conservation managers about potentially locally adapted or functional
genetic variation. For example, the characterization of variation at
functional loci (i.e., the genes that code for specific proteins) among
populations provides insight into adaptive divergence (Luikart et al.,
2003; Beaumont & Balding, 2004). Divergence in functional genotypes is
expected to evolve relatively rapidly in response to natural selection,
contrary to evolution by genetic drift alone (Kawecki & Ebert; 2004).
The ideal analysis is thus a combination of rapidly evolving neutral
genetic markers (e.g., microsatellite DNA, mtDNA sequence variation) and
genetic polymorphisms linked to known-function genes (e.g., single
nucleotide polymorphisms, or SNPs) that would characterize both gene
flow and potential patterns of local adaptation.
Birds vary widely in their migratory patterns; from short-distance
movements within seasons, to long-distance migrations covering
substantial portions of the globe (Sekercioglu, 2007; Rolland et al.,
2014). While such migratory life histories make them interesting
candidate species for local adaptation analyses, there is limited
information on patterns of genetic divergence that underlie the process
of local adaptation (Kawecki & Ebert, 2004). Curiously, even though
migratory birds are highly impacted by environmental changes (Both et
al., 2006; Jonzén et al., 2006; Visser et al., 2015), little is
understood about their genetic diversity or adaptive capacity,
especially regarding the extent to which genomic variation is shaped by
local environmental factors (Bay et al., 2021). Kuhn et al. (2013) used
microsatellite, mitochondrial DNA, and a Clock gene marker in
extant and historical populations of the pied flycatcher (Ficedula
hypoleuca ), a long-distance migratory passerine, to test for potential
effects of global climate change on their genetic structure. They showed
stabilizing selection at the functional marker and suggested local
adaptation had a greater effect on population genetic structure than
recent climate change. Lehtonen et al. (2012) SNP-genotyped the same
species across 17 sites across their breeding range and showed two
(follistatin and SWS1 opsin ) of fourteen candidate genes
involved in plumage colouration exhibited adaptive divergence – one of
the few published studies of migratory passerines that quantified
genetic diversity and differentiation using SNPs. To our knowledge,
there has only been one published study of selection at genetic marker
loci in an Arctic-breeding passerine; Tigano et al. (2017) showed that
adaptation to migratory routes, or some other non-breeding ground-based
environmental factor, drove the pattern of differentiation at
genome-wide SNP markers in thick-billed murres (Uria lomvia ).
Patterns of population differentiation in migratory bird species in
general, and more specifically, in Arctic migratory birds, have been
vastly understudied, despite the importance of population connectivity
for their conservation in rapidly changing environments (Gu et al.,
2021; Macdonald et al., 2012).
Snow buntings (Plectrophenax nivalis) are small, arctic-breeding
passerines with a circumpolar distribution (Montgomerie & Lyon, 2020).
Despite this species’ global distribution, there are life-history
differences among populations (e.g., migratory versus non-migratory; see
Table 1). Most snow bunting populations migrate seasonally between high
Arctic breeding grounds and temperate wintering grounds (Macdonald et
al., 2012; Snell et al., 2018); however, some populations are
non-migratory (e.g., Aleutian and Pribilof islands - Alaska, USA, and a
high-altitude Scottish population; Montgomerie & Lyon, 2020). While
globally abundant, census data suggest North American snow bunting
populations have undergone substantial declines, with a reduction of
64% over the past four to five decades (Butcher & Niven, 2007).
However, conservation efforts are hampered by logistic and data
limitations, including a lack of information on the population structure
and selection pressures on the birds.
To address population structure and functional divergence consistent
with local adaptation, we assessed global population structure among six
geographically widespread breeding snow bunting populations. We first
used microsatellite (presumed neutral) and transcriptome-derived SNP
markers at known-function loci (potentially functional) to determine
genetic divergence and hence assess population reproductive isolation.
We then investigated population genetic divergence at functional marker
loci, controlling for the effects of genetic drift using the neutral
microsatellite marker data. As a largely migratory species, snow
buntings are expected to have widely dispersed breeding populations
across the globe (Montgomerie & Lyon, 2020), although current limited
data suggest generally consistent migratory patterns (Lyngs, 2003;
Macdonald et al., 2012; Snell et al., 2018; Montgomerie & Lyon, 2020).
Hence, we predict some degree of reproductive isolation among the six
breeding populations based on the expectation of consistent and separate
migration routes; however, we recognized that the lack of explicit
migratory data may result in unexpected gene flow and hence connectivity
among some populations. We also predicted strong local selection
pressures at the breeding grounds to result in patterns of local
adaptation that would contribute to genetic differentiation at
functional, candidate-gene loci under selection to improve local
reproductive fitness. Specifically, we hypothesized that snow buntings
are adapted to the local conditions on their breeding grounds, because
selection pressures are strongest during the breeding period due to the
high energetic demands of breeding, a short breeding season, and a
correlation between local and regional climate and reproductive success
(Falconer et al., 2008, Fossøy et al., 2014, Hoset et al., 2014).
Although we expect patterns of local adaptation at some SNP marker loci,
we predicted that the majority of our chosen functional gene SNPs would
have neutral divergence patterns (consistent with genetic drift) with a
few key SNPs exhibiting divergent and stabilizing selection. In this
study we apply powerful population genetic approaches that can be used
in future studies to facilitate the effective conservation and
management of migratory species with the goal of facilitating the
preservation of biodiversity.
Methods
This project included the development and application of two types of
molecular genetic markers: neutral microsatellite markers and functional
gene locus SNP markers. It thus involved two types of samples: RNA
samples for de-novo transcriptome assembly for SNP marker
development, and DNA samples collected across the global breeding range
of snow buntings for genotype data for the population genetic analyses.
The population genetic study involved genotyping all samples at both
microsatellite and SNP locus markers to determine population genetic
divergence and patterns of functional divergence.
Development of
microsatellite markers
To develop snow bunting-specific microsatellite markers, multiple
heterospecific primers were screened, and primers were chosen for strong
amplification and high polymorphism on test samples (specifically,
Mitivik Island DNA were used as a high-quality benchmark DNA for primer
optimization). Some primer sequences were modified using the
species-specific sequence information from an unrelated High Throughput
sequencing project.
DNA sample
collection and extraction
A large-scale collaborative effort collected snow bunting tissue from
populations across a wide geographic range, resulting in a total of 221
samples from six populations for DNA extraction (Figure 1, Table 2). All
bird handling and sample collection was conducted under appropriate
animal care permits (see Table 2). With the exception of the samples
from Utqiagvik (Barrow), AK, USA, which were DNA extracted using a
QIAamp DNA Mini Kit (Qiagen Inc., Toronto, ON, Canada), all samples were
extracted using a solid phase reversible immobilization (SPRI) bead
extraction originally optimized for bird cloacal and oral swabs (Vo &
Jedlicka, 2014). Briefly, tissue was incubated in a solution containing
lysis buffer, protein precipitation solution and zirconia-silica beads,
followed by two rounds of homogenization and extraction of DNA from the
supernatant using SPRI beads. Rather than using 200uL of lysis buffer
for tissue digestion as per the original protocol, our samples (e.g.,
small piece of dry blood spot on filter paper for Alert and Mitivik
Island samples, dried pellet containing approximately 10mg of packed red
blood cells for Svalbard samples, and a grain-of-rice-sized skin tissue
sample from Aleutian Islands and Pribilof Islands) were digested in
200uL of digestion buffer (100mM NaCl, 50mM Tris-HCl pH 8.0, 10mM EDTA,
0.5% SDS) and 10uL of 20mg/mL proteinase K overnight at room
temperature on a nutator. We did not include zirconia-silica beads for
the homogenization step as we had soft tissue samples not requiring
cellular disruption. The resulting genomic DNA was suspended in 50uL TE
buffer and stored at -80°C until use.
RNA sample
collection, extraction and sequencing
Sixteen snow buntings were chosen at random for RNASeq from a pool of
individuals housed at the avian facility of Université du Québec à
Rimouski (UQAR), QC, Canada. These individuals were captured near
Rimouski, QC, Canada as wintering birds. All individuals used in the
current study were humanely euthanized via cervical dislocation for a
separate sequencing project (approved by Animal Care Committee at UQAR
(CPA-61-15-163 and CPA-68-17-186)), their whole brain was collected and
immediately preserved in a high concentration salt buffer (Final
concentrations: 70g ammonium sulfate/100mL, 25mM sodium citrate, 20mM
EDTA, pH 5.2), stored at -20°C, and transferred to -80°C until RNA
extraction. The sampling of the 16 individuals was equally spaced out
from early March to the end of April 2018 to maximize transcriptome
diversity in the brain tissue samples.
Total RNA was extracted from brain tissue using TRIzol Reagents (Life
Technologies, Mississauga, ON, Canada) according to the manufacturer’s
protocol. The RNA pellet was resuspended in Nuclease-Free Water (Thermo
Fisher Scientific, Mississauga, ON, Canada) and RNA quality was assessed
using the Eukaryotic RNA 6000 Nano assay on a 2100 Bioanalyzer (Agilent
Technologies Canada Inc., Mississauga, ON, Canada). We ensured that all
samples had RIN > 8.5 and a 28S/18S rRNA ratio
> 0.8 when preparing the RNA for sequencing for all sixteen
birds. Final RNA aliquots were sent to the Genome Quebec Innovation
Centre (McGill University, Montreal, QC, Canada) for 100bp paired-end
sequencing in two lanes of an Illumina HiSeq4000 sequencer (Illumina
Inc., San Diego, CA, USA).
RNA sequence
analyses
Following sequencing, rRNA sequence reads were removed from the total
raw sequence reads using SortMeRNA v2.1 (Kopyloca et al., 2012).
Non-rRNA reads were then quality filtered using the default sliding
window algorithm in Trimmomatic v0.38 (Bolger et al., 2014) to remove
low-quality and adapter sequences. A de-novo transcriptome was
assembled using fourteen out of sixteen samples using the default
parameters with Trinity v2.8.4 (Hass et al., 2013) which includedin-silico normalization for all reads. In the absence of a
reference genome, and to ease the computational load for downstream data
processing, the final reference transcriptome was assembled with only
the longest isoform per transcript. Cleaned RNA sequence reads from all
sixteen individuals were mapped to the final reference transcriptome
using Burrow’s Wheeler Alignment (BWA) v0.7.12 (Li & Durbin, 2009)
(Supplementary Material S1). Additionally, we assigned Read Group tags
to all samples as unique sample IDs for each file. Resulting SAM files
were converted to BAM files and sorted using SAMtools v1.3 (Li et al.,
2009). We then removed PCR duplicates using Picard Tools
(http://broadinstitute.github.io/picard),
the final BAM files were merged, and low-quality mapping and
supplemental alignments were removed with SAMtools v1.3 (Li et al.,
2009).
SNP characterization
and SNP marker development
The mapping information for all reads from the de-novo assembled
reference transcriptome was used for nucleotide variant discovery using
the Broad Institute’s Genome Analysis Tools Kit (GATK) pipeline
(DePristo et al., 2011; Van der Auwera et al., 2013) to characterize and
develop functional gene locus SNPs. We performed quality recalibration,
indel realignment and variant discovery on filtered-merged combined
sequences, post-alignment, using GATK v4.1.7.0 (McKenna et al., 2010).
Furthermore, we applied hard filtering parameters recommended for RNASeq
experiments to detect variants (DePristo et al., 2011; Van der Auwera et
al., 2013).
We used GeneMarkS-T (Besemer et al., 2001) to characterize open reading
frames in our reference transcriptome and used SNPEff (Cingolani et al.,
2012) to annotate variants and characterize them as missense,
synonymous, upstream or downstream variants. We used the Trinotate
pipeline (Bryant et al., 2017) to annotate all genes in our reference
transcriptome and used LEMONS software (Levin et al., 2015) to predict
intron splice junctions. It was important for us to identify the
exon/intron boundaries to ensure that the SNP primers did not span
introns since our goal was to use these primers to amplify genomic DNA.
By combining the SNPs (i.e., missense, synonymous, upstream or
downstream variants) with gene annotation and predicted splice junction
information, we were able to identify 11,378 useable SNPs (see
Supplementary Material S2). From those, we selected 192 SNP loci
representing genes expected to show local selection effects among our
six populations. Broadly, seven gene function categories (energetics,
lipid metabolism, immune response, stress response, nervous system
development, reproduction and cell-housekeeping processes) were selecteda priori based on their relevance and importance for the our
study species (justifications for gene categories are shown in
Supplementary Material S3, gene function categories for selected loci
shown in Supplementary Material S4). We designed SNP primers to amplify
a 100bp-150bp region surrounding the SNP of interest for the 192 loci
using default settings with Primer3 v4.1.0 (Untergasser et al., 2012).
Forward and reverse universal adapters (ACCTGCCTGCC & ACGCCACCGAGC,
respectively) were added to the 5’ end of the designed primers to allow
for the addition of sequencing adapters and sample-specific barcodes for
High Throughput Sequencing (HTS). All primers were tested in 12.5uL
reactions containing 20mM Tris-HCl pH 8.0, 10mM KCl, 10mM
(NH4)2SO4, 2mM
MgSO4, 0.1% Triton X‐100, 0.1mg/mL bovine serum albumin
(BSA), 200 µM of each dNTP, 200nM of forward and reverse primers, 0.5U
of Taq polymerase (Bio Basic Canada Inc., Markham, ON, Canada), and
0.5uL of genomic DNA. The PCR cycling conditions were: 2 min at 95°C;
20s at 95°C, 20s at 58°C, 30s at 72°C (32 cycles); and 2 mins at 72°C.
Of the 192 primer sets, 72 either did not amplify genomic DNA, yielded
non-specific amplification or produced an amplicon larger than 350bp:
all of these were discarded from subsequent analyses. Details for the
remaining 117 SNP primers are provided in Supplementary Material S4 in
Supplementary Data.
Microsatellite and
SNP marker genotyping
Microsatellite DNA marker data were used to assess population genetic
structure (population connectivity), and they were used as the neutral
marker controls for assessing divergence at the SNP loci. Briefly, all
DNA samples were amplified at nine microsatellite loci with three PCR
reactions: i) a first round of 20-cycle multiplex PCR (all primers
combined) for preamplification of the DNA (this was done due to the
small amount of DNA recovered from some samples) followed by ii) a
second round of 30-cycle PCR with individual microsatellite primers, and
iii) a final round of 5-cycle PCR to add fluorescent tags for
fluorescence-based capillary electrophoresis. For each individual, we
conducted the multiplex PCR in a 5uL reaction mixture containing 2.5uL
of 2x Multiplex PCR Master mix (Qiagen Inc., Toronto, ON, Canada), 0.5uL
of primer pool (10x primer mix containing 2uM each of all 9 primer
pairs), and 1.0uL each of RNase-Free Water and template DNA. The
amplification conditions were: 5 min at 95°C followed by 20 cycles of
30s at 95°C, 90s at 57°C, 30s at 72°C; and ending with 30 mins at 60°C.
We diluted the PCR products 20-fold by adding 95uL of
ddH20. For the second round PCR, we amplified 2-4uL of
the diluted multiplexed PCR product in a single-PCR reaction of 25uL
which contained 2.5uL of 10x Taq buffer (20mM Tris-HCl pH 8.0, 10mM KCl,
10mM (NH4)2SO4; Bio
Basic Canada Inc., Markham, ON, Canada), 200uM each of dNTP,
MgSO4 (2uM), forward and reverse primers (2uM each), and
0.5U of Taq Polymerase (Bio Basic Canada Inc., Markham, ON, Canada).
Thermocycling conditions were 95°C for 2 min; followed by 30 cycles of
95°C for 20s, locus-specific annealing temperature for 20s (56°C for
CAM17, Lox8, Indigo29, SNBU682, and SNBU705; 58°C for Cuu28, POCC6,
Ecit2, and CAM17), and 72°C for 30s, ending with 72°C for 2 min. For the
final round of PCR, we used a PCR-based labelling technique where
products from 1-4 loci were labelled with different dyes (6FAM, VIC, PET
and NED; PCR conditions were identical to that of the second round of
PCR with the exception of 5 cycles instead of 30) and combined with
Hi-Di formamide (Applied Biosystems, Foster City, CA, USA) and a
GeneScan LIZ600 size standard (Applied Biosystems, Foster City, CA, USA)
for separation on a SeqStudio Genetic Analyzer (Applied Biosystems,
Foster City, CA, USA). Each sample was genotyped using GeneMapper
software v3.5 and verified by eye.
We genotyped all individuals at the selected SNP loci using HTS. The HTS
library preparation was completed using two rounds of PCR; multiplex
followed by barcoding (ligation) PCR. We first amplified the 117 SNP
loci using five separate multiplex PCRs for each sample (bird). Each
multiplex PCR included 17-25 primer pairs (SNP locus groups shown in
Supplementary Material S4). Multiplex PCR used the Qiagen Multiplex PCR
Plus Kit (Qiagen Inc., Toronto, ON, Canada). For each multiplex group,
we first made 10x primer pools containing all primers within that group
at equimolar concentration of 0.2uM. Each 7uL multiplex reaction
contained 3.5uL Multiplex PCR Plus Master mix, 0.7uL of the 10x primer
pool, 1.3uL ddH2O, and 1.5uL genomic DNA. The
amplification conditions were: 5 min at 95°C followed by 28 cycles of
30s at 95°C, 90s at 58°C and 30s at 72°C followed by 10 mins at 68°C. We
diluted the multiplexed PCR product 10-fold with ddH2O.
Next, PCR products from each of the five multiplex reactions were pooled
for each individual and cleaned using Sera-Mag Speed Beads (Cytiva,
Mississauga, ON, Canada) to remove unincorporated dNTPs, primers, primer
dimers and PCR buffers. We then ligated individual barcode sequences and
HTS adaptor sequences to the PCR products in a second (ligation)
short-cycle PCR. The 20uL PCR reaction included: 10x Taq buffer (20mM
Tris-HCl pH 8.0, 10mM KCl, 10mM
(NH4)2SO4; Bio Basic
Canada Inc., Markham, ON, Canada), 2mM MgSO4, 0.1mg/mL
bovine serum albumin (BSA), 200uM of each dNTP, 200nM of forward and
reverse primers, 0.5 U of Taq polymerase (Bio Basic Canada Inc.,
Markham, ON, Canada), and 10uL of pooled and cleaned multiplex PCR
product. The PCR conditions for the ligation PCR were: 94°C for 2 min,
followed by 6 cycles of 94°C for 30s, 60°C for 30s and 72°C for 60s,
followed by 72°C for 5 min. This second PCR ligated a “barcode”
sequence that allowed us to identify each sample for allocating sequence
data to specific individuals post-sequencing. The barcoded products were
pooled and gel-extracted using the GenCatch Gel Extraction Kit (Epoch
Life Science Inc., Sugar Land, TX, USA) as per manufacturer’s
instructions. Purified pooled product was analyzed on an Agilent 2100
Bioanalyzer using a High Sensitivity chip (Agilent Technologies Canada
Inc., Mississauga, ON, Canada) to verify the size and concentration of
the library amplicons. Finally, the library was diluted to approximately
60pM and sequenced using Ion PGM Hi-Q chemistry in an Ion Chef System
(Thermo Fisher Scientific Inc., Streetsville, ON, Canada). Specifically,
the library was sequenced using an Ion 318 Chip Kit with an Ion PGM
Sequencing 400 Kit (Thermo Fisher Scientific, Mississauga, ON).
SNP HTS
Bioinformatics
After HTS sequencing of the pooled SNP PCR amplicons, we used the FASTX
Toolkit (Gordon & Hannon, 2010) and its Barcode Splitter script to
demultiplex the sequences. We then trimmed off the sequencing adapters
and barcodes from all reads using CUTADAPT v1.11 (Martin, 2011) and
subsequently mapped the resulting PCR-amplified sequences to our
reference transcriptome using BWA v0.7.12 (Li & Durbin, 2009) to
identify the genes containing the amplified SNP regions. To genotype all
individuals at target SNP loci, we used FreeBayes (Garrison & Marth,
2012), a Bayesian genetic variant detector. Since FreeBayes detects many
other variants such as small multi-nucleotide polymorphisms (MNPs),
insertions and deletions (indels), composite insertions, and
substitutions, we discarded such variants using VCFtools (Danecek et
al., 2011). Next, we refined the VCF file through a filtration step
which excluded the SNP locus markers that were called in less than 30%
of individuals (16 out of 117 SNPs) and the individuals that were
missing more than 10% of their genotypes (2 out of 221 individuals).
Lastly, we only kept one SNP per amplicon (i.e., the original SNP used
to design the primers for that amplicon) for further analyses to avoid
any bias resulting from including multiple (linked) SNPs per amplicon.
As a result, we had 101 SNP genotypes across 219 individuals for further
analyses.
Population genetic
analyses
Testing for temporal
effects
Because we had individuals collected across multiple years for most of
our study populations, we first tested for temporal effects (i.e., a
year effect) on allele frequencies. We conducted separate Fisher’s exact
tests of allele frequency variation for the microsatellite marker data
for multi-year samples from Alert, Svalbard, Utqiagvik and Mitivik
Island using the genepop package (Rousset, 2008) in R v1.2.5 (R Core
Team 2016). Since p -values ranged from 0.08-0.50 for each
population, we concluded that there were no temporal effects, hence we
combined samples from multiple years for the Alert, Svalbard, Utqiagvik,
and Mitivik Island populations.
Testing for Alaskan
island population neutral divergence
The island populations of Alaska (Attu, Adak, and Pribilof islands) are
geographically clustered (Figure 1), allowing dispersal among the
islands possibly resulting in a single metapopulation. We thus tested
these three populations for neutral population divergence. Based on the
results of Fisher’s test, we combined Attu and Adak Island samples from
1999 forming the population ‘Aleutian Islands’ since there were no
significant differences in neutral allele frequencies (p = 0.14).
We retained Pribilof Islands individuals as a separate population for
further analyses as it had a significantly different (p< 0.00001) neutral allele frequency distribution from the
Aleutian Islands samples. These two Alaskan island populations combined
with the other four populations, resulted in a total of six populations
for downstream analyses (Table 2).
Population genetic
divergence
We assessed population differentiation using both neutral microsatellite
and functional SNP markers using pairwise Fisher’s exact test of allele
frequency variation in the genepop package (Rousset, 2008) in R. We also
estimated pairwise FST for both marker types
using GENODIVE (version 3.0) (Miermans, 2020). We corrected allp -values for multiple comparisons using the sequential Bonferroni
procedure (Rice, 1989) where necessary.
Neighbour-joining
cluster analyses
To visually assess the pattern of population genetic divergence for the
two marker types (microsatellite and SNP loci), we performed unrooted
neighbour-joining (NJ) cluster analyses with Cavalli-Sforza and Edward’s
(1967) chord distance (Dc ) using the ‘ape’
package (Paradis & Schliep, 2019) in R. Chord distance was used as it
is expected to provide better tree topology estimation for closely
related populations, although it may compromise branch length estimation
(Angers & Bernatchez, 1998). The percent support for branches was
estimated using bootstrapping, with replacement, among loci using 10,000
permutations in the ‘poppr’ package (Kamvar et al., 2014) in R.
Selection signatures
at candidate loci
To detect a signature of selection at functional SNP loci, it is
important to separate the effects of genetic drift from selection. For
this purpose, we used the microsatellite markers to estimate the neutral
allelic distribution; it is expected that both functional SNP loci and
microsatellites undergo change due to genetic drift and gene flow, but
only SNP loci are expected to be under selection due to potential local
habitat-specific environmental conditions.