Sampling and genotyping
We analyzed a total of 304 Idaho ground squirrel samples. We conducted
all sampling following standard biological and ethical requirements
under the auspices of University of Idaho Animal Care and Use Committee
Protocol #2015-53. We preserved samples consisting of buccal swabs in
Qiagen ATL buffer and stored them at room temperature or
4oC until DNA extraction. For NIDGS, we obtained 232
samples from 17 populations in Adams County in central Idaho (Figure
1B), which represents approximately 60% of the known populations. For
SIDGS, we collected 72 samples from 5 populations in Washington, Payette
and Gem Counties (Figure 1B), which is only a small proportion of the
> 100 sites identified during surveys in 2000 (CCAA 2002).
Details on the location and number of samples collected per population
are shown in Table 1. Samples were extracted using the Qiagen Blood and
Tissue DNA extraction kit (Qiagen, Inc.), and quantified DNA using
fluorometric quantitation with a Qubit™ double-stranded DNA High
Sensitivity Assay Kit. We used genetic samples for Restriction Site
Associated DNA Sequencing (RADseq) (Murphy et al. 2007; Bairdet al. 2008; Andrews et al. 2016). We prepared libraries
following Ali et al. (2016) using the sbfI restriction
enzyme, but excluding the last part of that protocol relating to
targeted bait capture, and instead only using the new RADseq protocol
with biotinylated adapters. We built four libraries, each comprised of
~80 individually barcoded samples, and sequenced each
library on one lane of Illumina HiSeq 4000 at the University of Oregon
Genomics & Cell Characterization Core Facility (GC3F), with 150 base
pairs (bp) paired-end reads. We conducted all bioinformatic analyses
using the Institute for Bioinformatics and Evolutionary Studies (IBEST)
Computational Resources Core servers at the University of Idaho. For SNP
calling, we used the software stacks version 2.2 (Catchenet al. 2013) to demultiplex and remove PCR duplicates . We used
the thirteen-lined ground squirrel (Ictidomys tridecemlineatus )
genome (GCA_000236235.1) as a reference to align our data using the
software bwa version 0.7.17 (Li 2013). From these alignments, we
used the software SAMtools version 1.9 (Li et al. 2009).
We conducted our analyses at both the interspecific and intraspecific
levels in order to determine putative adaptive differences between the
two sister species, but also searching for patterns of population
structure and local adaptation among populations within species. To do
so, we defined a first dataset consisting of all 304 IDGS, and then
separated the samples by species, ‘NIDGS’ (232 samples) and ‘SIDGS’ (72
samples) datasets. We called SNPs for the ‘IDGS’ dataset using the
multisample SNP caller (mpileup ) implemented in SAMtools ,
and subsequently used VCFTools version 0.1.14 (Danecek et al.2011) to exclude individuals with ≥50% missing data, keeping only
biallelic SNPs that had <50% missing data, were located
>10,000 base pairs apart, had >2% minor
allele frequency and >3 reads. To produce the species
specific datasets, we separated the samples by species (232 NIDGS and 72
SIDGS) and repeated the above filtering with a MAF value of 3%. We then
used part of the R script available from Wright et al. (2019) to
exclude loci with heterozygosity >70% and with greater
than a 3:1 ratio of mean read depth for the reference allele versus the
alternative allele, across individuals with these alleles, to avoid loci
with significant reference bias. Finally, we used the functions from thewhoa R package to estimate heterozygote miscalls rates and
excluded loci deviating from more than one standard deviation from the
mean
(http://150.185.130.98/rcran/web/packages/whoa/index.html).
For all datasets, we compared the percentage of missing data per
individual (–missing-indv) with mean read depth (–depth) and
heterozygosity (–het) obtained using VCFTools version 0.1.14 (Daneceket al. 2011).