Genome sequencing, assembly, and post-processing
We generated 11 whole-genome sequences representing both New World
marten species, including individuals collected in both known hybrid
zones (Kuiu [KUI] and the northern Rocky Mountains [MTX]) and
multiple translocated islands (Prince of Wales Island, POW; Chichagof
Island, CHI), with an Old World sable (Martes zibellina) included
as an outgroup (Table 1). Sequences were generated on an Illumina HiSeq
X through the Beijing Genomics Institute (BGI Americas, Philadelphia,
PA, USA) and NextSeq 500 through the Molecular Biology Facility at the
University of New Mexico. Sampling was based on previous genetic (Dawson et al. 2017; Colella et al. 2018a) and morphological
analyses (Colella et al. 2018b) that helped define species limits
and refine hybrid zone locations through the identification of mixed
mitochondrial and nuclear haplotypes. Subsamples of liver tissue were
loaned from the University of New Mexico’s Museum of Southwestern
Biology (MSB) and the Burke Museum at the University of Washington
(UWBM). DNA extractions followed a DNeasy Blood and Tissue Kit (Qiagen,
Venlo, The Netherlands) protocol. Our assembly pipeline followed Colella et al. (2018c). Read quality was examined using FastQC (Andrews
2010), adapter sequences were trimmed with Trimmomatic v0.33 (Bolger et
al. 2014), and sex chromosomes were removed by excluding those scaffolds
from the reference. The Burrows-Wheeler aligner (BWA, Li & Durbin 2010) was
used to map reads to the domestic ferret genome (Mustela putorius
furo; Peng et al. 2014) and an additional BWA iteration
extracted mitochondrial genomes using the same reference. Final depth of
coverage ranged from 19 to 30X (Table 1). PCR duplicates were removed
using Picard v1.9 (MarkDuplicates;
http://broadinstitute.github.io/picard/) and nuclear and mitochondrial
consensus sequences were called using SAMtools (mpileup; Li et al. 2009). Single nucleotide polymorphisms (SNPs) were called with the
Genome Analysis Toolkit (GATK, HaplotypeCaller; McKenna et al. 2010) for all North American marten and again against the M.
zibellina outgroup. SNPs were filtered (Supplemental Information 1) by
minimum depth (minDP = 2, set to one third the coverage of
our lowest-coverage sample, as recommended for PSMC analyses; Li &
Durbin 2011), genotype quality (minGQ = 30), minimum minor allele
frequency (MAF = 0.1), and scaffold size (1 Mb). Private alleles and
indels were removed using VCFtools (Danecek et al. 2011). A MAF
of 0.1 removed singletons (e.g., individual-specific, rare mutations),
which are not informative about allelic overlap among populations, to
reduce potential sequencing errors, which are more common in lower-coverage
genomes. Format conversions (vcf, ped, bed) were conducted in PLINK
(Purcell et al. 2007). Missing data were removed
(--max-missing, VCFtools) based on analysis specifications. Variants
were spaced (1 per 100 bp window) to account for linkage disequilibrium
and sorted into 46 ‘pseudo-chromosomes’ to enable the application of
human-specific analyses to a non-model system with only 38 chromosomes
using custom Python scripts available online at
https://github.com/jpcolella/.
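
The site-level filtering and variant spacing described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the scripts from the repository: the `Variant` record, its field names, and the helper functions `passes_filters` and `thin_variants` are hypothetical, and the sketch assumes fixed, non-overlapping 100 bp windows with 1-based positions in which the first passing variant is retained. The thresholds (minDP = 2, minGQ = 30, MAF = 0.1) are taken from the text.

```python
from collections import namedtuple

# Hypothetical minimal record for one biallelic site; real VCF parsing is
# out of scope. min_depth and min_gq are assumed to be the lowest
# per-sample depth and genotype quality observed at the site.
Variant = namedtuple("Variant", ["scaffold", "pos", "min_depth", "min_gq", "maf"])

MIN_DP = 2      # minimum depth (from the text)
MIN_GQ = 30     # minimum genotype quality (from the text)
MIN_MAF = 0.1   # minimum minor allele frequency (from the text)
WINDOW = 100    # one variant retained per 100 bp window (from the text)


def passes_filters(v):
    """Return True if the site meets the depth, quality, and MAF thresholds."""
    return v.min_depth >= MIN_DP and v.min_gq >= MIN_GQ and v.maf >= MIN_MAF


def thin_variants(variants, window=WINDOW):
    """Keep the first variant in each non-overlapping window on each scaffold.

    Assumption: windows are fixed bins (positions 1-100, 101-200, ...),
    not a sliding minimum distance between retained sites.
    """
    seen = set()
    kept = []
    for v in sorted(variants, key=lambda x: (x.scaffold, x.pos)):
        key = (v.scaffold, (v.pos - 1) // window)
        if key not in seen:
            seen.add(key)
            kept.append(v)
    return kept


# Toy input (made-up values for illustration only).
variants = [
    Variant("s1", 10, 5, 40, 0.2),
    Variant("s1", 50, 5, 40, 0.2),    # same window as position 10
    Variant("s1", 150, 5, 40, 0.2),
    Variant("s1", 250, 1, 40, 0.2),   # fails minimum depth
    Variant("s2", 10, 5, 20, 0.2),    # fails genotype quality
    Variant("s2", 99, 5, 40, 0.05),   # fails MAF
    Variant("s2", 100, 5, 40, 0.5),   # last position of the first window
    Variant("s2", 101, 5, 40, 0.5),   # first position of the second window
]

filtered = [v for v in variants if passes_filters(v)]
thinned = thin_variants(filtered)
```

Under the fixed-bin assumption, two retained variants can still be adjacent if they fall on either side of a window boundary (positions 100 and 101 on `s2` in the toy input); a minimum-distance rule would be a different design choice and would behave differently at those boundaries.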