Conclusions
Genotyping by sequencing is an effective way to generate affordable
genotype results for degraded specimens when stringent protocols and
deep sequencing are used. Our costs were under $15
per sample (details are provided in Appendix 6). This is comparable to
other GBS studies (Darby et al., 2016), and notably does not require the
initial investment in fluorescently labelled primers, but does require
sequencing adapters, as well as the ability to fill a sequencing run. We
performed only singleplex PCR; if time were spent designing
multiplex PCRs, the cost of Taq polymerase could be reduced substantially. If,
for example, two microsatellites were multiplexed, the cost would
be reduced to $13.30/sample, and if three microsatellites were pooled,
the overall cost would be reduced to $12.81/sample.
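The two multiplexing figures above are consistent with a simple two-part cost model, in which a fixed per-sample cost is added to a PCR cost that is divided by the number of loci per reaction. The sketch below backs out both components from the quoted 2-plex and 3-plex costs. The model and the names `fixed` and `pcr` are assumptions made for illustration; they are not the itemized breakdown given in Appendix 6.

```python
def fit_cost_model(cost_2plex, cost_3plex):
    """Solve cost(m) = fixed + pcr / m from the 2-plex and 3-plex costs.

    Assumed model: `fixed` is everything that does not scale with the
    number of PCR reactions; `pcr` is the singleplex PCR cost per sample.
    """
    # cost(2) - cost(3) = pcr * (1/2 - 1/3) = pcr / 6
    pcr = 6 * (cost_2plex - cost_3plex)
    fixed = cost_2plex - pcr / 2
    return fixed, pcr


def cost_per_sample(fixed, pcr, loci_per_reaction):
    """Per-sample cost when `loci_per_reaction` loci share one PCR."""
    return round(fixed + pcr / loci_per_reaction, 2)


fixed, pcr = fit_cost_model(13.30, 12.81)
singleplex = cost_per_sample(fixed, pcr, 1)
print(f"fixed={fixed:.2f} pcr={pcr:.2f} singleplex={singleplex:.2f}")
```

Under this assumed model the implied singleplex cost is about $14.77 per sample, in line with the stated cost of under $15.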
Several bioinformatic pipelines have already been developed to generate
microsatellite genotypes from HTS data (Barbian et al., 2018; De Barba
et al., 2017; Pimentel et al., 2018; Tibihika et al., 2019), and have
been applied to a variety of starting template types, including tissue,
hair, and fecal samples. This is the first time GBS methods (employing an
existing pipeline developed for fecal samples) have been applied to
evaluate error rates in DNA samples derived from museum specimens. Our
results show that when reliable amplification occurs, robust genotypes
can be recovered from museum specimens, especially samples deemed HQMS.
The rates of agreement between genotypes were nearly identical for
the HQMS and our tissue sample. For low quality samples, repeated PCR is
necessary, and even then it does not completely eliminate the possibility
of a false genotype being included in a dataset. This, however, is also known from
CE fragment size analysis, and many studies have reported shifted
alleles of the same PCR products on different runs of an automated
capillary sequencer, or with a different size standard (Ellis et al.,
2011; Haberl & Tautz, 1999). We believe that our allele calls for the
HQMS are robust and contribute valuable data points to studies where
historical data is not available. This study provides best practices for
the genotyping of degraded source samples.
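One way to combine repeated PCRs for low quality samples is a replicate consensus rule, sketched below. The threshold of two supporting replicates, the input layout, and the function name are assumptions for illustration; this is not the rule applied in this study.

```python
from collections import Counter


def consensus_genotype(replicates, min_support=2):
    """Return alleles seen in at least `min_support` replicate PCRs.

    `replicates` is a list of genotype calls, each a tuple of allele
    lengths in bp; a failed PCR is given as an empty tuple. Returns a
    sorted tuple of one or two alleles, or None when the replicates do
    not support a diploid genotype.
    """
    counts = Counter()
    for call in replicates:
        counts.update(set(call))  # count each allele once per replicate
    supported = sorted(a for a, n in counts.items() if n >= min_support)
    if 1 <= len(supported) <= 2:
        return tuple(supported)
    return None


# Hypothetical calls for one locus across four replicate PCRs.
calls = [(180, 184), (180,), (180, 184), ()]
print(consensus_genotype(calls))
```

A single supported allele is returned as an apparent homozygote, which allelic dropout can also produce, so such calls would still warrant manual review.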
Previous studies have shown that the type of museum specimen sample
obtained (bone, skin, hair, cartilage, nail) may have more of an effect
than age on recovery of DNA (Hawkins, Hofman, et al., 2016; McDonough et
al., 2018), yet here, based on our limited sample size, the worst
performing samples for microsatellites were in fact the oldest
(1905-1919; Table 1). Hawkins, Hofman, et al. (2016) evaluated only
mitochondrial DNA recovery from in-solution hybridization, and
McDonough et al. (2018) recovered variable concentrations of mtDNA
versus nDNA, with mtDNA unexpectedly recovering approximately an order
of magnitude more sequencing depth than nDNA. It is worth noting,
however, that many samples from our expanded dataset (Yuan, 2020) were
as old as the LQMS here, yet reliably amplified for the same
microsatellite loci. Due to these factors we refrain from further
speculation on the patterns of degradation associated with age for
nuclear DNA content in museum specimens.
The LQMS genotypes recovered require fine-scale evaluation to ensure
accuracy and repeatability for downstream analyses, as inaccurate allele
calls can affect population genetic inferences. Variable genotypes were
much more prevalent in the low quality samples (16 instances in the LQMS
versus only two in the HQMS). These variable genotypes may not be
due specifically to allelic dropout, which is commonly seen in fecal
samples (Piggott, Bellemain, Taberlet, & Taylor, 2004; Regnaut et al.,
2006), since alleles that appear to be outside the expected bin sizes
were recovered (see GS-4 for the LQMS), and potential allelic dropout
appeared only rarely (see GS-2 for LACM 95619). Further optimization
of the CHIIMP pipeline may allow for elimination of those genotypes with
the size buffer setting. Additionally, reliable mtDNA signatures were
recovered from all samples, whereas many (particularly the LQMS) lacked
nDNA at many loci. One sample (MVZ 5211) had very high cytochrome b
coverage, yet no reliable nDNA genotypes.
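The idea of screening alleles against an expected size range widened by a buffer can be sketched as follows. The size range, buffer value, allele lengths, and function name are hypothetical, and the code is not CHIIMP's implementation.

```python
def flag_out_of_range(alleles, length_min, length_max, buffer=0):
    """Split allele lengths (bp) into in-range and out-of-range lists.

    The accepted window is [length_min - buffer, length_max + buffer].
    """
    low, high = length_min - buffer, length_max + buffer
    kept = [a for a in alleles if low <= a <= high]
    flagged = [a for a in alleles if not (low <= a <= high)]
    return kept, flagged


# Hypothetical locus with an expected range of 140-180 bp and a 10 bp buffer.
kept, flagged = flag_out_of_range([152, 160, 214], 140, 180, buffer=10)
print(kept, flagged)
```

A narrower buffer flags more alleles for review, at the risk of discarding genuine alleles that fall just outside the range previously observed for a locus.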
Applying microsatellite markers to degraded samples, with the improved
resolution of GBS, will allow further comparison with
the plethora of published studies on microsatellites. Museum specimens
are important because they provide both temporal perspective and
representation of rare species. However, appropriate QC measures need to be
undertaken to ensure the accuracy of recovered genotypes. We believe that
these data illustrate the possibility of reliably incorporating
microsatellite genotypes from early 20th century museum
specimens, in combination with modern surveys, to evaluate genetic
shifts and population genomics through space and time.