Conclusions
Genotyping by sequencing (GBS) is an effective way to generate affordable genotype results for degraded specimens when stringent protocols and deep sequencing are used. Our costs were under $15 per sample (details are provided in Appendix 6). This is comparable to other GBS studies (Darby et al., 2016). Notably, the approach does not require an initial investment in fluorescently labelled primers, but it does require sequencing adapters and the ability to fill a sequencing run. We performed only singleplex PCR; if time were spent designing multiplex PCRs, the cost of Taq could be significantly reduced. For example, if two microsatellites were multiplexed, the cost would fall to $13.30 per sample, and if three microsatellites were pooled, the cost would fall to $12.81 per sample.
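The multiplexing figures above can be reproduced with a simple two-component cost model. The sketch below is only an illustration and does not reproduce the itemized budget in Appendix 6: it assumes that the per-sample cost splits into a fixed part and a Taq-dependent part that scales inversely with the number of loci per PCR (k), and it back-calculates both parts from the two quoted totals. The function names and the model itself are assumptions made for this example.

```python
from decimal import Decimal

def split_costs(cost_2plex, cost_3plex):
    """Solve cost(k) = fixed + taq / k from the quoted 2-plex and 3-plex totals.

    ASSUMPTION: only the Taq component scales as 1/k; everything else is fixed.
    """
    # cost(2) - cost(3) = taq/2 - taq/3 = taq/6
    taq = 6 * (cost_2plex - cost_3plex)
    fixed = cost_2plex - taq / 2
    return fixed, taq

def cost_per_sample(fixed, taq, k):
    """Per-sample cost when k microsatellites share one PCR (k = 1 is singleplex)."""
    return fixed + taq / k

# Totals quoted in the text (USD per sample)
fixed, taq = split_costs(Decimal("13.30"), Decimal("12.81"))

for k in (1, 2, 3):
    print(f"{k}-plex: ${cost_per_sample(fixed, taq, k):.2f} per sample")
```

Under this assumed model the implied singleplex cost is $14.77 per sample, which is consistent with the reported figure of under $15. The fixed and Taq components recovered here are inferred values, not reported ones.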
Several bioinformatic pipelines have already been developed to generate microsatellite genotypes from HTS data (Barbian et al., 2018; De Barba et al., 2017; Pimentel et al., 2018; Tibihika et al., 2019), and these have been used to screen a variety of starting template types, including tissue, hair and fecal samples. This is the first time that GBS methods (employing an existing pipeline developed for fecal samples) have been applied to evaluate error rates in DNA samples derived from museum specimens. Our results show that when reliable amplification occurs, robust genotypes can be recovered from museum specimens, especially from samples deemed HQMS. The rates of agreement between genotypes were nearly identical for the HQMS and our tissue sample. For low-quality samples, repeated PCR is necessary, and even this does not completely eliminate the possibility that a false genotype is included in a dataset. This problem, however, is also known from CE fragment size analysis: many studies have reported shifted alleles for the same PCR products on different runs of an automated capillary sequencer, or with a different size standard (Ellis et al., 2011; Haberl & Tautz, 1999). We believe that our allele calls for the HQMS are robust and contribute valuable data points to studies where historical data is not available. This study provides best practices for the genotyping of degraded source samples.
Previous studies have shown that the type of museum specimen sample obtained (bone, skin, hair, cartilage, nail) may have more of an effect than age on recovery of DNA (Hawkins, Hofman, et al., 2016; McDonough et al., 2018). Yet here, based on our limited sample size, the worst-performing samples for microsatellites were in fact the oldest (1905-1919; Table 1). Hawkins, Hofman, et al. (2016) evaluated only mitochondrial DNA recovery, and did so using in-solution hybridization, and McDonough et al. (2018) recovered variable concentrations of mtDNA versus nDNA, with mtDNA unexpectedly recovering approximately an order of magnitude more sequencing depth than nDNA. It is worth noting, however, that many samples from our expanded dataset (Yuan, 2020) were as old as the LQMS here, yet reliably amplified for the same microsatellite loci. Because of these factors, we refrain from further speculation on the patterns of degradation associated with age for nuclear DNA content in museum specimens.
The LQMS genotypes recovered require fine-scale evaluation to ensure accuracy and repeatability for downstream analyses, as inaccurate allele calls can affect population genetic inferences. Variable genotypes were much more prevalent in the low-quality samples (16 instances in the LQMS versus only two in the HQMS). These variable genotypes may not be due specifically to allelic dropout, which is commonly seen in fecal samples (Piggott, Bellemain, Taberlet, & Taylor, 2004; Regnaut et al., 2006), since alleles that appear to be outside the expected bin sizes were recovered (see GS-4 for the LQMS), and potential allelic dropout appeared only rarely (see GS-2 for LACM 95619). Further optimization of the CHIIMP pipeline may allow those genotypes to be eliminated with the size buffer setting. Additionally, all samples recovered reliable mtDNA signatures, whereas many (particularly the LQMS) lacked nDNA at many loci. One sample (MVZ 5211) had exceptionally high cytochrome b coverage, yet no reliable nDNA genotypes.
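The kind of replicate-level check that flags variable genotypes can be sketched as follows. This is a minimal illustration and not part of the CHIIMP pipeline or of the analysis reported here: the function name, the status labels, and the allele lengths in the usage lines are all hypothetical. Each replicate PCR call is represented as a tuple of allele lengths (in bp), with None standing for a failed amplification.

```python
from collections import Counter

def summarize_replicates(calls):
    """Summarize genotype calls from replicate PCRs of one sample at one locus.

    calls: list of genotype tuples (allele lengths in bp); None = failed PCR.
    Returns a dict with a status ('concordant', 'variable' or 'no_call'),
    the number of scored replicates, and a count of each distinct genotype.
    """
    # Sort alleles within each genotype so (186, 182) and (182, 186) match
    scored = [tuple(sorted(c)) for c in calls if c is not None]
    if not scored:
        return {"status": "no_call", "n_scored": 0, "genotypes": {}}
    counts = Counter(scored)
    status = "concordant" if len(counts) == 1 else "variable"
    return {"status": status, "n_scored": len(scored), "genotypes": dict(counts)}

# Hypothetical allele lengths, for illustration only
agreeing = summarize_replicates([(182, 186), (186, 182), None])
conflicting = summarize_replicates([(182, 186), (182, 182), (182, 186)])
failed = summarize_replicates([None, None])
print(agreeing["status"], conflicting["status"], failed["status"])
```

A check of this form only reports disagreement among replicates; it does not distinguish allelic dropout from off-bin alleles, which still requires inspection of the underlying reads.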
Combining microsatellite markers for degraded samples with the improved resolution of GBS will allow further comparison with the plethora of published microsatellite studies. Museum specimens are an important resource because they provide both temporal perspective and representation of rare species. However, appropriate QC measures need to be undertaken to ensure the accuracy of recovered genotypes. We believe that these data illuminate the possibility of reliably incorporating microsatellite genotypes from early 20th century museum specimens, in combination with modern surveys, to evaluate genetic shifts and population genomics through space and time.