Microsatellites: effects of microsatellite length
As expected, the tissue sample had fairly consistent genotypes across all amplifications, except in one instance where a 2 bp difference was detected between the two PCR replicates. Interestingly, in this case (HSU 8180, marker GS-4), both the bioinformatically combined and pooled runs recovered a homozygous genotype. Based on read depth, it is possible that the minor alleles (92 and 94 in replicates 1 and 2, respectively) were sequencing errors or PCR stutter related to the high depth of coverage.
We recovered genotypes more frequently, and arguably more reliably, for shorter microsatellites across all samples. In one sample, LACM 95619 at the GS-2 locus, the genotypes from both the bioinformatically combined and pooled runs appeared to consist of alleles from replicate 1 (82/82) and replicate 3 (84/84). Unfortunately, the second replicate failed to produce an allele, so it cannot be determined from this study whether both alleles recovered across the two PCRs are accurate or whether this instance reflects a dropout event. These genotypes were called from 4,235 raw reads for GS-2 replicate 1 and 4,449 reads for replicate 3. The second replicate recovered only 1,034 reads and was of very poor quality (only 659 reads, or 63.73%, passed standard prinseq quality filters), which explains the lack of a genotype from that replicate (Table 3).
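The read counts reported for LACM 95619 at GS-2 can be checked with a short calculation. The following is a minimal sketch that uses only the numbers given above; the variable names are illustrative and are not taken from the prinseq or CHIIMP output.

```python
# Read counts for LACM 95619, locus GS-2, as reported in the text.
raw_reads = {"replicate_1": 4235, "replicate_2": 1034, "replicate_3": 4449}
replicate_2_passed_filter = 659  # reads passing the prinseq quality filters

# Fraction of replicate 2 reads that survived quality filtering.
pass_rate = replicate_2_passed_filter / raw_reads["replicate_2"]
print(f"Replicate 2 pass rate: {pass_rate:.2%}")

# Raw depth of replicate 2 relative to each successful replicate.
for name in ("replicate_1", "replicate_3"):
    ratio = raw_reads["replicate_2"] / raw_reads[name]
    print(f"Replicate 2 raw depth relative to {name}: {ratio:.2f}")
```

Replicate 2 therefore started with roughly a quarter of the raw depth of the other two replicates before quality filtering removed about a third of its reads.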
The high quality museum specimens performed as well as, and occasionally better than, the tissue sample. Despite reliable performance and genotyping success for the HQMS, there were still a handful of missing allele calls, ranging from GS-4 in sample UMMZ 79755 to the longest marker, GLSA-52, where both UMMZ 79755 and UMMZ 79760 lacked calls in one replicate each. Overall, the high quality specimens worked remarkably well across all loci, but often had more than two prominent sequences, as flagged by CHIIMP (see Table 4 for details). The resulting genotypes were highly reliable and appeared to lack confirmation only at GLSA-52, the longest microsatellite evaluated here. Two of the three HQMS had different calls between the individual replicates and the bioinformatically combined and pooled runs. Interestingly, UMMZ 79760 recovered a genotype with alleles separated by 1 bp (250/251), which does not make evolutionary sense for a dinucleotide microsatellite. Upon further investigation, the two alleles in the pooled run differed as follows: allele 1 (251 bp) had an additional ‘CA’ repeat but only 16 bp of the reverse primer, whereas allele 2 (250 bp) had one fewer ‘CA’ repeat and 17 bp of the reverse primer, resulting in a net difference of 1 bp. The 255 bp alleles that were called had the same number of repeats as allele 1 but included the entire reverse primer sequence (20 bp), and the 253 bp allele that was called had the same number of repeats as allele 2 but also included the entire reverse primer sequence.
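The length arithmetic behind the 250/251 genotype for UMMZ 79760 can be made explicit. The sketch below assumes that an observed allele length differs from its full-length counterpart only by the number of reverse primer bases that are missing; the function and variable names are hypothetical and are not part of CHIIMP.

```python
REPEAT_UNIT_BP = 2            # one 'CA' dinucleotide repeat
FULL_REVERSE_PRIMER_BP = 20   # complete reverse primer, as stated in the text

def length_with_full_primer(observed_bp, primer_bp_retained):
    """Length the same sequence would have with the entire reverse primer present."""
    return observed_bp + (FULL_REVERSE_PRIMER_BP - primer_bp_retained)

# Pooled-run alleles for UMMZ 79760, as described in the text.
allele_1 = {"observed_bp": 251, "primer_bp_retained": 16}  # one extra 'CA' repeat
allele_2 = {"observed_bp": 250, "primer_bp_retained": 17}  # one fewer 'CA' repeat

full_1 = length_with_full_primer(**allele_1)
full_2 = length_with_full_primer(**allele_2)

# The observed 1 bp gap is the 2 bp repeat difference offset by the
# 1 bp difference in retained primer sequence.
observed_gap = allele_1["observed_bp"] - allele_2["observed_bp"]
primer_gap = allele_1["primer_bp_retained"] - allele_2["primer_bp_retained"]
repeat_gap = observed_gap - primer_gap

print(full_1, full_2, repeat_gap)
```

Under this assumption, the truncated alleles map onto the 255 bp and 253 bp calls, which differ by exactly one dinucleotide repeat, as expected for this type of marker.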
The low quality samples could yield accurate genotypes; however, data quality varied much more (see Figure 1), and as such the reliability of the resulting genotypes requires stringent evaluation. Marker length appeared to make a difference: even for the shortest marker, GS-2, four of nine replicates did not recover a genotype, and two of the remaining five did not match between replicates. As the length of the microsatellites increased, the recovery and reliability of genotypes decreased. For the low quality museum specimens, GS-2 had four missing genotypes, GS-4 had one missing and five mismatches, GLSA-12 had eight missing genotypes and one mismatch, GLSA-22 had seven missing, and GLSA-52 was missing all nine genotypes.
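The missing-genotype counts for the low quality museum specimens can be summarized as per-locus call rates. This is a minimal sketch using only the counts reported above, with nine replicates per locus as stated in the text; the loci are listed in the order given in the text, and the dictionary layout is an illustrative assumption rather than the format of any analysis output.

```python
REPLICATES_PER_LOCUS = 9  # replicates attempted per locus, as stated in the text

# Missing genotypes per locus, in the order the loci are listed in the text.
missing = {
    "GS-2": 4,
    "GS-4": 1,
    "GLSA-12": 8,
    "GLSA-22": 7,
    "GLSA-52": 9,
}

# Proportion of replicates that produced a genotype at each locus.
call_rate = {
    locus: (REPLICATES_PER_LOCUS - n_missing) / REPLICATES_PER_LOCUS
    for locus, n_missing in missing.items()
}

for locus, rate in call_rate.items():
    called = REPLICATES_PER_LOCUS - missing[locus]
    print(f"{locus}: {called}/{REPLICATES_PER_LOCUS} genotypes called ({rate:.0%})")
```

Tabulated this way, the decline is not strictly monotonic (GS-4 had fewer missing genotypes than GS-2, and GLSA-22 fewer than GLSA-12), but the three GLSA markers had far more missing genotypes than the two GS markers.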