Microsatellites: effects of microsatellite length
As expected, the tissue sample had fairly consistent genotypes across all
amplifications, except in one instance where a 2 bp difference was
detected between the two PCR replicates. Interestingly, in this case
(HSU 8180, marker GS-4) both the bioinformatically combined and pooled
runs recovered a homozygous genotype. Based on read depth, it is possible
that the minor alleles (92 and 94 in replicates 1 and 2, respectively)
were sequencing errors or PCR stutter related to the high depth of
coverage.
Across all samples, shorter microsatellites yielded genotypes more
often, and arguably more reliably, than longer ones. In one sample,
LACM 95619 at the GS-2 locus, both the bioinformatically combined and
pooled run genotypes appeared to consist of alleles from replicate 1
(82/82) and replicate 3 (84/84). Unfortunately, the second replicate
failed to produce an allele, so this study cannot determine whether both
alleles recovered across the two PCRs are accurate or whether this
instance reflects a dropout event. These genotypes were called from
4,235 raw reads for GS-2 replicate 1 and 4,449 reads for replicate 3.
The second replicate recovered only 1,034 reads of very poor quality
(only 659, a mere 63.73% of the reads, passed standard prinseq quality
filters), which explains the lack of a resulting genotype from that
replicate (Table 3).
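
The retention figure quoted for the second GS-2 replicate can be
re-derived from the two read counts given above. The following is a
minimal sketch only; the function name percent_retained is hypothetical
and is not part of prinseq or CHIIMP.

```python
# Read counts for LACM 95619, GS-2 replicate 2, as reported in the text.
raw_reads = 1034
passed_filter = 659


def percent_retained(passed: int, raw: int) -> float:
    """Return the percentage of raw reads that survived quality filtering."""
    if raw <= 0:
        raise ValueError("raw read count must be positive")
    return 100.0 * passed / raw


retained = percent_retained(passed_filter, raw_reads)
print(f"{retained:.2f}% of reads retained")
```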
The high quality museum specimens performed as well as, and occasionally
better than, the tissue sample. Despite reliable performance and
genotyping success for the HQMS, there were still a handful of missing
allele calls, ranging from GS-4 in sample UMMZ 79755 to the longest
marker, GLSA-52, where UMMZ 79755 and UMMZ 79760 each lacked a call in
one replicate. Overall, the high quality specimens worked remarkably
well across all loci, but often had more than two prominent sequences as
flagged by CHIIMP (see Table 4 for details). The resulting genotypes
were highly reliable and appeared to lack confirmation only at GLSA-52,
the longest microsatellite evaluated here. Two of the three HQMS had
different calls between the individual replicates and the
bioinformatically combined and pooled runs. Interestingly, UMMZ 79760
yielded a genotype whose alleles differed by 1 bp (250/251), which does
not make evolutionary sense for a dinucleotide microsatellite, where
alleles should differ by multiples of 2 bp. Upon further investigation,
the two alleles in the pooled run arose as follows: allele 1 (251 bp)
had an additional ‘CA’ repeat but only 16 bp of the reverse primer,
whereas allele 2 (250 bp) had one fewer ‘CA’ repeat and 17 bp of the
reverse primer, resulting in a difference of 1 bp. The 255 bp alleles
called had the same number of repeats as allele 1 but included the
entire reverse primer sequence (20 bp), and the 253 bp allele called had
the same number of repeats as allele 2 but also included the entire
reverse primer sequence.
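
The arithmetic behind this artifact can be checked with a short sketch.
It assumes only what is stated above: a 20 bp reverse primer, of which
16 bp and 17 bp were present in the 251 bp and 250 bp sequences. The
function name length_with_full_primer is hypothetical and does not
correspond to any CHIIMP routine.

```python
REVERSE_PRIMER_LEN = 20  # full reverse primer length reported in the text (bp)


def length_with_full_primer(observed_len: int, primer_bp_present: int,
                            primer_len: int = REVERSE_PRIMER_LEN) -> int:
    """Allele length expected if the whole reverse primer had been retained."""
    if not 0 <= primer_bp_present <= primer_len:
        raise ValueError("primer bases present must be between 0 and primer_len")
    # Add back the primer bases missing from the observed sequence.
    return observed_len + (primer_len - primer_bp_present)


allele_1 = length_with_full_primer(251, 16)  # one additional 'CA' repeat
allele_2 = length_with_full_primer(250, 17)  # one fewer 'CA' repeat
print(allele_1, allele_2, allele_1 - allele_2)
```

Once the missing primer bases are restored, the two sequences differ by
2 bp, a single ‘CA’ repeat, which is the spacing expected for a
dinucleotide locus.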
The low quality samples could recover accurate genotypes; however, much
more variation occurred in the quality of the data (see Figure 1), and
as such the reliability of the resulting genotypes requires stringent
evaluation. Length appeared to make a difference: even for the shortest
marker, GS-2, four out of nine replicates did not recover a genotype,
and two of the remaining five did not match between replicates. As the
length of the microsatellites increased, both the recovery and the
reliability of genotypes decreased. For the low quality museum
specimens, GS-2 had four missing genotypes, GS-4 had one missing and
five mismatches, GLSA-12 had eight missing genotypes and one mismatch,
GLSA-22 had seven missing, and GLSA-52 was missing all nine genotypes.
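
The missing-genotype counts above can be summarised per locus as
follows. This is an illustrative sketch: it assumes nine replicates per
locus (stated in the text for GS-2 and GLSA-52, assumed here for the
others) and lists the loci in the order given above.

```python
N_REPLICATES = 9  # assumed to apply to every locus

# Missing genotypes per locus for the low quality museum specimens,
# copied from the counts reported in the text.
missing = {"GS-2": 4, "GS-4": 1, "GLSA-12": 8, "GLSA-22": 7, "GLSA-52": 9}

# Fraction of replicates that produced a genotype at each locus.
success = {locus: (N_REPLICATES - m) / N_REPLICATES
           for locus, m in missing.items()}

for locus, rate in success.items():
    genotyped = N_REPLICATES - missing[locus]
    print(f"{locus}: {genotyped}/{N_REPLICATES} genotyped ({rate:.0%})")
```

Mismatches between replicates are not captured by this tally, so a
locus such as GS-4, with one missing genotype but five mismatches, looks
more reliable here than the full data suggest.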