Microsatellites: effects of sample quality
When prinseq was run on all reads from the individually prepared library replicates, a general pattern emerged in which PCR success was predictive of quality. The mean quality, measured as the percentage of reads passing prinseq filtration, was 85.6% across all sample types, with a range of 57.43% to 99.48% (MVZ 5211 GS-2 replicate two and HSU 1836 GLSA-52 replicate one were the lowest and highest, respectively). The median was 92.6% and the mode was 96.82%. When sample types were separated, the mean quality was 95.99% (SE ±0.99%), 95.06% (SE ±0.68%), and 73.84% (SE ±2.10%) for tissue, HQMS, and LQMS, respectively (Table 5). These and additional descriptive metrics are shown in Appendix 2. The ANOVA was significant (P < 0.001), and the regression yielded an R² of 0.44, which was also highly significant (P < 0.001), indicating that our assessed quality from gel electrophoresis was predictive of genotyping success.
The CHIIMP genotypes were accurate for the single tissue sample, especially when the PCR replicates were pooled and genotyped together. The reads per replicate and the percentage of good reads (as determined by standard prinseq quality filtration) are reported in Table 3. All recovered genotypes are summarized in Table 4. At the GLSA-52 locus, the pooled run recovered a second allele (251) from HSU 8180, although none of the other genotypes recovered that allele. In the pooled run, HSU 8180 recovered a total of 79,417 reads across all microsatellite loci and the complete cytochrome b. The 251 allele was recovered in the pooled dataset at a frequency of 5.3%, whereas the 257 allele was recovered at 17.6%, more than three times the frequency of the 251 allele. When all other replicates of this sample were evaluated, only the 257 allele was recovered, at rates ranging from 12.9% to 17.6% (see details in Appendix 4).
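The comparison between the two GLSA-52 allele frequencies in the pooled run can be checked with simple arithmetic. The sketch below is illustrative only: the function name `frequency_ratio` is hypothetical and is not part of CHIIMP, and the only inputs are the two percentages reported above.

```python
def frequency_ratio(major_pct: float, minor_pct: float) -> float:
    """Return how many times more frequent the major allele is than the minor one."""
    if minor_pct <= 0:
        raise ValueError("minor allele frequency must be positive")
    return major_pct / minor_pct


# Percentages reported in the text for HSU 8180, pooled run, locus GLSA-52.
allele_257_pct = 17.6
allele_251_pct = 5.3

ratio = frequency_ratio(allele_257_pct, allele_251_pct)
print(f"Allele 257 is {ratio:.2f} times as frequent as allele 251")
```

A ratio above 3 is consistent with the statement that the 257 allele was recovered at more than three times the frequency of the 251 allele.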
Mismatched alleles were recovered most frequently in low-quality samples, which routinely appeared to fail PCR across numerous replicates. Mismatches were often associated with one or more of the following: PCR stutter sequences, PCR artifacts, and more than two prominent sequences, as identified by the CHIIMP pipeline (Table 4). Neither individual samples nor specific microsatellites appeared to recover specific CHIIMP flags across all replicates; however, the locus GS-2 recovered frequent flags for all three metrics (Table 4).
The HQMS samples had routinely high-quality sequences as determined by prinseq metrics. Only a single PCR replicate, from UMMZ 79755, had less than 85% of sequences pass quality metrics. All other replicates were over 85%, and most had over 95% of sequences passing quality filtration. Interestingly, the LQMS samples showed high variation between the PCR replicates performed here (Figure 1). The averages for the three LQMS samples were 81.3%, 69.6%, and 70.6%, with a high degree of variation between replicates. For example, LACM 95619 ranged from 67.7% to 92.05% of sequences passing quality filters for GS-4. In this instance, three completely different sets of alleles were recovered across the three replicates, providing no confidence in those genotypes despite one replicate recovering 92.05% high-quality sequences.
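As a rough consistency check, the three per-sample LQMS averages can be compared with the LQMS group mean of 73.84% reported earlier. This is a minimal sketch under a stated assumption: the group mean may have been computed over individual replicates rather than over the three sample averages, so only approximate agreement is expected. The variable names are illustrative.

```python
from statistics import mean

# Per-sample LQMS averages (percent of sequences passing filters), from the text.
lqms_sample_means = [81.3, 69.6, 70.6]

# LQMS group mean reported with the other sample-type means.
reported_group_mean = 73.84

# Assumption: an unweighted mean of the three sample averages approximates
# the group mean; unequal replicate counts per sample would shift it slightly.
recomputed = mean(lqms_sample_means)
difference = abs(recomputed - reported_group_mean)

print(f"Recomputed mean: {recomputed:.2f}%")
print(f"Absolute difference from reported mean: {difference:.2f} percentage points")
```

The recomputed value falls within a few hundredths of a percentage point of the reported group mean, which is the kind of agreement expected from rounding alone.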