Validation using purified DNA mixed in known proportions
We first evaluated the accuracy of MAUI-seq by profiling DNA mixtures
with known strain DNA ratios. DNA was extracted from two Rlt strains differing by a minimum of 3 bp in each of their recA, rpoB, nodA, and nodD amplicon sequences, and the
extracted DNA was mixed in different ratios (Supplementary Table S1). After amplification and sequencing, assembled reads
were assigned to their target gene and analysed using MAUI-seq and two
programs frequently used for de-noising of amplicon sequencing data,
DADA2 and UNOISE3. Since rare sequences have a high error rate, we
discarded (for each of the three methods) sequences that fell below a
threshold frequency of 0.1% of accepted sequences. The observed and
expected strain ratios were highly correlated for all four genes across
the three analysis methods, and we found that the performances of the
proofreading (Phusion) and non-proofreading (Platinum) polymerases were
gene-dependent, which could be due to differences in amplification
efficiency for the four templates (Table 1 and Supplementary Figures S1-S4). On average, MAUI-seq detected
between 98.5% and 100% true sequences exactly matching those of the
two strains in the mixture, while DADA2 ranged from 89.7% to 100%, and
UNOISE3 from 79.8% to 100% (Table 1). The better performance
of MAUI-seq was due to more effective elimination of chimeras, which
were especially abundant when the PCR reaction was carried out using the
Platinum non-proofreading polymerase (Table 1 and Supplementary Figures S1-S4). For the proofreading polymerase,
DADA2 detected 100% true sequences for all four genes, whereas MAUI-seq
detected 99.03% for nodA, failing to eliminate three rare
sequences that did not have sufficient secondary counts. This suggests
that DADA2 can perform as well as, or even slightly better than,
MAUI-seq when a proofreading polymerase is used to amplify DNA from a
simple, two-component mix. The prevalence of secondary sequences varied
with gene and polymerase: the secondary/primary ratio for accepted
sequences was 0.0322 for rpoB using Phusion, but just 0.0002 for nodD using Platinum. When the ratio was very low, there were
insufficient secondary counts for MAUI-seq to eliminate erroneous
sequences effectively.
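
The abundance filter and the "true sequence" percentage described above can be illustrated with a minimal Python sketch. It is not the MAUI-seq, DADA2, or UNOISE3 implementation. The function names, the example counts, and the sequence labels are hypothetical. It also assumes that the 0.1% threshold is applied in a single pass relative to the total count of the sequences supplied, and that the percentage of true sequences is weighted by read count; the text does not specify either detail.

```python
def filter_rare(counts, min_fraction=0.001):
    """Keep sequences whose share of the total count is at least min_fraction.

    Assumption: the denominator is the total count before filtering,
    and the filter is applied once (no iterative recalculation).
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq: n for seq, n in counts.items() if n / total >= min_fraction}


def percent_true(counts, true_sequences):
    """Percentage of counts belonging to sequences that exactly match a known strain.

    Assumption: the percentage is weighted by read count rather than by
    the number of distinct sequences.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    matched = sum(n for seq, n in counts.items() if seq in true_sequences)
    return 100.0 * matched / total


# Hypothetical counts for one gene in one two-strain mixture (not real data).
example_counts = {
    "strain_A_allele": 6000,
    "strain_B_allele": 3900,
    "chimera_AB": 95,
    "rare_error": 5,
}
true_alleles = {"strain_A_allele", "strain_B_allele"}

kept = filter_rare(example_counts)          # removes sequences below 0.1%
score = percent_true(kept, true_alleles)    # share of retained counts that are true
print(sorted(kept), round(score, 2))
```

In this toy input the rare error falls below the 0.1% threshold and is discarded, whereas the more abundant chimera survives the filter and lowers the percentage of true sequences. This is the kind of sequence that a de-noising step must remove by other means.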