Consensus Quality
Here we present NGSpeciesID, an easy-to-use, all-in-one software solution
for generating high-quality consensus sequences from ONT and PacBio
long-read sequencing data. We compared NGSpeciesID against Mothur +
Consension and against ONTrack. In general, all three software solutions
produced consensus sequences of very high quality, reaching 99-100%
accuracy in almost all cases, and NGSpeciesID performed comparably to the
other tools. Across all comparisons, consensus sequences generated from
ONT data and polished with Racon usually showed lower percent similarity
to the corresponding Sanger sequences than consensus sequences polished
with Medaka.
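The percent similarity between a consensus and its Sanger sequence depends on how the two are aligned and how identity is counted, and this section does not spell out that procedure. As a minimal sketch under stated assumptions, the hypothetical function `percent_identity` below performs a global (Needleman-Wunsch) alignment with simple match/mismatch/gap scores and reports identical columns divided by alignment length. Other definitions (for example ignoring terminal gaps, or using BLAST identity) will give somewhat different values.

```python
def percent_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Hypothetical helper: percent identity of a global alignment of a and b.

    Assumption: identity = identical columns / total alignment columns,
    with linear gap penalties and end gaps penalised like internal gaps.
    """
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, counting identical and total columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            matches += a[i - 1] == b[j - 1]  # True counts as 1
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * matches / columns if columns else 0.0
```

For example, `percent_identity("ACGT", "ACT")` aligns `ACGT` with `AC-T` and returns 75.0. The quadratic table is adequate for barcode-length sequences of a few hundred to about a thousand bases.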
NGSpeciesID carries out two rounds of Racon polishing by default;
increasing or decreasing the number of rounds might improve consensus
quality. We chose Medaka as the default error corrector in NGSpeciesID
because it includes up-to-date error models. Unlike ONTrack, NGSpeciesID
does not offer Nanopolish as an option, because Nanopolish requires fast5
files, which are often not available for published Oxford Nanopore data.
Furthermore, Nanopolish requires preprocessing to generate the
appropriate header structure in the corresponding fastq files, which
makes it considerably more time-consuming to use.
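The repeated Racon polishing mentioned in this section can be pictured as a loop in which each round maps the reads against the current draft consensus and then polishes that draft. The Python sketch below is not NGSpeciesID's implementation. It only assembles minimap2 and Racon command lines using their standard positional arguments (minimap2: target, then query; Racon: reads, overlaps, target, with results written to stdout). The function names `racon_polish_plan` and `run_plan`, the intermediate file names, and the `map-ont` preset are illustrative assumptions; PacBio reads would need a different minimap2 preset such as `map-pb`.

```python
import subprocess
from typing import List, Tuple

Step = Tuple[List[str], str]  # (command, file that receives the command's stdout)


def racon_polish_plan(reads: str, draft: str, rounds: int = 2, threads: int = 1) -> List[Step]:
    """Hypothetical helper: build map-then-polish commands for `rounds` rounds."""
    steps: List[Step] = []
    current = draft
    for i in range(1, rounds + 1):
        paf = f"round{i}.paf"
        polished = f"round{i}.fasta"
        # Map reads (query) against the current draft (target); minimap2 prints PAF.
        steps.append((["minimap2", "-x", "map-ont", "-t", str(threads), current, reads], paf))
        # Racon polishes the draft from the reads and their overlaps; it prints FASTA.
        steps.append((["racon", "-t", str(threads), reads, paf, current], polished))
        current = polished  # the next round polishes this round's output
    return steps


def run_plan(steps: List[Step], draft: str) -> str:
    """Execute the commands in order and return the path of the final draft."""
    for cmd, out_path in steps:
        with open(out_path, "w") as out:
            subprocess.run(cmd, stdout=out, check=True)
    return steps[-1][1] if steps else draft
```

Separating command construction from execution keeps the number of rounds easy to vary and inspect. For example, `run_plan(racon_polish_plan("reads.fastq", "draft.fasta", rounds=3), "draft.fasta")` would run three rounds, assuming both tools are installed and on the `PATH`.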
As the generation of consensus sequences for DNA barcoding takes only a
few seconds for each sample (depending on the number of reads), we did
not compare run times between the different pipelines.