Consensus Quality
Here we present NGSpeciesID, an easy-to-use, single-software solution for generating high-quality consensus sequences from ONT and PacBio long-read sequencing data. We compared NGSpeciesID against Mothur + Consension and ONTrack. In general, all three tools produced consensus sequences of very high quality, reaching 99-100% accuracy in almost all cases, and we show that NGSpeciesID performs comparably to the other two. Across all comparisons, consensus sequences from ONT data polished with Racon usually showed lower percent similarity to the Sanger sequence than those polished with Medaka. NGSpeciesID performs two rounds of Racon polishing by default; increasing or decreasing the number of rounds might improve consensus quality. We chose Medaka as the default error corrector in NGSpeciesID because it includes up-to-date error models. Unlike ONTrack, NGSpeciesID does not offer Nanopolish as an option, because Nanopolish requires fast5 files, which are often not available for published Oxford Nanopore data. Nanopolish also requires preprocessing to generate the appropriate header structure in the corresponding fastq files, which makes it much more time-consuming to use.
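To make "percent similarity to the Sanger sequence" concrete, the following is a minimal sketch of one common way to compute it. It assumes identity is defined as the number of matching columns divided by the total length of a global (Needleman-Wunsch) alignment between the consensus and the Sanger reference. This definition, the scoring values (match +1, mismatch -1, gap -1), and the function names `global_align` and `percent_identity` are illustrative assumptions. They are not necessarily the alignment tool or settings used for the comparisons above.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])  # base in a aligned to a gap in b
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")  # gap in a aligned to a base in b
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(consensus: str, reference: str) -> float:
    """Percent of alignment columns where consensus and reference share a base.

    Assumption: identity = matching columns / alignment length (gaps included).
    """
    if not consensus and not reference:
        raise ValueError("both sequences are empty")
    aln_c, aln_r = global_align(consensus.upper(), reference.upper())
    matches = sum(1 for x, y in zip(aln_c, aln_r) if x == y and x != "-")
    return 100.0 * matches / len(aln_c)


if __name__ == "__main__":
    # One substitution in a 10 bp sequence gives 9 matches over 10 columns.
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
```

Because gap columns count toward the alignment length, a consensus that drops a single base (a typical nanopore homopolymer error) is penalised in the same way as a substitution. Other definitions, such as dividing by the reference length only, give slightly different values.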
Because generating consensus sequences for DNA barcoding takes only a few seconds per sample (depending on the number of reads), we did not compare run times between the pipelines.