Future directions for MAUI-seq
HTAS is a valuable and widely used approach for the study of microbial
community diversity, but handling erroneous sequences introduced by the
amplification and sequencing procedures has always been challenging. The
use of UMIs allows MAUI-seq to greatly reduce the incidence of errors
through two mechanisms. Firstly, the requirement that a UMI is
associated with at least two identical reads eliminates many rare
sequences that are predominantly erroneous. Secondly, sequences that are
frequently generated as errors can be identified and removed because
they occur unexpectedly often as minor components associated with UMIs
that are assigned to more abundant sequences. These mechanisms are
independent of any reference database and can recognise and retain
genuine alleles that differ by a single nucleotide or match a potential
chimera. This makes MAUI-seq particularly suited to studies of
intraspecific variation, where the range of sequence divergence may be
limited and not fully known in advance. However, the efficient
elimination of erroneous sequences is also important in community
studies such as those based on widely used 16S primers, and MAUI-seq
should be readily adaptable to this field. The analysis pipeline is very
fast because no sequence alignment or database searching is involved;
only the accepted final sequences would need to be characterised by
comparison to a reference database.
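
The two mechanisms described above can be illustrated with a minimal Python sketch. It is not the MAUI-seq implementation: the function name `umi_error_filter`, the input format (a list of `(umi, sequence)` pairs), the way secondary sequences are tallied, and the default thresholds are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def umi_error_filter(reads, min_diff=2, max_ratio=0.7, min_rel_abundance=0.001):
    """Illustrative UMI-based error filter (hypothetical, not the MAUI-seq code).

    reads: iterable of (umi, sequence) pairs.
    min_diff: minimum excess of primary over secondary reads within a UMI
              (hypothetical default; with no secondary sequence this means
              at least two identical reads).
    max_ratio: maximum tolerated ratio of secondary to primary counts for a
               sequence (hypothetical default).
    min_rel_abundance: minimum share of accepted UMIs (hypothetical default).
    """
    # Count how often each sequence was read for each UMI.
    by_umi = defaultdict(Counter)
    for umi, seq in reads:
        by_umi[umi][seq] += 1

    primary = Counter()    # UMIs in which a sequence is the most common read
    secondary = Counter()  # UMIs in which it is the second most common read
    for counts in by_umi.values():
        ranked = counts.most_common(2)
        top_seq, top_n = ranked[0]
        second_n = ranked[1][1] if len(ranked) > 1 else 0
        if top_n - second_n < min_diff:
            continue  # singleton or ambiguous UMI: discard
        primary[top_seq] += 1
        if len(ranked) > 1:
            secondary[ranked[1][0]] += 1

    total = sum(primary.values())
    accepted = {}
    for seq, n in primary.items():
        # Sequences seen unexpectedly often as minor components are
        # treated as recurrent errors.
        if secondary[seq] / n > max_ratio:
            continue
        if n / total < min_rel_abundance:
            continue
        accepted[seq] = n
    return accepted
```

In this sketch no alignment or database lookup is performed; only dictionary counts are needed, which is in keeping with the speed argument made above.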
Most HTAS studies report the relative proportions of the taxa in a
community, but it would sometimes be valuable to estimate the absolute
abundance of the microbes in the environmental sample. UMIs can
potentially provide such information, if the initial template copying is
carefully controlled so that the total number of distinct UMIs reflects
the number of templates. While this would necessitate some additional
steps at the start of the experimental protocol, it should still be
possible to analyse the resulting sequences using the error-removal
approaches provided by MAUI-seq. Alternatively, absolute abundance can
be estimated by adding a spike of a known quantity of a recognisable
target sequence to the sample before processing.
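
The spike-in calculation amounts to a simple rescaling, sketched below in Python. The function name `absolute_abundance` and its arguments are hypothetical, and the sketch assumes that the spike and the native templates are tagged, amplified and sequenced with equal efficiency, which would need to be checked in practice.

```python
def absolute_abundance(umi_counts, spike_id, spike_copies):
    """Scale UMI counts to template copies using a spike-in (illustrative only).

    umi_counts: dict mapping sequence identifier to its accepted UMI count.
    spike_id: key of the spike-in sequence in umi_counts (hypothetical name).
    spike_copies: known number of spike copies added to the sample.
    """
    spike_count = umi_counts.get(spike_id, 0)
    if spike_count == 0:
        raise ValueError("spike-in sequence was not recovered")
    # Copies represented by one counted UMI, assuming equal efficiency
    # for the spike and the native templates.
    scale = spike_copies / spike_count
    return {seq: n * scale for seq, n in umi_counts.items() if seq != spike_id}
```

For example, if 1000 spike copies were added and the spike was counted 50 times, each counted UMI would correspond to roughly 20 template copies under these assumptions.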
The addition of a UMI shortens the maximum length of target sequence
that can be read, and counting UMIs rather than reads requires a
higher depth of sequencing. These limitations are becoming less
important, however, as improvements in sequencing technology increase
both read length, enabling long-read amplicon sequencing, and read numbers.
As implemented in MAUI-seq, UMIs are very effective in reducing the
errors inherent in HTAS, and have the potential to improve the quality
of any amplicon-based study of diversity. There are several parameters
(minimum difference between primary and secondary reads of a UMI, ratio
of secondary to primary reads of a sequence, minimum relative abundance)
that are user-specified and can be adjusted to suit each study. In
principle, it should be possible to optimize these using a statistical
model of mutational errors, like that implemented in DADA2, and of
chimera formation, which is not modelled in detail by DADA2. The UMIs
provide an additional source of information to parameterize the model,
linking sequences that have a common origin. Such a model would be
complex, however, and parameterizing and testing it would need a dataset
that was optimized for the purpose. At the same time, it would also be
interesting to explore the use of UMIs at both ends of the amplicon,
which would provide an additional means to identify and eliminate
chimeras.