Conclusions
Some potential advantages of incorporating UMIs in amplicon diversity studies have been explored previously, but here we propose a new way to use the extra information they provide. Because error processes cause more than one distinct sequence to become associated with the same UMI, this association can be used to identify erroneous sequences regardless of their relative abundance or their relationship to other sequences in the sample. The method is straightforward both experimentally and computationally, and we demonstrate its effectiveness using known strain mixtures and real environmental samples. It allows amplicon sequence data to be decontaminated by flagging chimeras and other erroneous sequences, and it can readily be adapted to any target gene of interest in microbiome studies.
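The core idea of using UMI sharing to spot errors can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. Reads are assumed to be already parsed into (UMI, sequence) pairs, and a sequence is flagged as a likely error if it is never the most abundant sequence under any UMI it appears with. The function name `flag_erroneous`, the toy read labels, and the rule that tied sequences all count as dominant are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def flag_erroneous(reads):
    """Return sequences that are never dominant under any UMI.

    reads: iterable of (umi, sequence) pairs (hypothetical input format).
    """
    # Count how often each sequence occurs under each UMI.
    by_umi = defaultdict(Counter)
    for umi, seq in reads:
        by_umi[umi][seq] += 1

    # Collect the most abundant sequence(s) within each UMI.
    # Assumption: on a tie, every tied sequence is treated as dominant.
    dominant = set()
    for counts in by_umi.values():
        top = max(counts.values())
        dominant.update(s for s, c in counts.items() if c == top)

    # Any observed sequence that never dominates a UMI is flagged.
    observed = {seq for counts in by_umi.values() for seq in counts}
    return observed - dominant


# Toy data: "chimAB" and "err1" only ever appear as minority sequences.
reads = [
    ("AAC", "seqA"), ("AAC", "seqA"), ("AAC", "chimAB"),
    ("GGT", "seqB"), ("GGT", "seqB"), ("GGT", "err1"),
    ("TTA", "seqA"),
]
print(sorted(flag_erroneous(reads)))
```

Running this prints `['chimAB', 'err1']`. Note that the decision depends only on how sequences co-occur within UMIs, not on their overall abundance in the sample: a chimera would still be flagged even if it appeared under many UMIs, provided it was never the majority sequence in any of them.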