Abstract
DNA barcodes are standardized sequences of roughly 400-800 bp whose
variability differs across taxonomic levels; they make it possible to
identify individuals of species that have already been taxonomically
described.
Barcodes have been established for several groups across the tree of
life. However, some groups still lack an accurate DNA marker and, even
more so, reliable strategies to verify their taxonomic affiliation.
Several DNA barcodes have been proposed for plants, but their
classification potential has not been evaluated for metabarcoding, and,
as a result, none of them appears to outperform the others in this
setting. One tool that has recently gained traction is the Naïve
Bayesian Classifier, which assumes that attributes are conditionally
independent given the category and assigns each query to the most
probable category. The present study evaluates the classification power
of six plant genetic markers proposed as barcodes (trnL, rpoB, rbcL,
matK, psbA-trnH, and psbK) using a Naïve Bayesian Classifier, in order
to determine which markers perform best at different taxonomic levels
in metabarcoding analyses and to identify genera that are problematic
for species assignment. We propose matK and trnL as potential
candidates for assignment down to the genus level. Some problematic
genera (Aegilops, Gueldenstaedtia, Helianthus, Oryza, Shorea,
Thysananthus, and Triticum) within certain families could lead to
misclassification in a sample regardless of the marker used. Finally,
we offer recommendations for the taxonomic identification of plants in
samples containing multiple individuals.
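The abstract does not state how the Naïve Bayesian Classifier was configured. As a rough illustration only, the sketch below assumes a k-mer presence/absence model in the style of the RDP Classifier: each reference sequence is reduced to its set of k-mers, per-genus word probabilities are smoothed with a word-specific prior, and a query is assigned to the genus with the highest summed log-probability, treating k-mers as independent given the genus. The function names (kmers, train, classify), the toy sequences and genus labels, and k = 4 are hypothetical; real analyses typically use longer words (the RDP Classifier uses 8-mers) and a curated reference database.

```python
import math
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct k-mers (words) in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references, k=8):
    """Build word statistics from a list of (sequence, genus) pairs.

    Hypothetical helper: counts, for every k-mer, how many reference
    sequences contain it overall and within each genus.
    """
    seqs_per_genus = defaultdict(int)
    word_in_genus = defaultdict(lambda: defaultdict(int))
    word_total = defaultdict(int)
    for seq, genus in references:
        seqs_per_genus[genus] += 1
        for w in kmers(seq, k):
            word_in_genus[genus][w] += 1
            word_total[w] += 1
    return {
        "k": k,
        "n": len(references),
        "seqs_per_genus": seqs_per_genus,
        "word_in_genus": word_in_genus,
        "word_total": word_total,
    }


def classify(model, seq):
    """Assign seq to the genus maximizing sum(log P(word | genus)).

    Words are treated as conditionally independent given the genus (the
    naive Bayes assumption), and all genera get equal prior probability.
    """
    n = model["n"]
    words = kmers(seq, model["k"])
    best_genus, best_score = None, -math.inf
    for genus, m_total in model["seqs_per_genus"].items():
        counts = model["word_in_genus"][genus]
        score = 0.0
        for w in words:
            # Word-specific prior keeps unseen words from producing log(0).
            prior = (model["word_total"].get(w, 0) + 0.5) / (n + 1)
            score += math.log((counts.get(w, 0) + prior) / (m_total + 1))
        if score > best_score:
            best_genus, best_score = genus, score
    return best_genus, best_score


# Toy reference set (hypothetical sequences and genus labels).
refs = [
    ("ACGTACGTACGTAAAA", "GenusA"),
    ("ACGTACGTACGTAAAT", "GenusA"),
    ("TTGGCCAATTGGCCAA", "GenusB"),
    ("TTGGCCAATTGGCCAT", "GenusB"),
]
model = train(refs, k=4)
print(classify(model, "ACGTACGTACGA")[0])  # → GenusA
```

This sketch omits the per-rank bootstrap confidence that RDP-style classifiers report, which is usually what decides how deep (for example, down to genus) an assignment can be trusted.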