Álvaro Gaytán

and 5 more

DNA barcoding identification requires a sound characterization of intra-specific genetic divergence to establish the limits between species. Yet the number of barcodes per species is often low and geographically restricted. Poor coverage of a species' distribution range may hamper identification, especially when undersampled areas host genetically distinct lineages. In that case, the genetic distance between some query sequences and the reference barcodes may exceed the maximum intra-specific threshold for unequivocal species assignment. Using a group of Quercus herbivores (moths) in Europe as a model system, we found that the number of DNA barcodes from southern Europe is proportionally very low in the Barcode of Life Data Systems (BOLD). This geographical bias complicates the identification of southern query sequences because of their high intra-specific genetic distance from barcodes from higher latitudes. Pairwise intra-specific genetic divergence increased with spatial distance, but was higher when at least one of the sampling sites was in southern Europe. Accordingly, the single-threshold GMYC (Generalized Mixed Yule Coalescent) model retrieved clusters composed exclusively of Iberian haplotypes, some of which could correspond to cryptic species. The number of putative species it retrieved was more reliable than that of the multiple-threshold GMYC and very similar to the results from ABGD and jMOTU. Our results support GMYC as a key resource for species delimitation within poorly inventoried biogeographic regions in Europe, where historical factors (e.g. glaciations) have promoted genetic diversity and singularity. Future European DNA barcoding initiatives should preferentially sample along latitudinal gradients, with a special focus on the southern peninsulas.
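To illustrate why a geographically biased reference library can leave a query unassigned, the following minimal Python sketch computes an uncorrected p-distance between aligned sequences and accepts a species assignment only when the nearest reference falls within a maximum intra-specific threshold. The function names, the 2% threshold, and the toy sequences are assumptions made for illustration; they are not the distance measure, the threshold, or the data used in this study.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance: share of differing sites among the
    aligned positions where both sequences have an unambiguous base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":  # skip gaps and ambiguity codes
            compared += 1
            if a != b:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def assign_species(query, references, threshold=0.02):
    """Return (species, distance) for the nearest reference barcode,
    or (None, distance) if that distance exceeds the threshold.

    `references` maps a species name to a list of aligned barcodes.
    The default threshold of 0.02 is a hypothetical value.
    """
    best_species, best_dist = None, float("inf")
    for species, barcodes in references.items():
        for barcode in barcodes:
            d = p_distance(query, barcode)
            if d < best_dist:
                best_species, best_dist = species, d
    if best_dist <= threshold:
        return best_species, best_dist
    return None, best_dist


# Toy data (made up): the library holds only a "northern" haplotype.
library = {"species_A": ["ACGTACGTACGTACGTACGT"]}
northern_query = "ACGTACGTACGTACGTACGT"
southern_query = "ACGTACGTACGTACGTACGA"  # one substitution in 20 sites

print(assign_species(northern_query, library))
print(assign_species(southern_query, library))
```

In this toy case the southern query differs from the only available reference at 1 of 20 sites, which exceeds the assumed threshold, so it is left unassigned even though it could belong to the same species. Adding a southern haplotype to the library would bring the nearest-reference distance back under the threshold.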