Using taxonomy to assemble barcodes
Because the maximally abundant sequences from 20-30 % of specimens were
not from the expected taxa according to BLAST, we developed a
taxonomy-weighted barcode-assembly approach. For each specimen, we
considered the most abundant FC and/or BR sequences with the expected
taxonomic identifications to be correct, prioritising the lowest
identifiable taxonomic rank in each case. Merged FC and BR sequences
were considered correct barcodes only if the contributing sequences
had 100 % identity across the expected overlap length. This approach
typically identified correct FC and BR sequences among the 20 most
abundant sequences per specimen, but in a small number of cases, the
correct sequences were identified at abundance ranks between 21 and 101.
The ability to examine multiple sequences for correct identity is a key
advantage of this process over Sanger sequencing, in which only a single
sequence per specimen can typically be examined. This simple yet
effective filtering approach greatly enhanced successful barcode
recovery and provided evidence against relying solely on sequence
abundances to select barcode sequences. Directly leveraging a priori
taxonomic data from validated specimens allowed accurate
identification of non-target contaminant sequences, underscoring
the value of taxonomically validated specimens for
barcode generation. Similarly, taxonomic information was considered
important for confirming the identity of insect pests detected in bulk
trap catches by multi-locus metabarcoding [19].
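
The selection and merging rules described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the names Candidate, shared_depth, pick_fragment, merge_fragments and assemble_barcode are hypothetical, lineages are assumed to be tuples ordered from highest to lowest rank, FC is assumed to precede BR on the same strand, and "prioritising the lowest identifiable taxonomic rank" is interpreted here as preferring the deepest lineage match before abundance. The sequences and read counts are toy values chosen only to exercise the logic.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    sequence: str     # fragment sequence on the barcode strand
    abundance: int    # read count of this sequence within the specimen
    lineage: tuple    # BLAST-derived lineage, highest rank first (assumed format)


def shared_depth(lineage, expected):
    """Count the leading ranks on which two lineages agree."""
    depth = 0
    for observed, wanted in zip(lineage, expected):
        if observed != wanted:
            break
        depth += 1
    return depth


def pick_fragment(candidates, expected, min_depth=1):
    """Return the candidate matching the expected lineage at the lowest rank.

    Ties in match depth are broken by abundance. Candidates that share
    fewer than min_depth ranks are treated as non-target sequences.
    """
    scored = [(shared_depth(c.lineage, expected), c.abundance, c) for c in candidates]
    scored = [s for s in scored if s[0] >= min_depth]
    if not scored:
        return None
    return max(scored, key=lambda s: (s[0], s[1]))[2]


def merge_fragments(fc, br, overlap):
    """Join FC and BR only if they are identical across the expected overlap."""
    if overlap <= 0 or len(fc) < overlap or len(br) < overlap:
        return None
    if fc[-overlap:] != br[:overlap]:
        return None  # any mismatch in the overlap rejects the merged barcode
    return fc + br[overlap:]


def assemble_barcode(fc_candidates, br_candidates, expected, overlap):
    """Pick one FC and one BR by taxonomy, then merge under the 100 % rule."""
    fc = pick_fragment(fc_candidates, expected)
    br = pick_fragment(br_candidates, expected)
    if fc is None or br is None:
        return None
    return merge_fragments(fc.sequence, br.sequence, overlap)


# Toy example: the most abundant FC sequence is a non-target contaminant.
expected = ("Insecta", "Diptera", "Tephritidae", "Bactrocera")
fc_candidates = [
    Candidate("AAAACCCCGG", 500, ("Bacteria", "Proteobacteria")),
    Candidate("ACGTACGTTT", 120, expected),
]
br_candidates = [
    Candidate("GTTTGGCA", 90, ("Insecta", "Diptera", "Tephritidae")),
]
barcode = assemble_barcode(fc_candidates, br_candidates, expected, overlap=4)
print(barcode)
```

In this sketch a specimen with only one acceptable fragment yields no merged barcode; the single fragment could still be reported separately, in line with the "FC and/or BR" wording above.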