Acquisition of BOLD datasets
Access was granted to analyse COI barcoding data derived from a BOLD screening project comprising 184,585 arthropod specimens collected in 21 countries between 2010 and 2014. The COI sequences provided by BOLD were generally derived from templates prepared from somatic tissue (legs are often used so that most of the specimen is retained for further analyses if necessary), although in rare cases abdominal tissue was also included. The first dataset made available comprised 3,817 sequences deemed contaminants, defined as sequences that did not match the initial morphotaxon assignment. The second dataset comprised 55,366 specimens judged not to contain non-target amplicons ([dataset] Zakharov, Ratnasingham, deWaard & Smith, 2020). The remaining 125,402 specimens were not made available, and the 55,366-specimen subsample was used as a representative sample of the specimens from which the contaminants had originated (Figure 1).