Data analysis methods
DNA metabarcoding relies on next-generation sequencing (NGS) platforms, so correct extraction of sequences from the raw reads is crucial for successful source tracking. Several NGS data processing pipelines are available (such as OTU, DADA2, UNOISE3 and COTU), each with its own advantages and disadvantages. The OTU pipeline, the earliest one, groups reads at a fixed similarity threshold (usually 0.97) and creates OTUs with very short computing time. DADA2 and UNOISE3 do not rely on a subjective similarity value. The COTU method, a recently proposed strategy, updates the OTU method by elongating the consensus sequence to be created (Liu et al., 2021), at the cost of computing time. Although some comparative studies of these pipelines exist (Prodan et al., 2020; Xiong & Zhan, 2018), it is still too early to say which one is best.
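The threshold-based grouping used by the OTU pipeline can be illustrated with a minimal sketch. It is not the implementation of any particular pipeline: it assumes pre-aligned reads of equal length, measures identity as the fraction of matching positions, and uses a simple greedy, abundance-sorted assignment. The function names `identity` and `greedy_otu_clustering` are hypothetical.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes pre-aligned, equal-length reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clustering(reads, threshold=0.97):
    """Group reads into OTUs at a fixed identity threshold.

    Unique sequences are visited from most to least abundant. Each one
    joins the first existing centroid it matches at >= threshold;
    otherwise it becomes the centroid of a new OTU.
    Returns a dict mapping centroid sequence -> total read count.
    """
    counts = Counter(reads)
    otus = {}
    # Sort by decreasing abundance, then by sequence for determinism.
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += n
                break
        else:
            otus[seq] = n
    return otus
```

With a threshold of 0.97 and reads of 100 bases, a read differing from a centroid at two positions joins that OTU, whereas a read differing at ten positions founds a new one. The choice of 0.97 is exactly the subjective parameter that denoising approaches avoid.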
The other important issue in data analysis is how reliably soil samples can be traced back to their place of origin based on OTUs and their abundances. SourceTracker (Knights et al., 2011) and FEAST (Shenhav et al., 2019) are the two most popular software packages for apportioning the components of a microbial community among potential sources, and the latter has been claimed to be faster and more accurate.
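The idea of apportioning a community among potential sources can be sketched as a mixture problem. The example below is a simplified illustration and is neither SourceTracker nor FEAST: it assumes that the source profiles are known and fixed relative abundances, that there is no unknown source, and that the mixing proportions are estimated with a plain expectation-maximization loop. The function name `estimate_source_proportions` and its parameters are hypothetical.

```python
def estimate_source_proportions(sink_counts, source_profiles, n_iter=200):
    """Estimate the share of each known source in a sink sample.

    sink_counts:     read counts per OTU in the sink sample.
    source_profiles: one list of relative abundances per source,
                     in the same OTU order as sink_counts.
    Returns proportions over the known sources that sum to 1.
    """
    k = len(source_profiles)
    alpha = [1.0 / k] * k  # start from equal proportions
    for _ in range(n_iter):
        new_alpha = [0.0] * k
        for j, x in enumerate(sink_counts):
            if x == 0:
                continue
            # E-step: probability that a read of OTU j came from each source.
            weights = [alpha[i] * source_profiles[i][j] for i in range(k)]
            denom = sum(weights)
            if denom == 0:
                # OTU absent from every source; this sketch ignores such
                # reads (assumption: no unknown source is modelled).
                continue
            for i in range(k):
                new_alpha[i] += x * weights[i] / denom
        total = sum(new_alpha)
        if total == 0:
            raise ValueError("sink shares no OTUs with any source")
        # M-step: renormalize expected read counts into proportions.
        alpha = [a / total for a in new_alpha]
    return alpha
```

For a sink sample whose reads come one quarter from one source and three quarters from another, with the two sources sharing no OTUs, the estimate converges to proportions of 0.25 and 0.75.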
In this study, we used FEAST to test the source tracking accuracy of four kinds of data sets, formed by combining the inclusion or exclusion of singletons with the use of total or annotated OTUs. The results were nearly the same across the four data sets, indicating the high power of FEAST and the reliability of the conclusion.
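The four data sets can be derived from a single OTU table, as in the following sketch. It is an illustration under stated assumptions rather than the procedure used in this study: a singleton is taken to be an OTU with a total count of one read across all samples, and the names `build_datasets`, `otu_table` and `annotated_ids` are hypothetical.

```python
from itertools import product


def build_datasets(otu_table, annotated_ids):
    """Return the four singleton/annotation combinations of an OTU table.

    otu_table:     dict mapping OTU id -> list of read counts per sample.
    annotated_ids: set of OTU ids that received a taxonomic annotation.
    """
    datasets = {}
    for keep_singletons, annotated_only in product((True, False), repeat=2):
        subset = {
            otu: counts
            for otu, counts in otu_table.items()
            # Assumed definition: a singleton has one read in total.
            if (keep_singletons or sum(counts) > 1)
            and (not annotated_only or otu in annotated_ids)
        }
        name = "{}_singletons_{}".format(
            "with" if keep_singletons else "without",
            "annotated" if annotated_only else "total",
        )
        datasets[name] = subset
    return datasets
```

Each resulting table can then be passed separately to the source tracking step, so that the accuracies of the four combinations can be compared directly.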