Data analysis methods
DNA metabarcoding relies on NGS platforms, so correct extraction of
sequences is crucial for successful source tracking. There are several
NGS data processing pipelines (such as OTU, DADA2, UNOISE3 and COTU),
each with its own advantages and disadvantages. The OTU pipeline, the earliest
one, groups reads at a certain similarity (usually 0.97) and creates
OTUs in a very short computing time. DADA2 and UNOISE3 do not adopt a
subjective similarity value and instead denoise reads into exact
sequence variants. The COTU method, a recently proposed strategy,
updates the OTU method by elongating the consensus sequence to be
created (Liu et al., 2021) at the cost of computing time. Although there
are some comparative studies on the pipelines (Prodan et al., 2020;
Xiong & Zhan, 2018), it is still too early to say which one is the
best.
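As an illustration of similarity-based grouping, the following is a minimal sketch of greedy centroid clustering at a 0.97 threshold. It is not the pipeline used in this study: the function names `identity` and `greedy_otu_clustering` are hypothetical, and identity is computed position by position on equal-length reads, whereas real tools compute it from pairwise alignments.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences.

    Assumption: reads are already trimmed to the same length, so no
    alignment is needed. Sequences of different length score 0.0.
    """
    if len(a) != len(b) or not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_otu_clustering(reads, threshold=0.97):
    """Greedily cluster reads into OTUs; returns {centroid: read count}."""
    counts = Counter(reads)
    otus = {}
    # Visit unique sequences from most to least abundant, so abundant
    # sequences become centroids first (ties broken alphabetically).
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += n  # join the first matching OTU
                break
        else:
            otus[seq] = n  # no centroid is similar enough: new OTU
    return otus
```

With 100 bp reads, a variant differing from an abundant sequence at one position (identity 0.99) is merged into its OTU, while a variant differing at five positions (identity 0.95) founds a new OTU. This shows why the chosen threshold directly shapes the resulting OTU table.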
The other important issue in data analysis is how reliably soil samples
can be traced back to their place of origin based on OTUs and
their abundances. SourceTracker (Knights et al., 2011) and FEAST
(Shenhav et al., 2019) are the two most popular software packages for
apportioning a microbial community among potential sources,
and the latter is reported to be faster and more accurate.
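The idea of apportioning a sample among sources can be sketched with a simplified expectation-maximization estimate of mixing proportions. This is an illustrative toy, not the FEAST or SourceTracker implementation: it assumes the source OTU profiles are known and fixed and that there is no unknown source, and the function name `estimate_source_proportions` is hypothetical.

```python
def estimate_source_proportions(sink_counts, source_profiles, n_iter=200):
    """Estimate the fraction of a sink sample contributed by each source.

    sink_counts: list of OTU read counts in the sink sample.
    source_profiles: list of sources, each a list of relative OTU
        abundances (same OTU order as sink_counts, summing to 1).
    Assumption: source profiles are fixed and no unknown source exists.
    """
    k = len(source_profiles)
    alpha = [1.0 / k] * k  # start from equal contributions
    for _ in range(n_iter):
        new_alpha = [0.0] * k
        for j, x in enumerate(sink_counts):
            if x == 0:
                continue
            # E-step: probability that a read of OTU j came from source i
            weights = [alpha[i] * source_profiles[i][j] for i in range(k)]
            denom = sum(weights)
            if denom == 0:
                continue  # OTU absent from all sources: left unexplained
            for i in range(k):
                new_alpha[i] += x * weights[i] / denom
        total = sum(new_alpha)
        if total == 0:
            break  # no sink read can be explained by any source
        # M-step: renormalize expected read counts into proportions
        alpha = [a / total for a in new_alpha]
    return alpha
```

For example, a sink with counts `[35, 35, 15, 15]` and two sources with profiles `[0.5, 0.5, 0, 0]` and `[0, 0, 0.5, 0.5]` yields proportions of 0.7 and 0.3, because the sources share no OTUs and each read can be assigned unambiguously.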
In this study, we tested the source tracking accuracy of four kinds of
data sets (the four combinations of including/excluding singletons
and using total/annotated OTUs) with FEAST. The results were nearly the
same, indicating the high power of FEAST and the reliability of the conclusion.
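The four data sets can be derived from one OTU table, as in the sketch below. It rests on two assumptions that the text does not specify: a singleton is taken to be an OTU with a single read across all samples, and an annotated OTU is one with a non-empty taxonomic assignment. The function name `build_datasets` and the data set labels are hypothetical.

```python
from itertools import product


def build_datasets(otu_table, taxonomy):
    """Build the four filtered variants of an OTU table.

    otu_table: {otu_id: [read counts per sample]}
    taxonomy: {otu_id: annotation string, or None if unannotated}
    Assumption: a singleton is an OTU whose total read count is 1.
    """
    datasets = {}
    for keep_singletons, annotated_only in product((True, False), repeat=2):
        subset = {
            otu: counts
            for otu, counts in otu_table.items()
            if (keep_singletons or sum(counts) > 1)
            and (not annotated_only or taxonomy.get(otu))
        }
        name = "{}_singletons_{}".format(
            "with" if keep_singletons else "without",
            "annotated" if annotated_only else "total",
        )
        datasets[name] = subset
    return datasets
```

Each of the four resulting tables can then be passed to the source tracking tool separately, so that agreement among the results can be checked.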