Materials and methods
The study procedure comprised five parts: soil DNA acquisition,
amplicon preparation, amplicon sequencing, NGS data processing, and
suspect’s mud tracking (Fig. S1).
Soil DNA acquisition
A total of 12 soil samples were collected: one mud sample from the
criminal suspect’s pants, three soil samples from the center of the
crime scene, and eight soil samples from the area surrounding the crime
scene (Table 1, Fig. 1).
Approximately 25 mg of thoroughly mixed soil from each sample was ground into
powder in a grinder mill (MM400, Retsch GmbH, Germany) equipped with a
zirconium magnetic bead at 29 Hz for two minutes at 30-second intervals
to minimize DNA damage. Total soil DNA was extracted using the mCTAB
method (Li et al., 2013). DNA was resuspended in 100 μL of TE buffer,
visually checked on 1.5% agarose gels, and quantified on a
Nanodrop2000c spectrophotometer (Thermo Fisher Scientific Inc., USA).
Amplicon preparation
Because the murder took place in a small canal, we chose the rbcL gene
of diatoms as the DNA barcode and amplified it with the primer pair
BacirbcL2f and BacirbcL2r (Liu et al., 2020a). DNA fragments from the
same sample were labeled with a unique DNA oligo by PCR (Table 2,
Fig. 2): a unique eight-nucleotide oligo for each sample was attached
to the 5’ end of both the forward and reverse primers. PCR was
performed in a 10 μL reaction mixture on an Eppendorf Mastercycler proS
following Dong et al. (2015). The PCR products were checked by
electrophoresis on a 1.5% agarose gel containing ethidium bromide and
visualized on an ultraviolet transilluminator.
The labeled PCR products were pooled, purified from a 2% agarose gel
using a purification kit (Aidlab Biotechnologies Co., Ltd, China), and
quantified on a Nanodrop2000c spectrophotometer.
Amplicon sequencing
A sequencing library was constructed from the final PCR mixture for the
Ion Torrent platform using the NEBNext® Fast DNA Library Prep Set for
Ion Torrent (New England BioLabs, USA), and the library was sequenced
on an Ion Torrent S5xl Chip400 at the Maize Research Center, Beijing
Academy of Agriculture and Forestry Sciences.
NGS data processing
Quality control and demultiplexing. NGS data quality control
was carried out using the NGS QC Toolkit with default parameters
(Ravi et al., 2012). After quality control, the reads from the Ion
Torrent S5xl were demultiplexed using FASTX-Toolkit
(http://hannonlab.cshl.edu/fastx_toolkit/) according to the sample
labels and primers (Table 2).
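
The demultiplexing logic can be illustrated with a minimal Python
sketch. It is not the FASTX-Toolkit implementation used in the study; it
only shows the idea of assigning each read to a sample by the
eight-nucleotide label at its 5’ end. The label sequences, sample names,
and the demultiplex function are hypothetical (the real labels are those
in Table 2), and exact matching without mismatch tolerance is assumed.

```python
from collections import defaultdict

LABEL_LENGTH = 8  # each sample label is eight nucleotides long (from the text)


def demultiplex(reads, label_to_sample):
    """Group reads by the sample label found at their 5' end.

    reads: iterable of (read_id, sequence) tuples.
    label_to_sample: dict mapping an 8-nt label to a sample name.
    Returns (bins, unassigned), where bins maps sample name -> list of reads.
    """
    bins = defaultdict(list)
    unassigned = []
    for read_id, seq in reads:
        label = seq[:LABEL_LENGTH].upper()
        sample = label_to_sample.get(label)  # exact match only (assumption)
        if sample is None:
            unassigned.append((read_id, seq))
        else:
            bins[sample].append((read_id, seq))
    return dict(bins), unassigned


# Hypothetical labels and reads, for illustration only.
labels = {"ACGTACGT": "sample_A", "TGCATGCA": "sample_B"}
reads = [
    ("r1", "ACGTACGTGGGTTTAAACCC"),
    ("r2", "TGCATGCAGGGTTTAAACCC"),
    ("r3", "NNNNNNNNGGGTTTAAACCC"),  # no recognizable label
]
bins, unassigned = demultiplex(reads, labels)
```

Reads whose first eight bases match no label are kept in a separate list
rather than silently dropped, so the proportion of unassigned reads can
be inspected.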
Label and primer sequence removal. Low-quality sequences and
sequences shorter than 200 bp were discarded using the NGS QC Toolkit.
Artificially added sequences, such as DNA labels and primers, were
trimmed off using Cutadapt
(https://cutadapt.readthedocs.io/en/stable/).
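
As a rough illustration of this step (a stand-in written for this text,
not the NGS QC Toolkit or Cutadapt code), the Python sketch below
discards reads shorter than 200 bp and then strips the label and primer
from both ends. The label and primer sequences are hypothetical
placeholders, reads are assumed to be in the forward orientation, and
exact matching is assumed; applying the length filter before trimming
follows the order in which the steps are described above.

```python
MIN_LENGTH = 200  # reads shorter than this are discarded (from the text)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_read(seq, label, forward_primer, reverse_primer):
    """Remove label+forward primer (5' end) and, if present, the
    reverse-complemented label+reverse primer (3' end).

    Returns the trimmed insert, or None if the 5' construct is absent.
    """
    five_prime = label + forward_primer
    three_prime = reverse_complement(label + reverse_primer)
    if not seq.startswith(five_prime):
        return None
    seq = seq[len(five_prime):]
    if seq.endswith(three_prime):
        seq = seq[:-len(three_prime)]
    return seq


def filter_and_trim(reads, label, forward_primer, reverse_primer):
    """Apply the length filter, then trim artificial sequences."""
    kept = []
    for read_id, seq in reads:
        if len(seq) < MIN_LENGTH:
            continue  # too short, discard
        trimmed = trim_read(seq.upper(), label, forward_primer, reverse_primer)
        if trimmed is not None:
            kept.append((read_id, trimmed))
    return kept


# Hypothetical label, primers, and reads, for illustration only.
label, fwd, rev = "ACGTACGT", "GGGTTT", "CCCAAA"
insert = "AT" * 100
full_read = label + fwd + insert + reverse_complement(label + rev)
short_read = label + fwd + "ACGT"
kept = filter_and_trim([("r1", full_read), ("r2", short_read)], label, fwd, rev)
```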
ZOTU generation and annotation. We followed the UNOISE3 protocol
(http://www.drive5.com/usearch) (Edgar, 2016) to generate zero-radius
OTUs (ZOTUs) for all 12 samples. All unique ZOTU sequences were
searched with BLAST (Altschul, 2012) against the NCBI database to
assign scientific names to ZOTU sequences where possible.
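
For orientation, the two USEARCH calls that typically make up a UNOISE3
run (dereplication followed by denoising) can be assembled as in the
Python sketch below. The exact commands and parameters used in this
study are not stated, so default settings are assumed, and all file
names and the helper functions are hypothetical. The sketch only builds
and prints the command lines; run_commands is defined but not called
here, since it requires a local USEARCH installation and real input.

```python
import subprocess


def build_unoise3_commands(reads_fasta, uniques_fasta="uniques.fa",
                           zotus_fasta="zotus.fa", usearch="usearch"):
    """Return the command lines for dereplication and UNOISE3 denoising."""
    # Step 1: collapse identical reads and record their abundance (-sizeout).
    dereplicate = [usearch, "-fastx_uniques", reads_fasta,
                   "-fastaout", uniques_fasta, "-sizeout", "-relabel", "Uniq"]
    # Step 2: denoise the unique sequences into ZOTUs.
    denoise = [usearch, "-unoise3", uniques_fasta, "-zotus", zotus_fasta]
    return [dereplicate, denoise]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical input file name, for illustration only.
commands = build_unoise3_commands("all_samples_trimmed.fa")
for command in commands:
    print(" ".join(command))
```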
Organism abundance. Each ZOTU sequence was used as a reference, and
the reads from each soil sample were mapped to these references at a
similarity threshold of 0.97 using USEARCH (Edgar, 2013). The number of
reads matching each ZOTU was recorded as the abundance of that ZOTU
(organism).
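
The counting logic can be sketched in Python as follows. This is a
simplified stand-in for the USEARCH mapping, not its algorithm: identity
is approximated here as one minus the edit distance divided by the
length of the longer sequence, which is an assumption of this sketch,
and each read is assigned to its single best-matching ZOTU. The ZOTU
names and sequences are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def identity(a, b):
    """Approximate sequence identity in [0, 1] (assumed definition)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def count_abundance(reads, zotus, min_identity=0.97):
    """Count reads whose best-matching ZOTU reaches min_identity."""
    counts = {name: 0 for name in zotus}
    for read in reads:
        best_name, best_score = None, 0.0
        for name, reference in zotus.items():
            score = identity(read, reference)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is not None and best_score >= min_identity:
            counts[best_name] += 1
    return counts


# Hypothetical ZOTUs and reads, for illustration only.
zotu1 = "ACGT" * 25
zotu2 = "TTGCA" * 20
near = "T" + zotu1[1:50] + "T" + zotu1[51:]  # two substitutions
far = "".join("T" if i in {0, 20, 40, 60, 80} else c
              for i, c in enumerate(zotu1))   # five substitutions
counts = count_abundance([zotu1, near, far, zotu2],
                         {"Zotu1": zotu1, "Zotu2": zotu2})
```

In this toy example the read with five substitutions falls below the
0.97 threshold and is left uncounted.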
Suspect’s mud tracking
The potential origins of the diatoms found in the mud from the
suspect’s pants were traced to the 11 candidate soil samples using fast
expectation-maximization for microbial source tracking (FEAST; Shenhav
et al., 2019). FEAST is software developed for inferring the potential
origin(s) of a microbial community. It estimates the fraction of the
community contributed by each potential source and assigns the
remainder to an unknown source, which helps to confirm or exclude a
candidate source of the microbial community in the mud from the
suspect’s pants.
FEAST is currently implemented in R and is easy to run following the
online instructions (https://github.com/cozygene/FEAST).
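
To convey the idea behind expectation-maximization source tracking, the
Python sketch below estimates mixing fractions for a sink sample from
fixed source profiles. It is deliberately simplified and is not the
FEAST algorithm or its R interface: the source profiles are treated as
known and fixed, and the unknown source is represented by a
caller-supplied background profile, whereas FEAST also infers the
source and unknown profiles. All names, profiles, and counts are
hypothetical.

```python
def estimate_source_fractions(sink_counts, source_profiles, iterations=500):
    """EM estimate of mixing fractions with fixed source profiles.

    sink_counts: read counts per taxon in the sink sample.
    source_profiles: dict name -> relative abundances (summing to 1),
        in the same taxon order as sink_counts.
    Returns dict name -> estimated fraction of the sink community.
    """
    names = list(source_profiles)
    fractions = {name: 1.0 / len(names) for name in names}  # uniform start
    for _ in range(iterations):
        expected = {name: 0.0 for name in names}
        for i, count in enumerate(sink_counts):
            if count == 0:
                continue
            mix = sum(fractions[n] * source_profiles[n][i] for n in names)
            if mix == 0.0:
                continue  # taxon that no profile can explain
            for n in names:
                # E-step: share of taxon i attributed to source n.
                expected[n] += count * fractions[n] * source_profiles[n][i] / mix
        # M-step: renormalize expected counts into fractions.
        norm = sum(expected.values())
        fractions = {n: expected[n] / norm for n in names}
    return fractions


# Hypothetical profiles over six taxa, for illustration only.
profiles = {
    "source_1": [0.6, 0.3, 0.1, 0.0, 0.0, 0.0],
    "source_2": [0.0, 0.1, 0.2, 0.3, 0.4, 0.0],
    "unknown": [0.1, 0.1, 0.1, 0.1, 0.1, 0.5],  # assumed background profile
}
# Sink built as 60% source_1, 30% source_2, 10% unknown (1000 reads).
sink = [370, 220, 130, 100, 130, 50]
fractions = estimate_source_fractions(sink, profiles)
```

In this constructed example the sink is an exact mixture of the three
profiles, so the estimates should approach the fractions used to build
it; real data would not fit so cleanly.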