Materials and methods
The study procedure comprised five parts: soil DNA acquisition, amplicon preparation, amplicon sequencing, NGS data processing, and suspect’s mud tracking (Fig. S1).
Soil DNA acquisition
A total of 12 soil samples were collected: one mud sample from the criminal suspect’s pants, three soil samples from the center of the crime scene, and eight soil samples from the area surrounding the crime scene (Table 1, Fig. 1).
Approximately 25 mg of thoroughly mixed soil from each sample was ground to powder in a grinder mill (MM400, Retsch GmbH, Germany) equipped with a zirconium magnetic bead at 29 Hz for two minutes at 30-second intervals to minimize DNA damage. Total soil DNA was extracted using the mCTAB method (Li et al., 2013). DNA was resuspended in 100 μL of TE buffer, visually checked on 1.5% agarose gels, and quantified on a Nanodrop2000c spectrophotometer (Thermo Fisher Scientific Inc., USA).
Amplicon preparation
Because the murder took place in a small canal, we chose the rbcL gene of diatoms as the DNA barcode. The primer pair BacirbcL2f and BacirbcL2r (Liu et al., 2020a) was used. DNA fragments from each sample were labeled with a unique DNA oligo by PCR (Table 2, Fig. 2): a unique eight-nucleotide oligo for each sample was attached to the 5’ end of both the forward and reverse primers. PCR was conducted in a 10 μL reaction mixture on an Eppendorf Mastercycler proS following Dong et al. (2015). The PCR products were checked by electrophoresis on a 1.5% agarose gel containing ethidium bromide and visualized under an ultraviolet transilluminator.
The DNA-labeled PCR products were pooled, gel-purified on a 2% agarose gel using a purification kit (Aidlab Biotechnologies Co., Ltd, China), and quantified on a Nanodrop2000c spectrophotometer.
Amplicon sequencing
A sequencing library of the final PCR mixture was constructed for the Ion Torrent platform using the NEBNext® Fast DNA Library Prep Set for Ion Torrent (New England BioLabs, USA), and the library was sequenced at the Maize Research Center, Beijing Academy of Agriculture and Forestry Sciences, on an Ion Torrent S5xl Chip400.
NGS data processing
Quality control and demultiplexing. NGS data quality control was carried out using the NGS QC Toolkit with default parameters (Ravi et al., 2012). After quality control, the NGS data from the Ion Torrent S5xl were demultiplexed using FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) according to the sample labels and primers (Table 2).
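The demultiplexing step can be illustrated with a minimal Python sketch that assigns reads to samples by the eight-nucleotide label at the 5’ end of each read. This is not the FASTX-Toolkit implementation: the label sequences and sample names below are hypothetical placeholders rather than the labels in Table 2, and only exact label matches are accepted.

```python
from collections import defaultdict

LABEL_LENGTH = 8  # each sample carries a unique eight-nucleotide label

# Hypothetical labels for illustration; the real ones are listed in Table 2.
SAMPLE_LABELS = {
    "ACGTACGT": "sample_01",
    "TGCATGCA": "sample_02",
}


def demultiplex(reads, labels=SAMPLE_LABELS):
    """Group reads by the label found at their 5' end (exact match only)."""
    by_sample = defaultdict(list)
    for read in reads:
        label = read[:LABEL_LENGTH]
        sample = labels.get(label, "unassigned")
        by_sample[sample].append(read)
    return dict(by_sample)


reads = [
    "ACGTACGTGGATTCCC",
    "TGCATGCAGGATTCCA",
    "NNNNNNNNGGATTCCA",
]
groups = demultiplex(reads)
```

Reads whose first eight bases match no known label are kept in a separate "unassigned" group so that they can be inspected rather than silently dropped.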
Label and primer sequence removal. Low-quality sequences and sequences shorter than 200 bp were discarded using the NGS QC Toolkit. Artificially added sequences, such as DNA labels and primers, were trimmed off using Cutadapt (https://cutadapt.readthedocs.io/en/stable/).
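The length filter and the removal of artificial sequences can be sketched as follows. This is a simplified Python illustration, not Cutadapt itself: it handles only the 5’ end of a read, requires an exact primer match directly after the label, and the primer passed in the example is a hypothetical placeholder rather than the BacirbcL2f sequence.

```python
MIN_LENGTH = 200   # reads shorter than this were discarded
LABEL_LENGTH = 8   # the eight-nucleotide sample label precedes the primer


def trim_read(read, forward_primer, min_length=MIN_LENGTH):
    """Return the read without label and primer, or None if it is rejected."""
    if len(read) < min_length:
        return None
    start = LABEL_LENGTH
    if not read.startswith(forward_primer, start):
        return None  # primer not found directly after the label
    return read[start + len(forward_primer):]


# Hypothetical primer and read, for illustration only.
example_primer = "GGTTAACC"
example_read = "ACGTACGT" + example_primer + "A" * 200
trimmed = trim_read(example_read, example_primer)
```

In practice a tool such as Cutadapt also tolerates mismatches and trims the reverse primer and label at the 3’ end, which this sketch deliberately omits.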
ZOTU generation and annotation. We followed the UNOISE3 protocol (http://www.drive5.com/usearch) (Edgar, 2016) to generate ZOTUs for all 12 samples. All unique ZOTU sequences were searched with BLAST (Altschul, 2012) against the NCBI database to assign scientific names where possible.
Organism abundance. Each ZOTU sequence was used as a reference, and the reads from each soil sample were mapped to the references at a similarity threshold of 0.97 using USEARCH (Edgar, 2013). The number of reads matching each ZOTU was recorded as the abundance of that ZOTU (organism).
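The abundance count can be illustrated with a short Python sketch. It assumes that the mapping results are already available as (sample, read, ZOTU, identity) tuples, which is a hypothetical input format and not the native output of USEARCH; assigning each read to its single best hit is likewise an assumption made for the illustration.

```python
from collections import Counter

MIN_IDENTITY = 0.97  # similarity threshold used for read mapping


def abundance_table(hits, min_identity=MIN_IDENTITY):
    """Count reads per (sample, ZOTU), keeping each read's best hit only.

    `hits` is an iterable of (sample, read_id, zotu, identity) tuples.
    """
    best = {}
    for sample, read_id, zotu, identity in hits:
        if identity < min_identity:
            continue  # below the similarity threshold
        key = (sample, read_id)
        if key not in best or identity > best[key][1]:
            best[key] = (zotu, identity)
    counts = Counter()
    for (sample, _read_id), (zotu, _identity) in best.items():
        counts[(sample, zotu)] += 1
    return counts


# Made-up hits for illustration only.
hits = [
    ("S1", "r1", "Zotu1", 0.99),
    ("S1", "r1", "Zotu2", 0.975),
    ("S1", "r2", "Zotu2", 0.98),
    ("S1", "r3", "Zotu1", 0.95),
    ("S2", "r4", "Zotu1", 1.00),
]
table = abundance_table(hits)
```

Here read r1 is counted once, for its better-matching ZOTU, and read r3 is not counted at all because its identity falls below 0.97.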
Suspect’s mud tracking
The potential origins of the diatoms found in the mud from the suspect’s pants were traced to the 11 candidate soil samples using fast expectation-maximization for microbial source tracking (FEAST; Shenhav et al., 2019). FEAST is a software tool developed for inferring the potential origin(s) of a microbial community. It estimates the fraction of the community contributed by each potential source and attributes the remainder to an unknown source, which helps to confirm or exclude a candidate sample as the source of the microbial community in the mud from the suspect’s pants. FEAST is currently implemented in R and can be run by following the online instructions (https://github.com/cozygene/FEAST).
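The idea behind source tracking by expectation-maximization can be illustrated with a deliberately simplified Python sketch. It is not FEAST and does not reproduce its model: the source profiles are treated as fixed relative abundances, no unknown source is inferred, and the function name, the two source profiles and the sink counts are all hypothetical values chosen for the example.

```python
def estimate_source_proportions(sink_counts, source_profiles, n_iter=200):
    """Estimate mixing proportions of fixed source profiles by EM.

    sink_counts: read counts per taxon in the sink sample.
    source_profiles: one relative-abundance list per source (each sums to 1).
    """
    n_sources = len(source_profiles)
    alpha = [1.0 / n_sources] * n_sources  # start from equal proportions
    for _ in range(n_iter):
        new_alpha = [0.0] * n_sources
        for taxon, count in enumerate(sink_counts):
            if count == 0:
                continue
            weights = [alpha[j] * source_profiles[j][taxon]
                       for j in range(n_sources)]
            norm = sum(weights)
            if norm == 0:
                continue  # taxon absent from every source; cannot be assigned
            for j in range(n_sources):
                # E-step responsibility weighted by the observed count
                new_alpha[j] += count * weights[j] / norm
        assigned = sum(new_alpha)
        if assigned == 0:
            break  # nothing could be assigned to any source
        # M-step: renormalize the expected counts into proportions
        alpha = [a / assigned for a in new_alpha]
    return alpha


# Hypothetical profiles over three taxa; the sink is a 75:25 mix of A and B.
source_a = [0.8, 0.2, 0.0]
source_b = [0.0, 0.2, 0.8]
sink = [600, 200, 200]
proportions = estimate_source_proportions(sink, [source_a, source_b])
```

In this toy case the sink counts were constructed as a 75:25 mixture of the two sources, and the estimated proportions converge to those values. FEAST additionally models an unknown source and the sampling noise in the source samples, which is what allows it to report how much of the sink community cannot be explained by any candidate sample.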