Introduction
Human deoxyribonucleic acid (DNA) has been widely used in forensics for individual identification (Ambers et al., 2018; Lygo et al., 1994; Meng et al., 2019), paternity testing (Bertoglio et al., 2020; Habibi et al., 2019) and other applications. However, human DNA is not always available. In such situations, investigators have to resort to environmental DNA from the crime scene to narrow the search for suspects and establish the facts.
Environmental materials such as soil, dust and water are very likely to be carried away unintentionally by suspects on their skin, shoes, clothes or hair, or even under their fingernails. Among these materials, soil, which usually contains plant fragments or pollen grains, is the one the police can recover in most criminal cases. Plant DNA is well suited to forensic source tracking because of its ubiquity, stability and appropriate level of variability.
Plant DNA has a high potential to provide definitive evidence during criminal investigations. With the advent of DNA metabarcoding, it has recently been used to locate a body dumping site (Yang et al., 2015), determine the residence of an unidentified body (Liu et al., 2019), locate a drowning site (Fang et al., 2019), and confirm a suspected drowning (Kakizaki et al., 2018). Unfortunately, such applications are still very rare because of three main challenges. The first is the difficulty of identifying plant DNA in environmental materials to the species level. Although past projects (e.g., BARCODE 500K (https://ibol.org), BIOSCAN (Hobern & Hebert, 2019), ISHAM-ITS (Irinyi et al., 2016)) have enriched the pool of DNA barcodes, the reference library for DNA barcoding remains far from comprehensive. Fewer than 5.0% of flowering plant species have matK or rbcL sequences deposited in GenBank (Liu et al., 2021).
The second challenge is that Sanger sequencing is not applicable to environmental DNA because the amplicons are a mixture from many species. Fortunately, next-generation sequencing (NGS) platforms meet the requirements of environmental DNA metabarcoding, and a straightforward data processing method is now available (https://github.com/YanleiLiu1989/Cotu-master).
The last challenge is the lack of an “ideal” DNA barcode for DNA metabarcoding (Ferri et al., 2015). A DNA barcode is a short DNA sequence used for species recognition and discrimination. DNA barcoding is a commonly used technique in biology, environmental science, forensics and other fields (Ferri et al., 2015; Hebert et al., 2003), and it is a powerful molecular diagnostic method for specimen identification. Finding the best DNA barcodes (Dong et al., 2014; Dong et al., 2015; Kress & Erickson, 2007; Li et al., 2011) and developing technical improvements (Yu et al., 2011; Xu et al., 2015) were among the main themes of plant DNA barcoding during the past decade. Unfortunately, no single DNA barcode is suitable for identifying all plant species, and plant group-specific DNA barcodes seem more realistic. For example, rbcL is much less variable than ycf1 in flowering plants, but it is acceptable as a DNA barcode for lower plants (Dong et al., 2015; Liu et al., 2020a).
Lower plants (algae), rather than higher plants (mosses, ferns and seed plants), play a very important role in the investigation of criminal cases related to wet environments, and rbcL has been proposed as a DNA barcode for diatoms (Liu et al., 2020a). The variability of rbcL is much higher in lower plants than in higher plants, and rbcL is one of the few DNA barcode choices for lower plants because of the relatively high species coverage of existing sequences and the availability of universal PCR primers (Ferri et al., 2015).
In this paper, we demonstrate how mud collected from a criminal suspect’s pants was used to identify the perpetrator in a murder case that occurred in China, based on DNA metabarcoding of diatoms using chloroplast rbcL gene fragments. The diatom communities in the mud provided solid evidence of the suspect’s presence at the murder scene.