The impact of sequencing depth
Although GeneMiner can extract many phylogenetic markers from different types of NGS data, it cannot guarantee that every target gene is identified correctly or that every incorrect assembly is eliminated. The best results depend on sufficient sequencing depth of the input data and on closely related reference sequences. In Test II, the number of high-quality assemblies decreased significantly when the depth fell below 20x. In practice, transcriptome data have no uniform depth because expression levels differ greatly among target genes, but the depth is often much higher than 20x owing to the relatively small size of transcripts. Transcriptome data should therefore generally yield satisfactory results for gene mining.
For shallow genome sequencing data, the minimum depth that GeneMiner needs depends on the type of genes to be mined. For single- to low-copy nuclear genes, a depth of at least 10x is needed, and 20x or more is recommended. For chloroplast, mitochondrial, or high-copy-number ribosomal genes (e.g., the commonly used ITS region), the minimum sequencing depth can be as low as 1x. This is because chloroplast and mitochondrial genomes are present in far more copies per cell than the nuclear genome, and high-copy-number ribosomal genes are likewise over-represented among the reads, so these targets receive a much higher effective depth than the genome-wide average. Nevertheless, we recommend using deeper sequencing data whenever possible. Finally, although GeneMiner can process long-read sequencing data without errors, we do not recommend using such data, because the pipeline cannot take full advantage of longer reads.
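As a rough guide, the depth of a shallow genome sequencing data set can be estimated before running GeneMiner and compared with the thresholds given above. The following Python sketch is illustrative only and is not part of GeneMiner: the function names, category labels, and example numbers are hypothetical, and the estimate assumes the simple approximation that average depth equals the total number of sequenced bases divided by the genome size.

```python
# Illustrative helper, not part of GeneMiner. Thresholds are taken from the
# text above; the names and category labels are hypothetical.
MIN_DEPTH = {
    "nuclear_low_copy": 10.0,        # single- to low-copy nuclear genes
    "organelle_or_high_copy": 1.0,   # chloroplast, mitochondrial, ribosomal
}
# Only nuclear genes have an explicit recommended depth in the text.
RECOMMENDED_DEPTH = {"nuclear_low_copy": 20.0}


def estimate_depth(n_reads, read_length, genome_size):
    """Approximate average depth as total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def assess_depth(depth, gene_type):
    """Compare a genome-wide depth with the thresholds for a gene type."""
    minimum = MIN_DEPTH[gene_type]
    recommended = RECOMMENDED_DEPTH.get(gene_type)
    if depth < minimum:
        return "below minimum"
    if recommended is not None and depth >= recommended:
        return "meets recommended"
    return "meets minimum"


if __name__ == "__main__":
    # Hypothetical data set: 20 million reads of 150 bp, 500 Mb genome.
    depth = estimate_depth(20_000_000, 150, 500_000_000)
    for gene_type in MIN_DEPTH:
        print(f"{gene_type}: {depth:.1f}x, {assess_depth(depth, gene_type)}")
```

In this hypothetical example the estimated depth is 6x, which falls below the 10x minimum for single- to low-copy nuclear genes but exceeds the 1x minimum for organelle and high-copy-number ribosomal genes. The estimate is a genome-wide average and does not apply to transcriptome data, where depth varies with expression level.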