The impact of sequencing depth
Although GeneMiner can extract many phylogenetic markers from different
types of NGS data, we cannot guarantee 100% accuracy in identifying
target genes and eliminating incorrect assemblies. Obtaining the best
results from GeneMiner depends on sufficient sequencing depth of the
input data and on closely related reference sequences. In Test II, the
number of high-quality assemblies decreased significantly when the
depth was below 20x. In practice, transcriptome data have no consistent
depth because expression differs markedly among target genes, but the
coverage is often well above 20x owing to the relatively small size of
transcripts. Thus, transcriptome data should generally yield
satisfactory results for gene mining.
For shallow genome sequencing data, the minimum depth that GeneMiner
needs depends on the type of genes to be mined. For single-to-low-copy
nuclear genes, at least 10x depth is needed, and 20x or more is
recommended. For chloroplast, mitochondrial, or high-copy-number
ribosomal genes (e.g., the commonly used ITS region), the minimum
sequencing depth can be as low as 1x. This is because organelle genomes
are present in far more copies per cell than the nuclear genome, and
nuclear ribosomal DNA occurs in tandem repeats, so these targets
receive much higher effective coverage during sequencing. However, we
recommend using deeper sequencing data whenever possible. Furthermore,
although GeneMiner can process long-read sequencing data without
errors, we do not recommend using such data, as the pipeline cannot
exploit the full benefits of longer reads.
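
As a rough aid for applying these depth guidelines, mean depth can be
estimated as total sequenced bases divided by genome size. The sketch
below is illustrative only and is not part of GeneMiner: the function
names, the category labels, and the example read count and genome size
are hypothetical, and only the 10x, 20x, and 1x thresholds are taken
from the text above. It assumes reads of uniform length and counts each
read of a pair separately.

```python
# Hypothetical helper, not part of GeneMiner: estimates mean nuclear
# sequencing depth and compares it with the thresholds given in the text.

# (minimum depth, recommended depth); None means the text gives no
# numeric recommendation beyond "deeper whenever possible".
THRESHOLDS = {
    "nuclear_low_copy": (10.0, 20.0),
    "organelle_or_high_copy": (1.0, None),
}


def estimate_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Return mean depth: total sequenced bases divided by genome size (bp)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def assess_depth(depth: float, target: str) -> str:
    """Classify a depth value against the thresholds for a target type."""
    minimum, recommended = THRESHOLDS[target]
    if depth < minimum:
        return "below minimum"
    if recommended is not None and depth >= recommended:
        return "meets recommended"
    return "meets minimum"


if __name__ == "__main__":
    # Assumed example: 20 million 150 bp reads from a 500 Mb genome.
    depth = estimate_depth(20_000_000, 150, 500_000_000)
    for target in THRESHOLDS:
        print(f"{target}: {depth:.1f}x, {assess_depth(depth, target)}")
```

Under these assumed inputs the estimated depth is 6x, which falls below
the 10x minimum for single-to-low-copy nuclear genes but above the 1x
minimum for organelle and high-copy ribosomal targets. The estimate is
a genome-wide average; actual coverage of any individual gene can
deviate from it.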