Introduction
Environmental DNA (eDNA) is providing previously unthinkable insights
into the aquatic environment as a non-invasive and relatively
cost-efficient tool, illustrating the presence-absence and distribution
of certain species and the composition of the community of an ecosystem,
and is particularly informative for evaluating how these parameters
change under variable conditions (Furlan et al., 2019; Gallego et al., 2020; Stat
et al., 2017; Thomsen & Willerslev, 2015). Besides the logistical and
technical constraints of acquiring samples, there are further challenges
in accurately characterizing the biome, including the molecular strategy
used (i.e. DNA extraction and the marker or gene of choice), and the
reference databases used to identify the origin of the DNA found in a
certain environment (Jackman et al., 2021; Schenekar et al., 2020; Wang
et al., 2021). Environmental DNA identification needs robust,
comprehensive, and accurate DNA reference libraries based on solid
taxonomic frameworks, and building them calls for renewed and more
exhaustive efforts in light of recent technical
advancements in DNA sequencing (Margaryan et al., 2021; Novak et al.,
2020; Taberlet et al., 2012).
There are currently two main strategies for biomonitoring surveys to
describe the community composition or evaluate the abundance of certain
species: DNA metabarcoding approaches or targeted molecular assays using
quantitative polymerase chain reaction (qPCR) or digital PCR (dPCR) (Shu
et al., 2020). These molecular tools are well established and have been
added to the toolbox of conservation management. Both approaches make use
of the public genomic databases and, more specifically, of mitochondrial
genes found in public repositories to construct well-represented
alignments, either to identify the amplicon sequence variants (ASVs) in the case
of metabarcoding or to achieve the desired specificity for primer design in
the case of targeted assays. However, in spite of the colossal
sequencing effort undertaken in the last two decades with initiatives
like the Barcode of Life (Mugnai et al., 2021; Ratnasingham & Hebert,
2007), mitochondrial genome reference data for the breadth of taxa of
interest are not yet accessible and, when they are, only one or a few genes
may be available. Moreover, the existing sequence data are
typically suboptimal for designing the species-specific oligonucleotides
needed for targeted eDNA assays. Such is the case for
one of the most traditionally sequenced genes for barcoding purposes, the
mitochondrial cytochrome c oxidase subunit 1 (COI), which offers only
scattered, short conserved regions that are unsuitable for designing primers
that effectively discriminate the taxon of interest (Langlois et
al., 2020; Margaryan et al., 2021; Schroeter et al., 2020).
Metabarcoding analysis of fish targets more conserved genes or regions
where universal primers can be placed flanking interspecifically
variable regions. Examples are the 12S rRNA subunit (12S), targeted with the
MiFish primers (Miya et al., 2015) that produce an amplicon of ca. 170
base pairs (bp), and the 16S rRNA (16S), targeted with the Ac16S primers that
amplify a region of 330 bp (Evans et al., 2016) or with the Fish16S primers
that amplify a ca. 100 bp fragment (Deagle et al., 2009; Shaw et al., 2016). The
aforementioned markers have provided an extraordinary wealth of
information for community studies and species detection using eDNA
(Miya, 2022; Shu et al., 2020). Building more comprehensive
mitochondrial genome databases would be particularly advantageous for
those species for which identification cannot be resolved with short
fragments of the 16S, 12S, or COI genes and to expand the representation
to account for intra- and interspecific variability. In addition, having
more regions of the mitogenome available would facilitate eDNA
multi-marker metabarcoding approaches and eDNA population genetics studies,
and would even allow exploring mitochondria-associated disorders
caused by mutations, an unexplored line of research that may provide
insights into health or fitness questions (Brown, 2008; Dugal et al.,
2021; Jackman et al., 2021; Jensen et al., 2020; Sharma & Sampath,
2019). Although mitochondrial genes are physically linked to each
other, mitochondrial haplotypes from eDNA could determine the minimum number
of individuals and provide information about the origin of the
populations, e.g. in the case of the stocks of anadromous fishes found
in the ocean (Weitemier et al., 2021). Additionally, whole
mitochondrial genome databases, from verified, vouchered specimens, will
also be critical for seafood monitoring, as molecular methods are
routinely used to identify species in domestic and international trade
(Bourret et al., 2020; Ogden, 2008). The expansion of mitochondrial
genomic databases is not only costly and dependent on access to voucher
specimens, but also relies on the use of so-called universal primers
that have proven to be less generic than desired. Whole mitochondrial
genome sequencing, although attainable, is still expensive and not readily
accessible to all research groups because of limited read lengths,
unaffordable methods and long protocols (Gilpatrick et al., 2020).
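The point made above about haplotypes and the minimum number of individuals can be illustrated with a minimal sketch. It assumes that each individual contributes a single mitochondrial haplotype (i.e. heteroplasmy is ignored) and that the input sequences have already been denoised, so that sequencing errors do not create spurious haplotypes. The function name and the toy sequences are hypothetical and are not taken from any dataset.

```python
from collections import Counter


def minimum_individuals(haplotype_reads):
    """Return (minimum number of individuals, read count per haplotype).

    Assumption: one haplotype per individual, so the number of distinct
    haplotypes is a lower bound on the number of individuals sampled.
    """
    counts = Counter(seq.upper() for seq in haplotype_reads)
    return len(counts), counts


# Toy, made-up sequences standing in for denoised eDNA haplotypes.
reads = ["ACGTTGCA", "ACGTTGCA", "ACGTCGCA", "acgttgca", "ACGTCGTA"]
n_min, counts = minimum_individuals(reads)
print(f"At least {n_min} individuals contributed DNA to this sample")
```

The count is only a lower bound: several individuals sharing the same haplotype are indistinguishable with this approach.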
In this study, we explore target enrichment methods to attain whole
mitogenome sequence data in a simple and cost-effective manner. Current
technologies allow for whole genome direct sequencing (i.e. no special
treatments are necessary) that can yield regions of interest,
particularly those in high copy number such as mitochondrial DNA, which
can be identified and recovered from the data in a process called
‘genome skimming’ or shallow sequencing (Straub et al., 2012). We
propose that targeted mitochondrial DNA enrichment during the DNA
isolation or library preparation step should be pursued to
reduce costs, time, data storage and bioinformatic
requirements while improving coverage and consensus sequences and avoiding
pseudogenes and general background noise that can affect the molarity of
the target and thus compromise sequencing performance. Although
hundreds to about a thousand mitochondrial DNA (mtDNA) copies are present in
a fish cell, depending on the tissue and age, which is lower than in
mammals (Hartmann et al., 2011), the mtDNA in a preparation
normally amounts to around 0.1% of the genomic DNA (Robin & Wong, 1988) and is
overwhelmed by nuclear DNA (nDNA). Different enrichment outcomes can
be attained depending on the DNA extraction, treatment, sequencing and
bioinformatic approaches employed. The physical properties of mtDNA
(its location within an organelle and the circularity of the
mitochondrial genome) can be used to preferentially extract mtDNA using
sequential precipitation methods or to deplete the non-circular DNA
(i.e., nDNA) using exonucleases. Targeted enrichment can also be
conducted using CRISPR-based methods by targeting conserved regions of
the mitogenomes with specific guide RNAs (Schultzhaus et al., 2021).
Mitochondrial enrichment without PCR amplification avoids universal
primer incompatibility and PCR amplification errors. Additionally,
long-range amplification has proven challenging (Gilpatrick et al., 2020;
Ramón-Laca, personal observations) and target enrichment using
hybridization capture is not yet fully operative for long fragments.
Long fragment sequences can help reduce the number of nuclear
mitochondrial sequences (NumtS) recovered by mistake; NumtS can be very long
and are found in fish in a greater ratio than in most vertebrate species (Antunes &
Ramos, 2005; Dayama et al., 2014). Long fragments preserve the order of
the genes, in contrast with short-read sequencing platforms, which
can also be affected by PCR bias in AT-rich regions (Gan et al., 2019).
Long sequences can be key for accurate genome assembly (Pollard et al.,
2018), in particular in repetitive regions, which are sometimes found in
the control region of the mitochondrial genome and have proven challenging to
sequence with traditional methods (i.e., Sanger and whole genome
sequencing of short fragments) (Formenti et al., 2021; McDonald et al.,
2021). A rearrangement in the order of the genes will not be missed when
annotations are transferred from a different species, because the order of
the genes is determined by the sequence itself and not by the reference genome.
For all the aforementioned reasons, a de novo assembly of whole
mitogenomes should be favored, so as not to bias the newly generated
mitogenomes or overlook any possible modifications.
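The cost argument for enrichment made in this paragraph can be made concrete with a back-of-the-envelope calculation. The sketch below takes the approximately 0.1% mtDNA fraction quoted above and asks how many bases must be sequenced in total to reach a given mitogenome depth. The mitogenome length (16,500 bp), the target depth (100x) and the enriched fraction (10%) are illustrative assumptions, not values measured in this study, and the calculation ignores uneven coverage and unmappable reads.

```python
def total_bases_needed(mitogenome_bp, target_depth, mt_fraction):
    """Total sequenced bases required so that the mitochondrial share
    alone covers the mitogenome at the requested mean depth."""
    if not 0 < mt_fraction <= 1:
        raise ValueError("mt_fraction must be in (0, 1]")
    return mitogenome_bp * target_depth / mt_fraction


MITOGENOME_BP = 16_500   # assumed typical fish mitogenome length
TARGET_DEPTH = 100       # assumed target mean depth

# 0.1% is the unenriched mtDNA fraction quoted in the text.
skimming = total_bases_needed(MITOGENOME_BP, TARGET_DEPTH, 0.001)
# 10% is a hypothetical post-enrichment fraction, for illustration only.
enriched = total_bases_needed(MITOGENOME_BP, TARGET_DEPTH, 0.10)

print(f"genome skimming: {skimming / 1e9:.2f} Gb")
print(f"after enrichment: {enriched / 1e6:.1f} Mb")
```

Under these assumptions the unenriched preparation needs roughly 1.65 Gb of sequence against about 16.5 Mb after enrichment, a hundredfold difference that scales directly with the enrichment factor.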
However, long-fragment sequencing comes with its own challenges. The
main downside, and a common criticism, of long-read sequencing with
Oxford Nanopore Technologies (ONT) is the relatively high error rate in the
sequences obtained. Nonetheless, in contrast to short-read
platforms, these errors are mostly random, except for those in homopolymeric
regions of single-pore ONT reads, and can be overcome with high
read depth (Pollard et al., 2018; Schultzhaus et al., 2021) and with the
constantly evolving flow cells, chemistry and base calling algorithms.
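Why random errors are diluted by read depth can be sketched with a simple binomial model. The example assumes that errors at a given position are independent between reads and, as a worst case, that all erroneous reads agree on the same wrong base; the consensus is then wrong only when more than half of the reads are wrong. The per-read error rate used here (5%) is an arbitrary illustrative value and not a measured ONT error rate, and systematic errors such as those in homopolymers are deliberately not captured by this model.

```python
from math import comb


def majority_error(depth, per_read_error):
    """Probability that more than half of `depth` reads are wrong at a
    position, with independent errors at rate `per_read_error`.
    Ties (possible at even depth) are not counted as consensus errors."""
    threshold = depth // 2 + 1
    p = per_read_error
    return sum(
        comb(depth, k) * p**k * (1 - p) ** (depth - k)
        for k in range(threshold, depth + 1)
    )


# Illustrative per-read error rate; odd depths avoid ties.
for depth in (1, 11, 51):
    print(depth, majority_error(depth, 0.05))
```

In this model the consensus error falls steeply as depth increases, which is the behaviour expected for random errors; an error that recurs in most reads at the same position would not be removed by depth alone.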
Collections of fish have traditionally preserved the specimens (e.g.
whole individuals, fin clips) in jars or tubes with >95%
ethanol. This method has worked well for gene sequencing or
microsatellite or SNP typing, but does not prevent the degradation of
high molecular weight fragments (Oosting et al., 2020) and thus most
specimens in the collections will not yield fragments long enough
to hinder accidental sequencing of NumtS. In addition, fish
samples can sometimes take a long time to be sorted, even on board dedicated
research vessels, which can compromise tissue quality and lead to
degradation of most of the high molecular weight DNA (Oosting et al.,
2020; Rodriguez-Ezpeleta et al., 2013).
In this study we show how to generate whole mitogenomes from fish
species, with the aim of building affordable and comprehensive
databases that are not restricted to a few genes of interest. The
long-fragment approach combined with mitochondrial DNA enrichment
produced whole mitogenomes with full coverage and great sequencing depth
while using fewer computational and sequencing resources than genome
skimming. Two approaches that enrich the mitochondrial DNA were
evaluated in this study: 1) mitochondrial DNA enrichment by isolating
intact mitochondria; and 2) targeted mitosequencing by using CRISPR-Cas9
on conserved regions of the mitochondrial genome. Both approaches are
followed by sequencing on an Oxford Nanopore platform. These target
enrichment and long-fragment sequencing approaches thereby simplify the
discovery of mitogenomes of non-model or understudied fish taxa and make it
accessible to a broad range of laboratories worldwide.