Mitogenome assemblies
De novo assemblies of the mitogenomes of chinook salmon
(Oncorhynchus tshawytscha), Pacific hake (Merluccius
productus), eulachon (Thaleichthys pacificus), blue lanternfish
(Tarletonbeania crenularis), California headlight fish
(Diaphus theta), and Northern lampfish (Stenobrachius
leucopsarus) were generated from the Oxford Nanopore data in this
project. The M. productus, T. pacificus and T. crenularis
mitogenomes are new contributions to public databases (GenBank Acc. No.
ON005612-ON005619).
The 11 consensus sequences generated across all the method-optimization
tests for chinook salmon are between 16628 and 16634 bp long and share 99.9%
similarity, differing only in homopolymeric regions. They follow the
canonical vertebrate mitochondrial genome organization, containing 37 genes (13
protein-coding, 22 tRNA, and 2 rRNA genes) and 2 noncoding regions (the
control region or D-loop, and the light-strand replication origin or
OL) (Fig 4). These sequences are 10-16 bp shorter than the
reference genome NC002980 (16,644 bp) (Wilhelm et al., 2003), with
112-134 differences and a pairwise identity ranging from 99.23%
(Ot-T-H20-mitosequencing) to 99.33% (Ot-T-L17 and Ot-T-H20
mitoenrichment). The targeted mitosequencing run for the heart sample
(Ot-T-H20) was the lowest-quality run for salmon: it produced a sequence a
few bp shorter than the others and the most dissimilar to the only reference
genome available (NC002980), and, with the strict parameters given, it
failed to call with confidence a short region near the cutting site of gRNA
alias 28, where read depth was shallow (54-60x). The matching sequences
obtained from targeted mitosequencing of the muscle gDNA and liver mtDNA
samples (Ot-PCI-1 and Ot-T-L17), which had the deepest coverage, were deposited in
GenBank as the reference mitogenome for this individual, which has only
3 undetermined nucleotides out of 16633 bp (GenBank Acc. ON005616). All
the SNPs found in the comparison with the reference genome are consistent
among the 11 consensus sequences obtained from different tissues, samples, and
protocols for the same specimen. Most of the SNPs identified result in
synonymous substitutions in the coding genes (SNPs at the third position
of the codon), and none introduces a stop codon. The exceptions are five
non-synonymous substitutions: Ala126Thr (in agreement with Weitemier et al., 2021)
and Ala348Thr in ND2; Ile152Val in COII; and Leu116Glu and Met95Thr in ND6.
There are also two one-nucleotide insertions and two indels that cause
two seven-amino-acid modifications within residues 519-531 and 571-582 of ND5. These
modifications coincide with reference sequences of other
salmonids (AB252719-AB252722, AY032629-AY032632, LC361126-LC361129),
suggesting that the reference genome available in GenBank (NC002980) may be
incorrect for this region of the ND5 gene. In addition, a previously
uncharacterized ND1 amino acid has been determined to be Ala118. Despite
all the above differences, most of the variation is found in the D-loop,
which has a 98.1% pairwise identity.
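As an illustration of how a codon change is classified as synonymous or non-synonymous under the vertebrate mitochondrial genetic code (NCBI translation table 2), a minimal Python sketch follows. It is not the pipeline used in this study: the function name `classify_substitution` and the example codons are hypothetical, chosen only to reproduce the type of change (for example Ala to Thr), not the codons actually observed.

```python
from itertools import product

# Vertebrate mitochondrial code (NCBI translation table 2), codons in TCAG order.
# Differences from the standard code: AGA/AGG = stop, ATA = Met, TGA = Trp.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSS**"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Return (reference aa, alternative aa, class) for a codon change."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "stop"
    else:
        kind = "non-synonymous"
    return ref_aa, alt_aa, kind


# Hypothetical codons, for illustration only.
print(classify_substitution("CTA", "CTG"))  # third-position change, Leu to Leu
print(classify_substitution("GCC", "ACC"))  # first-position change, Ala to Thr
print(classify_substitution("ATA", "ATG"))  # Met to Met in the mitochondrial code
```

The last call shows why the choice of translation table matters: ATA encodes Met in vertebrate mitochondria but Ile in the standard code, so the same SNP would be misclassified as non-synonymous with the wrong table.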
A complete mitogenome was generated for the first time for Merluccius productus.
Different haplotypes were found in four individuals, all following the
canonical vertebrate mitogenome organization with no gene rearrangements.
Three haplotypes obtained with the enrichment methods were deposited in
GenBank (Acc. No. ON005613-ON005615). Their lengths range from 16736 to
16775 bp, with the variation located in the non-coding region between the
tRNA-Thr and tRNA-Pro genes (Fig 4) and at the end of the control region.
Compared with the Atlantic hake (M. merluccius) sequence FR751402,
the newly identified haplotypes have a shorter non-coding region between
tRNA-Thr and tRNA-Pro (201 bp compared with 533 bp) that lacks a
144-bp repetitive region present in the sister species. The genome
skimming approach did not produce a complete mitogenome.
A complete mitogenome was also generated for the first time for T.
pacificus. Two haplotypes were identified from two different
individuals and three runs; the two samples from the same
individual, both sequenced with the enrichment methods, differ in only one
nucleotide out of 16,762 bp, in a homopolymeric region of the control
region (GenBank Acc. No. ON005619). The sequence resulting from the genome
skimming process was not considered because of its low quality. T.
pacificus follows the canonical vertebrate mitogenome organization with no
rearrangements or duplications, and it contains a repeated motif totaling
150 bp at the beginning of the control region that was not
found in the sequence obtained with the genome skimming approach.
Further investigation would be necessary to determine whether the absence
of this repeat is real, reflecting a different haplotype from a
different population of origin, or a bias caused by shallow coverage, short
fragment sequencing, degradation, and/or the fact that the genome was
mapped against a different species.
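For reference, the pairwise identity implied by a given number of differences can be sketched as below. This is only an assumed, simple definition (identical columns divided by alignment length, with gap columns counted as mismatches); alignment tools differ in how they treat gaps and ambiguous bases, so values reported by a specific program may not match exactly. The function names are hypothetical.

```python
def pairwise_identity(aligned_a, aligned_b):
    """Percent identity of two aligned sequences of equal length.

    Assumption: every alignment column counts, so a gap ('-') opposite
    a base is treated as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(aligned_a.upper(), aligned_b.upper()))
    return 100.0 * matches / len(aligned_a)


def identity_from_counts(length, differences):
    """Percent identity from an alignment length and a difference count."""
    return 100.0 * (length - differences) / length


# Toy alignment: 9 identical columns out of 10.
print(pairwise_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0

# One difference in 16,762 bp, as for the two same-individual eulachon runs.
print(round(identity_from_counts(16762, 1), 3))  # 99.994
```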
Three species of myctophids were sequenced, Tarletonbeania
crenularis, Diaphus theta and Stenobrachius leucopsarus, which
illustrate the exceptional mitochondrial gene reorganization of this
group described by Poulsen et al. (2013). The three mitogenomes
showed 71.11-75.66% pairwise identity, with ca. 4,500-5,500 bp
differences in 17.6-18.3 kbp. However, the order of the protein-coding
genes is the same in the three species. All three present a
non-coding indel of 39-71 bp between tRNA-Leu and ND1, as well as the
rearrangement typical of all myctophids, in which tRNA-Cys (C) and
tRNA-Tyr (Y) are switched from the canonical WANCY tRNA gene order to
WANYC (single-letter amino acid code), and several spacer insertions
between genes (Fig 4) (Poulsen et al., 2013; Satoh et al.,
2016). There is also a longer sequence in the putative origin of
replication of the light strand (OL). Diaphus
theta and Stenobrachius leucopsarus mostly agreed with their
publicly available references, with some differences. The sequence generated
here for D. theta (Acc. No. ON005612) is
almost 2 kbp longer than GenBank Acc. AP012240 (Poulsen et al., 2013)
because it adds the complete tRNA-Thr, tRNA-Glu, tRNA-Pro, a novel
putative tRNA-Tyr duplication and the D-loop sequences; it also shows a 77.1%
pairwise identity at tRNA-Phe, while the rest of the genome has only
46 SNPs. Further analyses are necessary to corroborate the
authenticity of the putative tRNA-Tyr duplication found in the
control region, which would be consistent with other gene duplications found most
often in this area in birds, reptiles and fishes (Formenti et al.,
2021). Nonetheless, the sequence agrees on the Diaphini rearrangement (from IQM to
IMQ), in which tRNA-Gln (Q) and tRNA-Met (M) switch not only places but
also strands, and on an extra tRNA-Met pseudogene found between tRNA-Gln and
ND2. The S. leucopsarus complete mitogenome (ON005617) is 2.6 kbp
longer than GenBank AP012245, and the pairwise identity of the ND4 gene
drops from 98.3% in the first 699 bp to 78.7% in the next 700 bp (158 bp
substitutions in 1386 bp, resulting in only 13 nonsynonymous amino acid
substitutions). More work is needed to explain this remarkable
difference in the second half of the ND4 gene. The relocation of
tRNA-Glu between tRNA-Thr and tRNA-Pro was
also observed for S. leucopsarus. S. leucopsarus has 44 poly-G in the control
region. Lastly, we generated the first publicly available mitochondrial
genome for T. crenularis (ON005618), which has an insertion of
321 bp between COII and tRNA-Gly.
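The tRNA rearrangements described above (WANCY to WANYC in all myctophids, IQM to IMQ in Diaphini) are exchanges of two adjacent genes, which can be detected by comparing gene orders written in single-letter code. The sketch below is a hypothetical helper, not part of the annotation workflow of this study, and it compares order only: it does not capture strand changes, duplications, or pseudogenes such as the extra tRNA-Met.

```python
def adjacent_swaps(canonical, observed):
    """List pairs of adjacent genes whose order is exchanged.

    Both arguments are gene orders in single-letter code (strings or
    lists) and must contain the same genes.
    """
    if sorted(canonical) != sorted(observed):
        raise ValueError("both gene orders must contain the same genes")
    swaps = []
    i = 0
    while i < len(canonical) - 1:
        swapped = (
            canonical[i] != observed[i]
            and canonical[i] == observed[i + 1]
            and canonical[i + 1] == observed[i]
        )
        if swapped:
            swaps.append((canonical[i], canonical[i + 1]))
            i += 2  # skip the second gene of the swapped pair
        else:
            i += 1
    return swaps


print(adjacent_swaps("WANCY", "WANYC"))  # [('C', 'Y')]
print(adjacent_swaps("IQM", "IMQ"))      # [('Q', 'M')]
```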