Mitogenome assemblies
De novo assemblies of the mitogenomes of chinook salmon (Oncorhynchus tshawytscha), Pacific hake (Merluccius productus), eulachon (Thaleichthys pacificus), blue lanternfish (Tarletonbeania crenularis), California headlight fish (Diaphus theta), and northern lampfish (Stenobrachius leucopsarus) were generated from the Oxford Nanopore data in this project. The M. productus, T. pacificus and T. crenularis mitogenomes are new contributions to public databases (GenBank Acc. No. ON005612-ON005619).
The 11 consensus sequences generated across all the method optimization tests for chinook salmon are between 16,628 and 16,634 bp long and share 99.9% similarity, differing only in homopolymeric regions. They follow the canonical vertebrate mitochondrial genome organization, containing 37 genes (13 protein-coding, 22 tRNA, and 2 rRNA genes) and 2 noncoding regions (the control region or D-loop, and the light strand replication origin or OL) (Fig 4). These sequences are 10-16 bp shorter than the reference genome NC_002980 (16,644 bp) (Wilhelm et al., 2003), with 112-134 differences and a pairwise identity ranging from 99.23% (Ot-T-H20 mitosequencing) to 99.33% (Ot-T-L17 and Ot-T-H20 mitoenrichment). The targeted mitosequencing run for the heart sample (Ot-T-H20) was the lowest-quality run for salmon: it produced a sequence a few bp shorter than the others and the most dissimilar to the only available reference genome (NC_002980), and, under the strict parameters used, it failed to call with confidence a short region near the cutting site of gRNA alias 28, where read depth was shallow (54-60x). The matching sequence obtained from targeted mitosequencing of the muscle gDNA and liver mtDNA samples (Ot-PCI-1 and Ot-T-L17), which had the deepest coverage, was deposited in GenBank as the reference mitogenome for this individual (GenBank Acc. ON005616); it has only 3 undetermined nucleotides out of 16,633 bp. All the SNPs found relative to the reference genome are consistent among the 11 consensus sequences obtained from different tissues, samples, and protocols for the same specimen. Most of the SNPs identified in the coding genes are synonymous substitutions (SNPs at the third codon position), and none introduces a stop codon. The exceptions are five non-synonymous substitutions: Ala126Thr, in agreement with Weitemier et al. (2021), and Ala348Thr in ND2; Ile152Val in COII; and Leu116Glu and Met95Thr in ND6. There are also two one-nucleotide insertions and two indels that cause two seven-amino-acid modifications within residues 519-531 and 571-582 of ND5.
These ND5 modifications coincide with reference sequences for other salmonids (AB252719-AB252722, AY032629-AY032632, LC361126-LC361129), suggesting that the reference genome available in GenBank (NC_002980) may be incorrect for this region of the ND5 gene. In addition, a previously uncharacterized ND1 amino acid has been resolved as Ala118. Despite all the above differences, most of the variation is found in the D-loop, which has 98.1% pairwise identity.
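The distinction between synonymous and non-synonymous substitutions drawn above can be illustrated with a minimal sketch. It assumes the vertebrate mitochondrial genetic code (NCBI translation table 2) and codons supplied in the coding orientation of the gene. The function name classify_substitution and the example codons are hypothetical and are not taken from the sequences reported here.

```python
# Vertebrate mitochondrial genetic code (NCBI translation table 2),
# laid out in the conventional TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSS**"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(ref_codon, alt_codon):
    """Translate both codons and label the change between them.

    Hypothetical helper: returns (reference amino acid,
    alternative amino acid, label), with '*' marking a stop codon.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "stop gained"
    else:
        kind = "non-synonymous"
    return ref_aa, alt_aa, kind


# Illustrative codons only (not the codons observed in this study).
print(classify_substitution("GCC", "GCT"))  # third-position change, Ala to Ala
print(classify_substitution("GCC", "ACC"))  # first-position change, Ala to Thr
```

Using the mitochondrial table rather than the standard code matters for a few codons: in table 2, ATA encodes Met, TGA encodes Trp, and AGA and AGG are stop codons.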
A complete mitogenome was generated for the first time for Merluccius productus. Different haplotypes were found in four different individuals, all following the canonical vertebrate mitogenome organization with no gene rearrangements. Three haplotypes obtained with the enrichment methods were deposited in GenBank (Acc. No. ON005613-ON005615). Their length ranges from 16,736 to 16,775 bp, with the variation located in the non-coding region between the tRNA-Thr and tRNA-Pro genes (Fig 4) and at the end of the control region. Compared with the Atlantic hake (M. merluccius) sequence FR751402, the newly identified haplotypes have a shorter non-coding region between tRNA-Thr and tRNA-Pro (201 bp compared with 533 bp), which lacks a 144-bp repetitive region present in the sister species. The genome skimming approach did not produce a complete mitogenome.
A complete mitogenome was generated for the first time for T. pacificus. Two haplotypes were identified from two different individuals and three runs; the two samples from the same individual sequenced with the enrichment methods differ in only one nucleotide out of 16,762 bp, located in a homopolymeric stretch of the control region (GenBank Acc. No. ON005619). The sequence resulting from the genome skimming process was not considered because of its low quality. T. pacificus follows the canonical vertebrate mitogenome organization with no rearrangements or duplications, and it contains a repeated motif totalling 150 bp at the beginning of the control region that was not found in the sequence obtained with the genome skimming approach. Further investigation would be necessary to determine whether this repeat is truly absent, reflecting a different haplotype from a different population of origin, or whether its absence is a bias caused by shallow coverage, short fragment sequencing and degradation, and/or the fact that the genome was mapped against a different species.
Three species of myctophids were sequenced, Tarletonbeania crenularis, Diaphus theta and Stenobrachius leucopsarus, which illustrate the exceptional mitochondrial gene reorganization of this group described by Poulsen et al. (2013). The three mitogenomes showed 71.11-75.66% pairwise identity and ca. 4,500-5,500 bp of difference over 17.6-18.3 kbp. However, none of the protein-coding genes changes its order in these three species. All three present a non-coding indel of 39-71 bp between tRNA-Leu and ND1, as well as the rearrangement typical of all myctophids, in which tRNA-Cys (C) and tRNA-Tyr (Y) are switched from the canonical WANCY tRNA-gene order to WANYC (single-letter amino acid code), and several spacer insertions between genes (Fig 4) (Poulsen et al., 2013; Satoh et al., 2016). There is also a longer sequence at the putative origin of replication of the light strand (OL). The Diaphus theta and Stenobrachius leucopsarus sequences largely agreed with their publicly available references, with some differences. The sequence generated here for D. theta (Acc. No. ON005612) is almost 2 kbp longer than GenBank Acc. AP012240 (Poulsen et al., 2013) because it adds the complete tRNA-Thr, tRNA-Glu and tRNA-Pro genes, a novel putative tRNA-Tyr duplication and the D-loop. It shows 77.1% pairwise identity with the reference at tRNA-Phe, whereas the rest of the genome has only 46 SNPs. Further analyses are necessary to confirm the putative tRNA-Tyr duplication found in the control region, although it is consistent with the gene duplications reported most often in this area in birds, reptiles and fishes (Formenti et al., 2021). Nonetheless, the sequence agrees with the reference on the Diaphini rearrangement (from IQM to IMQ), in which tRNA-Gln (Q) and tRNA-Met (M) switch not only places but also strands, and on the extra tRNA-Met pseudogene found between tRNA-Gln and ND2. The S. leucopsarus complete mitogenome (ON005617) is 2.6 kbp longer than GenBank AP012245, and its ND4 gene drops from 98.3% pairwise identity in the first 699 bp to 78.7% in the next 700 bp (158 substitutions in 1386 bp, resulting in only 13 nonsynonymous amino acid substitutions). More work is needed to explain this remarkable difference in the second half of the ND4 gene. The relocation of tRNA-Glu between tRNA-Thr and tRNA-Pro was also observed for S. leucopsarus. S. leucopsarus has 44 poly-G in the control region. Lastly, we generated the first publicly available mitochondrial genome for T. crenularis (ON005618), which has an insertion of 321 bp between COII and tRNA-Gly.
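The tRNA rearrangements described above (WANCY to WANYC, and IQM to IMQ) are exchanges of neighbouring genes, and the following minimal sketch shows one way to detect them from two gene-order lists. The function name adjacent_swaps is hypothetical, and the sketch makes simplifying assumptions: both lists contain the same genes, strand is ignored (so the strand change of tRNA-Gln and tRNA-Met is not captured), and insertions, duplications and pseudogenes are not handled.

```python
def adjacent_swaps(reference, observed):
    """Return the pairs of neighbouring genes whose order is exchanged.

    Hypothetical helper: both arguments are sequences of gene labels
    in genome order. Raises ValueError if the gene content differs or
    if the difference is not a set of simple adjacent exchanges.
    """
    reference = list(reference)
    observed = list(observed)
    if sorted(reference) != sorted(observed):
        raise ValueError("gene content differs between the two orders")

    swaps = []
    i = 0
    while i < len(reference):
        if reference[i] == observed[i]:
            # Same gene at the same position: nothing to report.
            i += 1
        elif (
            i + 1 < len(reference)
            and reference[i] == observed[i + 1]
            and reference[i + 1] == observed[i]
        ):
            # Two neighbours have traded places.
            swaps.append((reference[i], reference[i + 1]))
            i += 2
        else:
            raise ValueError("rearrangement is not a simple adjacent swap")
    return swaps


# Single-letter tRNA codes as used in the text.
print(adjacent_swaps("WANCY", "WANYC"))  # canonical vs myctophid order
print(adjacent_swaps("IQM", "IMQ"))      # canonical vs Diaphini order
```

A comparison that also tracked strand would need signed gene labels, for example a (gene, strand) tuple per entry, which this sketch deliberately leaves out.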