Alignment of voltage-gated sodium channel alpha
subunit 4
(SCN4A )
Mutations in SCN4A proteins have been previously associated with
pufferfish TTX resistance (Venkatesh et al., 2005). To investigate these
mutations further, sequences of the SCN4 alpha subunit were identified
via BLAST (reciprocal best hit approach) against D. rerio ,L. sceleratus and other fishes from the species tree
(Supplementary Table 3), as well asTakifugu pardalis (Genbank:
BAA90398.1), Latimeria chalumnae (NCBI Reference:
XP_006003324.1), Callorhinchus milii (NCBI Reference:
XP_007907351.1), using the D. rerio SCN4AA and SCN4AB sequences
as queries. The retrieved protein sequences were aligned using MAFFT
v7.453 (Katoh and Standley 2013) with the -auto mode. The alignment was
manually curated using Jalview v2.11.1.4 (Waterhouse et al., 2009).
RESULTS
Genome size and assembly
completeness
Sequencing yielded 57.30 Gb of raw Illumina reads and 9.68 Gb of ONT
reads above Q7, with N50 of 48.85 Kb. The estimated genome size was
~360 Mb and best predicted k = 81. After quality
trimming and filtering, we retained 44.45 Gb of Ιllumina data for genome
polishing and 9.67 Gb of ONT data (Table 1) for the genome assembly. The
final assembled and polished genome contained 235 contigs with total
length of ~373 Mb, with 41 contigs representing
~91% of the genome and the largest contig sizing 17 Mb
and N50 of 11 Mb (Table 2). Regarding genome completeness, we found 98%
(4,513 out of 4,584) of the genes included in the BUSCO Αctinopterygian
gene set. Of those, 96.20% (4,410) were complete (Table 2), suggesting
a high level of completeness and contiguity in the built assembly.
Repeat annotation, gene
prediction and functional annotation
A total of 61.9 Mb of repeat sequences that accounted for 16.55% of the
genome assembly were masked in L. sceleratus . The class of
Retroelements makes up 7.81% of the total assembly and LINEs are the
most abundant of this class, with 5.54%. LTR elements sequences
(2.07%) is the second most abundant group in the Retroelements class,
and the results also indicated that 2.30% of the genome assembly
consists of sequences of the class DNA transposons (Table 3).
Gene prediction and functional
annotation
The combination of ab initio -based, homologue-based and
RNASeq-based methods resulted in 32,451 putative protein-coding genes.
After removing putative genes with Annotation Edit Distance (AED)
(Eilbeck et al., 2009) score below one (AED<1), with custom
script (longestIsoforms.py), we ended up with 21,251 genes of average
gene length and exon size 587,43 bp and 249,62 bp, respectively). A
total of 20,578 genes were successfully annotated, accounting for 97%
of the predicted gene set (Table 4). The BLAST top hits indicated thatL. sceleratus genes exhibited among others, 6,704 sequence
similarities to T. rubripes , and 4,057 to T. nigroviridis.
The completeness of the gene set was assessed using BUSCO v4.0.5 (Simão
et al. 2015). From a core set of 3,640 single-copy ortholog genes from
the Actinopterygii (odb 10) lineage, 92.2% were complete (70.5%
as single-copy, 21.7% as duplicates), 1.7% were fragmented and 6.1%
were not found.
Orthology assignment and
phylogenomic analysis
The total number of genes from all 30 fish proteomes (Supplementary
Table 3) analysed by Orthofinder was 731,383 while 21,897 orthogroups
were identified. After filtering, we chose 731 one-to-one orthogroups to
construct the super-alignment. The initial matrix consisted of 494,732
amino acid positions. The Gblocks-filtered matrix contained 252,477
positions (51% of the initial), which were used for the phylogenomic
analysis.
We identified JTT+I+G4+F as the best model which was used for the
estimation of the phylogenetic tree (Figure 2). At the resulted
phylogeny, almost all branches were supported with 100 bootstraps. The
recovered phylogenetic position of L. sceleratus is within
Tetraodontidae clade and is placed closer to T. nigroviridis.
Gene family evolution
Gene family evolutiona analysis revealed multiple predicted rapidly
expanded and contracted gene families of L. sceleratus (Figure 3;
Supplementary Table 4; Supplementary Table 5), respectively. Gene
families found only in one or more pufferfish species and not in any
other taxon were considered as puffer-specific. Among rapidly evolving
families, 4 gene families were identified as L. sceleratusspecific, two as rapidly expanding (Supplementary Table 1) and two gene
families as rapidly contracting. Finally, 43 T. bimaculatus and
45 T. rubripes families were found to be rapidly contracted
(Figure S1-S2,). The gene family analysis did not reveal a core set of
rapidly evolving families at the base of all five puffer species (Figure
S2-S3).
Synteny analysis