Alignment of voltage-gated sodium channel alpha subunit 4 (SCN4A )

Mutations in SCN4A proteins have been previously associated with pufferfish TTX resistance (Venkatesh et al., 2005). To investigate these mutations further, sequences of the SCN4 alpha subunit were identified via BLAST (reciprocal best hit approach) against D. rerio ,L. sceleratus and other fishes from the species tree (Supplementary Table 3), as well asTakifugu pardalis (Genbank: BAA90398.1), Latimeria chalumnae (NCBI Reference: XP_006003324.1), Callorhinchus milii (NCBI Reference: XP_007907351.1), using the D. rerio SCN4AA and SCN4AB sequences as queries. The retrieved protein sequences were aligned using MAFFT v7.453 (Katoh and Standley 2013) with the -auto mode. The alignment was manually curated using Jalview v2.11.1.4 (Waterhouse et al., 2009).
RESULTS
Genome size and assembly completeness
Sequencing yielded 57.30 Gb of raw Illumina reads and 9.68 Gb of ONT reads above Q7, with N50 of 48.85 Kb. The estimated genome size was ~360 Mb and best predicted k = 81. After quality trimming and filtering, we retained 44.45 Gb of Ιllumina data for genome polishing and 9.67 Gb of ONT data (Table 1) for the genome assembly. The final assembled and polished genome contained 235 contigs with total length of ~373 Mb, with 41 contigs representing ~91% of the genome and the largest contig sizing 17 Mb and N50 of 11 Mb (Table 2). Regarding genome completeness, we found 98% (4,513 out of 4,584) of the genes included in the BUSCO Αctinopterygian gene set. Of those, 96.20% (4,410) were complete (Table 2), suggesting a high level of completeness and contiguity in the built assembly.
Repeat annotation, gene prediction and functional annotation
A total of 61.9 Mb of repeat sequences that accounted for 16.55% of the genome assembly were masked in L. sceleratus . The class of Retroelements makes up 7.81% of the total assembly and LINEs are the most abundant of this class, with 5.54%. LTR elements sequences (2.07%) is the second most abundant group in the Retroelements class, and the results also indicated that 2.30% of the genome assembly consists of sequences of the class DNA transposons (Table 3).
Gene prediction and functional annotation
The combination of ab initio -based, homologue-based and RNASeq-based methods resulted in 32,451 putative protein-coding genes. After removing putative genes with Annotation Edit Distance (AED) (Eilbeck et al., 2009) score below one (AED<1), with custom script (longestIsoforms.py), we ended up with 21,251 genes of average gene length and exon size 587,43 bp and 249,62 bp, respectively). A total of 20,578 genes were successfully annotated, accounting for 97% of the predicted gene set (Table 4). The BLAST top hits indicated thatL. sceleratus genes exhibited among others, 6,704 sequence similarities to T. rubripes , and 4,057 to T. nigroviridis.
The completeness of the gene set was assessed using BUSCO v4.0.5 (Simão et al. 2015). From a core set of 3,640 single-copy ortholog genes from the Actinopterygii (odb 10) lineage, 92.2% were complete (70.5% as single-copy, 21.7% as duplicates), 1.7% were fragmented and 6.1% were not found.
Orthology assignment and phylogenomic analysis
The total number of genes from all 30 fish proteomes (Supplementary Table 3) analysed by Orthofinder was 731,383 while 21,897 orthogroups were identified. After filtering, we chose 731 one-to-one orthogroups to construct the super-alignment. The initial matrix consisted of 494,732 amino acid positions. The Gblocks-filtered matrix contained 252,477 positions (51% of the initial), which were used for the phylogenomic analysis.
We identified JTT+I+G4+F as the best model which was used for the estimation of the phylogenetic tree (Figure 2). At the resulted phylogeny, almost all branches were supported with 100 bootstraps. The recovered phylogenetic position of L. sceleratus is within Tetraodontidae clade and is placed closer to T. nigroviridis.
Gene family evolution
Gene family evolutiona analysis revealed multiple predicted rapidly expanded and contracted gene families of L. sceleratus (Figure 3; Supplementary Table 4; Supplementary Table 5), respectively. Gene families found only in one or more pufferfish species and not in any other taxon were considered as puffer-specific. Among rapidly evolving families, 4 gene families were identified as L. sceleratusspecific, two as rapidly expanding (Supplementary Table 1) and two gene families as rapidly contracting. Finally, 43 T. bimaculatus and 45 T. rubripes families were found to be rapidly contracted (Figure S1-S2,). The gene family analysis did not reveal a core set of rapidly evolving families at the base of all five puffer species (Figure S2-S3).
Synteny analysis