DISCUSSION
Genome size and assembly completeness
In this study, we reconstructed a high-quality, highly contiguous pufferfish genome assembly from data obtained from a single MinION flow cell and half a lane of Illumina HiSeq. To our knowledge, only one other highly contiguous reference genome assembly for Tetraodontiformes has previously been constructed using the same strategy, that of Thamnaconus septentrionalis (Bian et al. 2019). The assembly of L. sceleratus (360 Mb) is comparable in size to those of other puffers, such as Takifugu rubripes (~365 Mb; Aparicio et al. 2002), T. flavidus (~377 Mb; Zhou et al. 2019a), T. bimaculatus (393.15 Mb; Zhou et al. 2019b), T. obscurus (~373 Mb; Kang et al. 2020) and T. nigroviridis (340 Mb; Jaillon et al. 2004). The contig N50 (~11 Mb) of the L. sceleratus assembly is considerably greater than that reported for the genomes of T. bimaculatus (1.31 Mb; Zhou et al. 2019b) and T. flavidus (4.4 Mb; Zhou et al. 2019a). Similarly, based on BUSCO scores, our assembly appears to be as complete as other Tetraodontidae genomes (e.g., T. obscurus [Kang et al. 2020] and T. flavidus [Zhou et al. 2019a]).
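For readers less familiar with the contiguity metric used above, the following is a minimal sketch of how a contig N50 is conventionally computed: the length of the contig at which the cumulative length of the contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions, not values or code from this study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Conventional definition: sort contigs from longest to shortest and
    report the length of the contig at which the running total first
    reaches at least half of the summed assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in Mb), for illustration only.
toy_contigs = [10, 8, 6, 4, 2]
print("toy N50:", n50(toy_contigs))
```

Because N50 depends only on the length distribution, it says nothing about correctness; this is why contiguity is reported alongside completeness measures such as BUSCO scores.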
Repeat content, gene prediction and functional annotation
The percentage of transposable elements (TEs) in the L. sceleratus genome (16.55% of the assembled genome) is higher than that found in T. septentrionalis (14.2%) (Bian et al., 2019), T. obscurus (11.05%) (Kang et al., 2020), and M. mola (11%) (Pan et al. 2016). Moreover, it is more than twofold that of T. rubripes (7.53%) and T. flavidus (6.87%) and nearly threefold that of T. nigroviridis (5.60%) (Gao et al., 2014). T. rubripes contains more copies of transposable elements than T. nigroviridis, which has been proposed to contribute to its marginally larger genome size (365-370 Mb) (Jaillon et al., 2004). Although the L. sceleratus genome is comparable in size to the reported Takifugu genomes, it harbors a much higher repeat content. Moreover, the D. holocanthus genome, from the family Diodontidae, contains 36.35% repetitive sequences, almost double the repeat content of L. sceleratus. These findings imply that TEs might follow independent pathways of accumulation and diversification across Tetraodontiformes species. In the case of L. sceleratus, such differential repeat expansion may have taken place after the divergence of the Takifugu and Tetraodon genera.
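The fold differences discussed above are simple ratios of the reported TE percentages. The short sketch below recomputes them from the values quoted in this section; the variable names are illustrative, and the percentages are taken as reported in the cited studies without any recalculation from the underlying annotations.

```python
# TE content (% of assembled genome) as quoted in the text above.
L_SCELERATUS_TE = 16.55

reported_te_percent = {
    "T. septentrionalis": 14.2,
    "T. obscurus": 11.05,
    "M. mola": 11.0,
    "T. rubripes": 7.53,
    "T. flavidus": 6.87,
    "T. nigroviridis": 5.60,
}

# Ratio of the L. sceleratus TE content to that of each other species.
fold_vs_l_sceleratus = {
    species: L_SCELERATUS_TE / percent
    for species, percent in reported_te_percent.items()
}

for species, fold in fold_vs_l_sceleratus.items():
    print(f"{species}: {fold:.2f}x")
```

Ratios of percentages are only a rough guide, since the cited studies may have used different repeat libraries and annotation pipelines.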
Despite such variation in TE content across closely related taxa, a positive correlation between genome size and TE content has been documented across a larger evolutionary scale in teleosts (Shao et al., 2019). For example, the relatively small genome of T. nigroviridis (~360 Mb) contains 5.6% TEs, in contrast to the zebrafish genome (~1.4 Gb), which is composed of 55% repetitive sequences (Shao et al., 2019). This positive correlation is also reflected in the small size and relatively low repeat content of the L. sceleratus genome, regardless of the differences from other pufferfish. Nevertheless, these differences merit further exploration, as they may be informative about genome evolution. For example, LINE elements are the most abundant in the L. sceleratus genome, with ~170,000 copies, compared with ~12,300 copies in the T. rubripes genome. This finding indicates dynamic genome evolution in the two species. Previous studies have shown a correlation between genomic TEs and species adaptation to new environments, suggesting that TEs may be associated with invasiveness (Yuan et al. 2018, Stapley et al., 2015). Thus, the repeat content of L. sceleratus may play a role in its fast adaptation to novel environments and should be investigated further.
Species tree reconstruction
Although the order Tetraodontiformes is a cosmopolitan taxonomic group that includes multiple families, large parts of its phylogenetic relationships remain unexplored. In this study, we present the first phylogenetic tree based on whole-genome data that includes the invasive “sprinter” L. sceleratus. L. sceleratus was recovered within Tetraodontidae, closest to T. nigroviridis, while the long branch length of the Tetraodontidae clade possibly suggests a faster evolutionary rate. Regarding relationships within the pufferfish group (T. nigroviridis, T. rubripes, T. flavidus, T. bimaculatus and L. sceleratus), the resulting topology agrees with previous studies (Hughes et al., 2020, Hughes et al., 2018, Meynard et al., 2012, Yamanoue et al., 2009). Moreover, the Tetraodontidae group was confidently recovered as monophyletic, in accordance with Yamanoue et al. (2011). Our results suggest that Tetraodontiformes are the closest group to Sparidae and corroborate the results of Natsidis et al. (2019) and of others (Kawahara et al., 2008; Meynard et al., 2012), based on six mitochondrial and two nuclear genes.
Synteny analysis
All pairwise comparisons in the whole-genome alignment analysis of L. sceleratus against the four other Tetraodontidae species (Figure 5; Figures S9-S11) showed highly conserved synteny. The genome that exhibited the highest synteny conservation with the L. sceleratus genome was that of T. nigroviridis, in accordance with our reconstructed phylogeny, which places the two species as more closely related to each other than to the rest.
The synteny between L. sceleratus and the three species of the genus Takifugu (T. rubripes, T. bimaculatus and T. flavidus) was less conserved, especially between L. sceleratus and T. bimaculatus.
To sum up, the higher synteny between L. sceleratus and T. nigroviridis corroborates their closer phylogenetic relationship compared to the three Takifugu species.
Gene family evolution and adaptation
Adapting to a new habitat is a challenging task for a species, requiring a certain degree of physiological plasticity. To become established in a new niche, an invader must face environmental challenges that involve both biotic and abiotic factors (Crowl et al., 2008). Invasive species face novel pathogens during the colonisation of new environments, and the ability to deal with these new immune challenges is key to their invasive success (Lee and Klasing, 2004). Interestingly, we found several expanded immune-related families, including immunoglobulins (C-Type and V-Type), Ig heavy chain Mem5-like, B-cell receptors and the fish-specific NACHT-associated domain, which are related to innate immunity (Stein et al., 2007).
In addition, we detected major histocompatibility complex (MHC) class I genes among the expanded gene families. MHC genes are crucial for the immune response: they are involved in pathogen recognition by T cells (Germain, 1994), thus initiating the adaptive immune response. The expanded repertoire of immune response-associated genes in L. sceleratus might be related to its survival in novel habitats, through the detection and inhibition of a wide range of pathogens. In this context, we suggest further research to explore the role of the expanded genes related to the immune response.
Another interesting finding was the expansion of the fucosyltransferase (FUT) gene family. In particular, we detected 24 FUT9 (alpha (1,3) fucosyltransferase 9) genes. Glycosylation is one of the most frequent post-translational modifications of proteins. Many proteins involved in the immune response are glycosylated, which extends their diversity and functionality (Bednarska et al., 2017). Fucosylation, a type of glycosylation, plays an essential role in cell proliferation, metastasis and immune escape (Jia et al., 2018). In mice, FucTC has been shown to regulate leukocyte trafficking between blood and the lymphatic system through its involvement in selectin ligand biosynthesis (Maly et al., 1996).
Overall, based on our results, we hypothesize that the rapidly expanded innate immune system gene families identified here play a role in the ability of L. sceleratus to spread rapidly throughout the Eastern Mediterranean (Kalogirou 2011).