Genome annotation
Repeat Elements Annotation
To identify repeat element sequences, we constructed a de novo repeat
library using RepeatModeler2 (Flynn et al. 2020), which incorporates
RECON v1.08 (Bao and Eddy 2002), RepeatScout v1.0.6 (Price et al. 2005),
LTRharvest (Ellinghaus et al. 2008) as included in GenomeTools v1.5.9,
and LTR_retriever v2.7 (Ou and Jiang 2018). RepeatModeler2 was run with
default parameters together with the additional LTRStruct pipeline,
which includes MAFFT v7.453 (Katoh and Standley 2013), CD-HIT v4.8.1
(Li and Godzik 2006) and Ninja v0.95 (Wheeler 2009). The sequences
obtained by RepeatModeler2 were then combined with Repbase v17.01 and a
custom database built from the Takifugu rubripes, Takifugu flavidus and
Tetraodon nigroviridis entries of FishTEDB (Shao et al. 2018).
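Combining the libraries amounts to concatenating FASTA records. The sketch below is a minimal illustration rather than the authors' procedure: it assumes every library is available as FASTA, keeps headers unchanged, and drops exact duplicate sequences (case-insensitive, same strand only), a deduplication rule the text does not specify. Libraries are processed in the order given, so earlier ones take precedence.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_libraries(libraries: List[Iterable[str]]) -> List[Tuple[str, str]]:
    """Combine repeat libraries, keeping only the first copy of identical sequences.

    `libraries` is a list of FASTA line iterables in priority order
    (e.g. RepeatModeler2 output first, then Repbase, then FishTEDB entries).
    """
    seen = set()
    merged = []
    for lines in libraries:
        for header, seq in read_fasta(lines):
            key = seq.upper()  # case-insensitive comparison; no reverse-complement check
            if key in seen:
                continue  # identical sequence already contributed by an earlier library
            seen.add(key)
            merged.append((header, seq))
    return merged


def write_fasta(records: List[Tuple[str, str]]) -> str:
    """Render records as FASTA text with one sequence line per record."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)
```

The merged text returned by `write_fasta` is what would be passed to RepeatMasker as a custom library; a real pipeline might also want to collapse near-identical consensus sequences, for example with CD-HIT, which this sketch does not attempt.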
Finally, RepeatMasker v4.1.0 (Tarailo-Graovac and Chen 2009) was used to
annotate repeat elements based on the above-described database.
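RepeatMasker writes its annotation to a whitespace-delimited .out table. To illustrate what this step produces, the following sketch (not part of the described pipeline) sums the annotated base pairs per top-level repeat class. It assumes the standard .out layout, in which data rows start with an integer Smith-Waterman score, columns 6 and 7 hold the query begin and end, and column 11 holds the class/family; overlapping hits are not merged, so they would be counted twice.

```python
from collections import defaultdict
from typing import Dict, Iterable


def summarize_repeatmasker_out(lines: Iterable[str]) -> Dict[str, int]:
    """Sum annotated base pairs per top-level repeat class from RepeatMasker .out lines."""
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        fields = line.split()
        # Header rows ("SW perc ...", "score div. ...") and blank lines are skipped
        # because their first field is not an integer score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])  # 1-based, inclusive query coordinates
        repeat_class = fields[10].split("/")[0]      # e.g. "LINE/L2" -> "LINE"
        totals[repeat_class] += end - begin + 1
    return dict(totals)
```

Dividing each total by the assembly length gives a rough per-class genome fraction; RepeatMasker's own summary table is the authoritative figure.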
Gene prediction & Functional annotation
After repeat masking, gene prediction was conducted with the MAKER2
pipeline v2.31.10 (Holt and Yandell 2011) in two iterative rounds,
combining ab initio, homology-based and transcriptome-based evidence. In
the first round, homology-based annotation was performed by running
MAKER2 in protein2genome mode with protein sequences of three closely
related species, Mola mola, Tetraodon nigroviridis and Takifugu rubripes,
extracted from SWISS-PROT (www.uniprot.org). For transcriptome-based
annotation, est2genome mode was enabled so that gene models could be
built from the RNA-Seq evidence: transcriptomic reads from all sequenced
tissues were mapped to the genome and assembled with a genome-guided
approach using HISAT2 v2.2.0 (Kim et al. 2015) and StringTie v2.1.1
(Pertea et al. 2015). Ab initio prediction was performed with SNAP
(Korf 2004) (http://korflab.ucdavis.edu), trained independently on the
L. sceleratus genome with default parameters, and with AUGUSTUS v3.3.3
(Stanke et al. 2006), previously trained through BUSCO v3.1.0
(Simão et al. 2015) with the extra parameter “--long”. The second round
of MAKER2 used the trained models with the same settings as round one,
except that est2genome and protein2genome modes were disabled. The custom
repeat library and the MAKER2 repeat library used for genome masking
were retained in both rounds. The completeness of the putative genes was
assessed with BUSCO v4.0.5 (Simão et al. 2015) against the
Actinopterygii odb10 dataset.
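The two-round MAKER2 set-up can be summarised as the difference between two sets of maker_opts.ctl options. The sketch below is illustrative only: genome, est, protein, rmlib, est2genome, protein2genome, snaphmm and augustus_species are standard MAKER2 control-file keys, but all file names and the species label are hypothetical, and we assume, as in the usual MAKER2 workflow, that the trained SNAP and AUGUSTUS models are supplied in round two; the text does not say whether they were also used in round one.

```python
from typing import Dict


def round_one_options(genome: str, transcripts: str, proteins: str,
                      repeat_library: str) -> Dict[str, str]:
    """maker_opts.ctl settings for round one (models built directly from evidence)."""
    return {
        "genome": genome,
        "est": transcripts,        # assembled RNA-Seq transcripts (FASTA), hypothetical path
        "protein": proteins,       # Swiss-Prot proteins of the related species
        "rmlib": repeat_library,   # custom repeat library, kept in both rounds
        "est2genome": "1",         # infer gene models directly from transcript alignments
        "protein2genome": "1",     # infer gene models directly from protein alignments
    }


def round_two_options(round_one: Dict[str, str], snap_hmm: str,
                      augustus_species: str) -> Dict[str, str]:
    """Round two: round-one settings with evidence-only modes off and trained models on."""
    opts = dict(round_one)  # copy, so the round-one settings stay unchanged
    opts["est2genome"] = "0"
    opts["protein2genome"] = "0"
    opts["snaphmm"] = snap_hmm                   # SNAP HMM trained on the target genome
    opts["augustus_species"] = augustus_species  # AUGUSTUS species model trained via BUSCO
    return opts


def render_ctl(opts: Dict[str, str]) -> str:
    """Format settings as key=value lines, the syntax used in maker_opts.ctl."""
    return "".join(f"{key}={value}\n" for key, value in opts.items())
```

In practice these values would be edited into the control files generated by `maker -CTL` rather than written from scratch, since MAKER2 reads many other options from the same file.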
The functional annotation of the predicted L. sceleratus genes was
performed by similarity search against the UniProtKB/Swiss-Prot database
(release 2020_03) with BLASTP v2.9.0+ (Altschul et al. 1990) (e-value
1e-6, -max_target_seqs 10). InterProScan v5 (Jones et al. 2014) was used
to search for motifs and domains against all default databases, plus
SignalP_EUK and TMHMM. Functional annotation was also obtained with
eggNOG-mapper (Huerta-Cepas et al. 2017), which performs fast orthology
assignment using precomputed eggNOG v5.0 (Huerta-Cepas et al. 2019)
clusters and phylogenies.
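With -max_target_seqs 10, BLASTP can report up to ten Swiss-Prot subjects per gene, so a single best hit has to be chosen before any downstream mapping. The text does not state the output format or the selection rule; as a hedged sketch, assuming tabular output (-outfmt 6 with its default 12 columns) and defining the best hit as the one with the highest bit score (lower e-value breaking ties), the selection could look like this.

```python
from typing import Dict, Iterable, Tuple


def best_hits(lines: Iterable[str],
              max_evalue: float = 1e-6) -> Dict[str, Tuple[str, float, float]]:
    """Pick the best subject per query from BLAST+ tabular (-outfmt 6) output.

    Returns {query: (subject, evalue, bitscore)}. Column 11 is the e-value and
    column 12 the bit score in the default -outfmt 6 layout.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue  # re-apply the e-value cut-off used in the search
        current = best.get(query)
        # Higher bit score wins; on a tie, the smaller e-value wins.
        if current is None or (bitscore, -evalue) > (current[2], -current[1]):
            best[query] = (subject, evalue, bitscore)
    return best
```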
Gene Ontology mapping
Gene Ontology (GO) mapping was carried out with a custom Python script
(gene_ontology_mapping.py). GO terms were retrieved through the UniProt
API service (https://www.uniprot.org/help/programmatic_access), using as
queries the best BLAST hits against UniProtKB/Swiss-Prot obtained in the
functional annotation step.
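The script gene_ontology_mapping.py is not described further, so the following is only a sketch of the same idea using the Python standard library, not a reconstruction of that script. It assumes subject IDs of the form sp|ACCESSION|ENTRY_NAME and targets the current UniProt REST endpoint (rest.uniprot.org); the go_id field and the TSV column headers "Entry" and "Gene Ontology IDs" reflect our reading of the UniProt documentation and may differ from the API available when the original script was written. The network call is isolated in one function so the parsing can be checked offline.

```python
import csv
import io
import urllib.parse
import urllib.request
from typing import Dict, List

# Current UniProt REST base URL (assumption: not necessarily the endpoint used originally).
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def accession_from_subject(subject: str) -> str:
    """Extract the accession from a Swiss-Prot ID such as 'sp|P12345|NAME_HUMAN'."""
    parts = subject.split("|")
    return parts[1] if len(parts) >= 3 else subject


def go_query_url(accession: str) -> str:
    """Build a UniProt REST URL requesting the GO IDs of one entry as TSV."""
    params = urllib.parse.urlencode({"fields": "accession,go_id", "format": "tsv"}, safe=",")
    return f"{UNIPROT_REST}/{accession}?{params}"


def parse_go_tsv(text: str) -> Dict[str, List[str]]:
    """Parse TSV with 'Entry' and 'Gene Ontology IDs' columns into {accession: [GO IDs]}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    result: Dict[str, List[str]] = {}
    for row in reader:
        ids = row.get("Gene Ontology IDs") or ""
        result[row["Entry"]] = [go.strip() for go in ids.split(";") if go.strip()]
    return result


def fetch_go_terms(subject: str) -> Dict[str, List[str]]:
    """Fetch GO IDs for one best-hit subject (requires network access)."""
    url = go_query_url(accession_from_subject(subject))
    with urllib.request.urlopen(url) as response:
        return parse_go_tsv(response.read().decode("utf-8"))
```

For a whole annotation, querying one accession per request is slow; batching accessions into a single search query would be the natural refinement.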