Gene structure annotation and functional annotation
All TEs in the genome were masked, and the repeat-masked genome was used
for gene structure annotation. Three complementary strategies were
combined: de novo prediction, homology-based prediction, and
transcript-based prediction. For de novo prediction, Augustus
(v3.3.3) (Stanke & Waack 2003) was run
with default parameters. For homology-based annotation, the proteins of 10
species, namely Esox lucius (GCF_011004845.1)
(Ishiguro et al. 2003), Lepisosteus oculatus (GCF_000242695.1)
(Inoue et al. 2003), Danio rerio (GCF_000002035.6)
(Howe et al. 2013), Oncorhynchus tshawytscha (GCF_002872995.1)
(Christensen et al. 2018), Oncorhynchus keta (GCF_012931545.1),
Salmo salar (GCF_000233375.1) (Davidson et al. 2010),
Salmo trutta (GCF_901001165.1), Oncorhynchus nerka (GCF_006149115.1),
Oncorhynchus mykiss (GCF_013265735.2), and Oncorhynchus kisutch
(GCF_002021735.2), were downloaded from the NCBI database and aligned
to the repeat-masked genome with tblastn (Altschul et al. 1990) using
an e-value cutoff of 1e-5. GeneWise (Birney et al. 2004) was then used
to refine the aligned regions into gene models, retaining the longest
coding region and/or the highest-scoring model at each gene locus. For
transcript-based annotation, the RNA-seq reads were assembled into
transcripts with Bridger (Chang et al. 2015), and the transcripts were
aligned to the genome with BLAT (v34) (Kent 2002), retaining alignments
with more than 90% identity and coverage. PASA (Haas et al. 2003) was
then used to assemble the spliced alignments into gene structures.
Finally, EvidenceModeler (v1.1.1) (Haas et al. 2008) was used to
integrate the de novo, homology-based, and transcript-based predictions
into the final consensus gene set.
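The BLAT filtering criterion (more than 90% identity and coverage) can be illustrated with a short Python sketch. It assumes that BLAT wrote its alignments in the default PSL format, and it approximates identity as (matches + repMatches) / (matches + repMatches + misMatches) and coverage as aligned bases divided by the transcript length (qSize). These definitions and the function names `parse_psl_line` and `filter_psl` are illustrative assumptions, not the exact computation used in this study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PslHit:
    transcript: str   # qName: the assembled transcript
    target: str       # tName: the genomic scaffold
    identity: float   # percent identity over aligned bases (approximation)
    coverage: float   # percent of the transcript covered by aligned bases


def parse_psl_line(line: str) -> PslHit:
    """Parse one data line of a PSL file (21 tab-separated columns)."""
    f = line.rstrip("\n").split("\t")
    matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
    q_name, q_size, t_name = f[9], int(f[10]), f[13]
    aligned = matches + rep_matches + mismatches
    identity = 100.0 * (matches + rep_matches) / aligned if aligned else 0.0
    coverage = 100.0 * aligned / q_size if q_size else 0.0
    return PslHit(q_name, t_name, identity, coverage)


def filter_psl(lines: Iterable[str], min_identity: float = 90.0,
               min_coverage: float = 90.0) -> List[PslHit]:
    """Keep alignments exceeding both the identity and coverage thresholds."""
    kept = []
    for line in lines:
        # PSL header lines ("psLayout...", column names, dashes) and blank
        # lines do not start with a digit, so they are skipped here.
        if not line[:1].isdigit():
            continue
        hit = parse_psl_line(line)
        if hit.identity > min_identity and hit.coverage > min_coverage:
            kept.append(hit)
    return kept
```

Applied to a PSL file, `filter_psl(open(path))` would return only the transcript alignments passed on to PASA; the strict inequalities mirror the "more than 90%" wording.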
All predicted genes were functionally annotated against public protein
databases. InterProScan (v4.8) (Zdobnov & Apweiler 2001) was used to
screen the predicted proteins against five member databases (Pfam,
release 24.0; ProDom, release 2006.1; SMART, release 6.0; PROSITE,
release 20.52; PRINTS, release 40.0). In addition, the proteins were
aligned with BLAST (v2.3.0) (Altschul et al. 1990), using an e-value
cutoff of 1e-5, against the Kyoto Encyclopedia of Genes and Genomes
(KEGG), Swiss-Prot (release 2011.6), NCBI non-redundant (NR), and
TrEMBL (release 2011.6) databases.
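How an e-value cutoff is typically applied to BLAST results can be shown with a minimal Python sketch. It assumes tabular output (`-outfmt 6` with the default 12 columns) and keeps the highest-bitscore hit per predicted protein; the best-hit rule and the names `parse_outfmt6` and `best_hits` are hypothetical, not a description of the pipeline used here. The sketch takes the cutoff as 1 × 10^-5 (1e-5); note that 10e-5, read literally, equals 1e-4.

```python
from typing import Dict, Iterable, NamedTuple

# Assumption: the cutoff is 1 x 10^-5. The literal 10e-5 would equal 1e-4.
EVALUE_CUTOFF = 1e-5


class BlastHit(NamedTuple):
    query: str      # qseqid: predicted protein
    subject: str    # sseqid: database protein (e.g. a Swiss-Prot entry)
    pident: float   # percent identity
    evalue: float
    bitscore: float


def parse_outfmt6(line: str) -> BlastHit:
    """Parse one line of BLAST -outfmt 6 (default 12 columns)."""
    f = line.rstrip("\n").split("\t")
    return BlastHit(f[0], f[1], float(f[2]), float(f[10]), float(f[11]))


def best_hits(lines: Iterable[str],
              cutoff: float = EVALUE_CUTOFF) -> Dict[str, BlastHit]:
    """Return the highest-bitscore hit per query among hits with e-value <= cutoff."""
    best: Dict[str, BlastHit] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = parse_outfmt6(line)
        if hit.evalue > cutoff:
            continue  # not significant under the chosen cutoff
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best
```

Because the difference between 1e-5 and 1e-4 changes which weak hits survive, stating the cutoff unambiguously matters for reproducing the annotation.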