Gene structure annotation and functional annotation
All identified TEs were masked, and the repeat-masked genome was used for gene structure annotation. Three complementary strategies were used: de novo prediction, homolog-based prediction, and transcript-based prediction. For de novo prediction, Augustus (v3.3.3) (Stanke & Waack 2003) was run with default parameters. For homolog-based prediction, the proteins of 10 species, namely Esox lucius (GCF_011004845.1) (Ishiguro et al. 2003), Lepisosteus oculatus (GCF_000242695.1) (Inoue et al. 2003), Danio rerio (GCF_000002035.6) (Howe et al. 2013), Oncorhynchus tshawytscha (GCF_002872995.1) (Christensen et al. 2018), Oncorhynchus keta (GCF_012931545.1), Salmo salar (GCF_000233375.1) (Davidson et al. 2010), Salmo trutta (GCF_901001165.1), Oncorhynchus nerka (GCF_006149115.1), Oncorhynchus mykiss (GCF_013265735.2), and Oncorhynchus kisutch (GCF_002021735.2), were downloaded from NCBI and aligned to the repeat-masked genome using tblastn (Altschul et al. 1990) with an e-value cutoff of 10e-5. GeneWise (Birney et al. 2004) was then used to predict gene structures from these alignments, keeping the longest coding region and/or the highest-scoring prediction at each gene locus. For transcript-based prediction, RNA-seq reads were assembled into transcripts with Bridger (Chang et al. 2015), and the transcripts were mapped to the genome with BLAT (v34) (Kent 2002), retaining alignments with more than 90% identity and coverage; PASA (Haas et al. 2003) was then used to assemble the spliced alignments into gene structures. Finally, EvidenceModeler (v1.1.1) (Haas et al. 2008) was used to integrate the predictions from all three strategies into the final gene set.
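The BLAT filtering criterion (more than 90% identity and coverage) can be illustrated with a short Python sketch. It assumes BLAT's default PSL output (21 tab-separated columns) and defines identity as (matches + repMatches) / (matches + repMatches + misMatches) and coverage as aligned transcript bases / qSize. These formulas and the function names are illustrative assumptions, not necessarily the exact criteria applied in this study.

```python
"""Hypothetical sketch: keep BLAT alignments with >90% identity and coverage."""
from typing import Iterable, Iterator, List, Tuple

# 0-based PSL column indices (BLAT default output has 21 tab-separated columns).
MATCHES, MISMATCHES, REP_MATCHES = 0, 1, 2
Q_NAME, Q_SIZE, T_NAME = 9, 10, 13


def psl_identity(fields: List[str]) -> float:
    """Identity = matching bases / all aligned bases (an assumed definition)."""
    matched = int(fields[MATCHES]) + int(fields[REP_MATCHES])
    aligned = matched + int(fields[MISMATCHES])
    return matched / aligned if aligned else 0.0


def psl_coverage(fields: List[str]) -> float:
    """Coverage = aligned transcript bases / transcript length (assumed)."""
    aligned = (int(fields[MATCHES]) + int(fields[REP_MATCHES])
               + int(fields[MISMATCHES]))
    q_size = int(fields[Q_SIZE])
    return aligned / q_size if q_size else 0.0


def filter_psl(lines: Iterable[str], min_identity: float = 0.90,
               min_coverage: float = 0.90) -> Iterator[Tuple[str, str, float, float]]:
    """Yield (transcript, target, identity, coverage) for passing alignments."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the psLayout header, column titles, dash separator and blank lines.
        if len(fields) < 21 or not fields[MATCHES].isdigit():
            continue
        identity, coverage = psl_identity(fields), psl_coverage(fields)
        # "More than 90%" in the text is treated as a strict inequality.
        if identity > min_identity and coverage > min_coverage:
            yield fields[Q_NAME], fields[T_NAME], identity, coverage
```

For example, `list(filter_psl(open("transcripts_vs_genome.psl")))` (a hypothetical file name) would return only the transcript alignments that pass both thresholds; the retained alignments are the kind of input PASA then assembles.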
All predicted genes were functionally annotated against public databases. InterProScan (v4.8) (Zdobnov & Apweiler 2001) was used to screen the predicted proteins against five databases (Pfam, release 24.0; ProDom, release 2006.1; SMART, release 6.0; PROSITE, release 20.52; PRINTS, release 40.0). In addition, the proteins were searched with BLAST (v2.3.0) (Altschul et al. 1990) against the Kyoto Encyclopedia of Genes and Genomes (KEGG), Swiss-Prot (release 2011.6), NCBI non-redundant protein (NR), and TrEMBL (release 2011.6) databases with an e-value cutoff of 10e-5.
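As a hypothetical illustration of the BLAST-based annotation step, the sketch below reads BLAST tabular output (`-outfmt 6`, default 12 columns), discards hits above the e-value cutoff, and keeps the highest-bitscore hit per protein. The best-hit rule and the names `best_hits` and `max_evalue` are assumptions, since the text does not state how a single annotation was chosen per gene; the 10e-5 cutoff is read here as 10^-5 (1e-5).

```python
"""Hypothetical sketch: best BLAST hit per protein from tabular (-outfmt 6) output."""
from typing import Dict, Iterable, Tuple

# Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
QSEQID, SSEQID, EVALUE, BITSCORE = 0, 1, 10, 11


def best_hits(lines: Iterable[str],
              max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float, float]]:
    """Map each query to (subject, evalue, bitscore) of its top-scoring hit."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        evalue = float(fields[EVALUE])
        if evalue > max_evalue:
            continue  # hit does not pass the assumed e-value cutoff
        bitscore = float(fields[BITSCORE])
        query = fields[QSEQID]
        current = best.get(query)
        # Assumed rule: keep the hit with the highest bitscore per query.
        if current is None or bitscore > current[2]:
            best[query] = (fields[SSEQID], evalue, bitscore)
    return best
```

Keeping only one hit per protein is a common simplification; a pipeline could equally retain all passing hits and merge their annotations.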