3.3 Genome annotation
Gene models were predicted using homology-based methods together with transcriptome data, yielding a total of 22,286 protein-coding genes (Table S7). Of these, 22,218 genes (99.69%) were functionally annotated (Table S8), with the number of genes per chromosome ranging from 460 to 1,246 (Fig. 2). Functional annotation against public databases, including KOG, KEGG, NR, SWISS-PROT, and GO, indicated that at least 61.27% (13,654) of the genes had homologues in one database (Table S8), and a total of 9,190 genes could be annotated in all databases (Fig. 3A). Compared with seven species with annotated genomes, the length distributions of genes, exons, and introns showed no abnormalities (Fig. 3B).
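As a quick consistency check on the proportions reported above, the following minimal sketch recomputes them from the gene counts. The helper name `percent` is hypothetical and introduced here only for illustration; the counts are those given in the text.

```python
def percent(part, total):
    """Return part/total as a percentage rounded to two decimals."""
    return round(100 * part / total, 2)

total_genes = 22286   # predicted protein-coding genes (Table S7)
annotated = 22218     # functionally annotated genes (Table S8)
one_database = 13654  # genes with homologues in one database (Table S8)

pct_annotated = percent(annotated, total_genes)
pct_one_database = percent(one_database, total_genes)

print(f"functionally annotated: {pct_annotated}%")
print(f"homologues in one database: {pct_one_database}%")
```

Both values agree with the percentages stated in the text when rounded to two decimal places.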
A total of 21,572 genes were expressed in the tissue transcriptomes (FPKM > 0), accounting for 96.79% of the predicted protein-coding genes. Using gene expression in muscle as the reference and an FDR threshold of ≤0.005, we identified differentially expressed genes in multiple tissues (Fig. 4A). We focused on the intersection sizes between tissues: the largest intersection, 965 genes, was shared by muscle and spleen (2,509/2,232 genes up-/down-regulated, Fig. 4B), and the smallest, 227 genes, was shared by muscle and retina (1,632/1,172 genes up-/down-regulated, Fig. 4B).
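The two operations described above, filtering genes by FPKM > 0 and counting the genes shared between sets of differentially expressed genes, can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the gene names, FPKM values, the tissue label `tissue_x`, and the helper functions are all hypothetical, and the differentially expressed sets are assumed to have already passed the FDR filter.

```python
from itertools import combinations

# Hypothetical toy FPKM table (gene -> tissue -> FPKM); not data from the study.
fpkm = {
    "geneA": {"muscle": 12.0, "spleen": 3.5, "retina": 0.0},
    "geneB": {"muscle": 0.0, "spleen": 0.0, "retina": 0.0},
    "geneC": {"muscle": 1.2, "spleen": 0.0, "retina": 8.1},
    "geneD": {"muscle": 0.0, "spleen": 4.4, "retina": 0.3},
}

def expressed_genes(table, threshold=0.0):
    """Genes with FPKM above the threshold in at least one tissue."""
    return {
        gene
        for gene, by_tissue in table.items()
        if any(value > threshold for value in by_tissue.values())
    }

def pairwise_intersections(gene_sets):
    """Number of genes shared by each pair of named gene sets."""
    return {
        (a, b): len(gene_sets[a] & gene_sets[b])
        for a, b in combinations(sorted(gene_sets), 2)
    }

expressed = expressed_genes(fpkm)
expressed_pct = 100 * len(expressed) / len(fpkm)

# Hypothetical differentially expressed gene sets, assumed already FDR-filtered.
de_sets = {
    "spleen": {"geneA", "geneD"},
    "retina": {"geneC", "geneD"},
    "tissue_x": {"geneA", "geneC", "geneD"},
}
shared = pairwise_intersections(de_sets)

print(sorted(expressed), expressed_pct)
for pair, size in shared.items():
    print(pair, size)
```

Sorting the set names before pairing keeps the keys of the result in a stable order, which makes the intersection sizes easy to compare across runs.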
Transposons (RNA and DNA types) and simple sequence repeats (SSRs) were identified in the C. undulatus genome. We found 540.85 Mb of repeat sequences, accounting for 46.07% of the genome, with transposons accounting for 39.88% of the genome (Table 2). A total of 711 ncRNAs, 111 rRNAs, and 2,618 tRNAs were annotated in the C. undulatus genome (Table S9). The divergence rates of the transposons were mostly lower than 30% (Fig. 5A), suggesting recent transposon activity and a burst of transposition in the genome. In contrast, ray-finned fishes display the highest diversity; zebrafish, for example, has 27 transposon superfamilies (Sotero-Caio et al. 2017). Transposon activity and diversity are associated with the evolutionary history of species. Zebrafish originated about 230 Mya (Tine et al. 2014), whereas C. undulatus diverged from a common ancestor with Cheilines around 50 Mya (Cowman et al. 2009). In a comparison with ten ray-finned fish genomes with annotated transposons, such as zebrafish (Howe et al. 2013), spotted sea bass (Shao et al. 2018), Takifugu rubripes (Aparicio et al. 2002), corkwing wrasse (Mattingsdal et al. 2018), Nile tilapia (Brawand et al. 2014), the orange clownfish (Lehmann et al. 2018), S. anshuiensis (Yang et al. 2016), flatfish (Chen et al. 2014), and mudskipper (You et al. 2014), we found that transposon content contributed to genome size, with larger genomes exhibiting higher transposon content (Fig. 5B). Transposons are abundant in the C. undulatus genome, suggesting that they play important roles in genome evolution.
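The repeat figures above imply an approximate assembly size and transposon length, which the following sketch derives by simple arithmetic. It assumes that both percentages are expressed relative to the same total assembly length; the derived values are back-of-the-envelope estimates, not figures reported in the study.

```python
repeat_mb = 540.85           # total repeat sequence length in Mb (from the text)
repeat_fraction = 0.4607     # repeats as a fraction of the genome (from the text)
transposon_fraction = 0.3988 # transposons as a fraction of the genome (from the text)

# Assumption: both fractions share the same denominator (total assembly length).
genome_mb = repeat_mb / repeat_fraction
transposon_mb = genome_mb * transposon_fraction

print(f"implied assembly size: {genome_mb:.2f} Mb")
print(f"implied transposon length: {transposon_mb:.2f} Mb")
```

Under this assumption, the implied assembly size is roughly 1,174 Mb, with transposons covering roughly 468 Mb.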