3.3 Genome annotation
Gene models were predicted using homology-based methods together with
transcriptome data, yielding a total of 22,286 protein-coding
genes (Table S7). Of these, 22,218 genes were functionally annotated,
accounting for 99.69% of the total predicted genes (Table S8), and the
number of annotated genes per chromosome ranged from 460 to 1,246
(Fig. 2). Functional annotation against
public databases, including KOG, KEGG, NR, SWISS-PROT, and GO, indicated
that at least 61.27% (13,654) of the genes had homologues in any one
database (Table S8), and a total of 9,190 genes could be annotated in all
databases (Fig. 3A). Compared with seven species with available annotated
genomes, no abnormal length distributions of genes, exons, or introns
were observed (Fig. 3B).
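The proportions quoted above follow directly from the reported gene counts. The short sketch below recomputes them as a consistency check; the function and variable names are illustrative and are not taken from the annotation pipeline.

```python
def percent(part: int, total: int) -> float:
    """Return part/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# Counts as reported in the text (Tables S7 and S8).
TOTAL_GENES = 22_286      # predicted protein-coding genes
ANNOTATED_GENES = 22_218  # genes with a functional annotation
MIN_SINGLE_DB = 13_654    # smallest number of genes annotated by one database

annotated_pct = percent(ANNOTATED_GENES, TOTAL_GENES)
single_db_pct = percent(MIN_SINGLE_DB, TOTAL_GENES)

print(f"functionally annotated: {annotated_pct:.2f}%")
print(f"minimum per database:   {single_db_pct:.2f}%")
```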
A total of 21,572 genes were expressed in the tissue transcriptomes
(FPKM > 0), accounting for 96.79% of the
total predicted protein-coding genes. Using gene expression in
muscle as the reference and an FDR threshold of ≤0.005, we identified
differentially expressed genes in multiple tissues (Fig. 4A). We
focused on the intersection sizes between tissues: the largest set,
965 genes, was shared by muscle and spleen (2,509/2,232 genes
up-/down-regulated, Fig. 4B), and the smallest, 227 genes, was shared by
muscle and retina (1,632/1,172 genes up-/down-regulated, Fig. 4B).
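The two operations described above, selecting expressed genes by FPKM > 0 and measuring the overlap between differentially expressed gene sets, can be sketched as follows. This is a minimal illustration on invented toy data: the gene identifiers, tissue names, and data layout are assumptions, and the differential expression test itself (the FDR filtering) is not reproduced here.

```python
from itertools import combinations


def expressed_genes(fpkm):
    """Return genes with FPKM > 0 in at least one tissue.

    `fpkm` maps gene ID -> {tissue: FPKM value} (hypothetical layout).
    """
    return {
        gene
        for gene, by_tissue in fpkm.items()
        if any(value > 0 for value in by_tissue.values())
    }


def pairwise_shared(deg_sets):
    """Return the intersection size for every pair of DEG sets."""
    return {
        (a, b): len(deg_sets[a] & deg_sets[b])
        for a, b in combinations(sorted(deg_sets), 2)
    }


# Toy inputs, invented for illustration only.
toy_fpkm = {
    "g1": {"muscle": 3.2, "spleen": 0.0},
    "g2": {"muscle": 0.0, "spleen": 1.1},
    "g3": {"muscle": 0.0, "spleen": 0.0},  # never expressed
}
toy_degs = {
    "spleen": {"g1", "g2", "g4"},
    "retina": {"g2", "g5"},
    "liver": {"g1", "g2"},
}

expressed = expressed_genes(toy_fpkm)
shared = pairwise_shared(toy_degs)
largest_pair = max(shared, key=shared.get)
print(sorted(expressed), shared, largest_pair)
```

With real data, each set in `toy_degs` would hold the genes passing the FDR threshold for one tissue, and the largest and smallest values in `shared` would correspond to the extremes read from the intersection plot.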
Transposons (RNA and DNA types) and simple sequence repeats (SSRs) were
identified in the C. undulatus genome. We found 540.85 Mb of
repeat sequences, which accounted for 46.07% of the genome, and
transposons accounted for 39.88% of the genome (Table 2). A total of
711 ncRNAs, 111 rRNAs, and 2,618 tRNAs were annotated in the C.
undulatus genome (Table S9). The divergence rates of the transposons
were mostly lower than 30% (Fig. 5A), suggesting a recent burst of
transposon activity in the genome. In contrast, ray-finned fishes display
the highest transposon diversity; the zebrafish, for example, has 27
transposon superfamilies (Sotero-Caio et al. 2017). Transposon activity and
diversity are associated with the evolutionary history of species.
Zebrafish originated about 230 Mya (Tine et al. 2014), whereas
C. undulatus diverged from a common ancestor with Cheilines
around 50 Mya (Cowman et al. 2009). In comparison with ten
ray-finned fish genomes with annotated transposons, such as zebrafish
(Howe et al. 2013), spotted sea bass (Shao et al. 2018),
Takifugu rubripes (Aparicio et al. 2002), corkwing
wrasse (Mattingsdal et al. 2018), Nile tilapia (Brawand et
al. 2014), the orange clownfish (Lehmann et al. 2018), S.
anshuiensis (Yang et al. 2016), flatfish (Chen et al. 2014), and
mudskipper (You et al. 2014), we found that transposon content
contributed to genome size, with larger genomes exhibiting richer
transposon content (Fig. 5B). Transposons are abundant in
the genome of C. undulatus, suggesting important roles of
transposons in genome evolution.
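
The repeat figures reported above (540.85 Mb, 46.07% of the genome, with transposons at 39.88%) imply an assembly size and an absolute transposon length that are not stated in this section. The sketch below derives both as a rough back-of-the-envelope check; the derived values are approximations from rounded percentages, not reported results, and all names are illustrative.

```python
# Values as reported in the text (Table 2).
REPEAT_MB = 540.85           # total repeat sequence length, in Mb
REPEAT_FRACTION = 0.4607     # repeats as a fraction of the genome
TRANSPOSON_FRACTION = 0.3988 # transposons as a fraction of the genome


def implied_genome_size_mb(repeat_mb: float, repeat_fraction: float) -> float:
    """Back-calculate genome size from repeat length and repeat fraction."""
    if not 0 < repeat_fraction <= 1:
        raise ValueError("repeat_fraction must be in (0, 1]")
    return repeat_mb / repeat_fraction


genome_mb = implied_genome_size_mb(REPEAT_MB, REPEAT_FRACTION)
transposon_mb = TRANSPOSON_FRACTION * genome_mb
# Share of all repeat sequence that is transposon-derived.
transposon_share_of_repeats = TRANSPOSON_FRACTION / REPEAT_FRACTION

print(f"implied genome size:  {genome_mb:.1f} Mb")
print(f"implied transposons:  {transposon_mb:.1f} Mb")
print(f"transposon share of repeats: {transposon_share_of_repeats:.1%}")
```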