3.5 Gene annotation
A combination of ab initio prediction, homology search, and
transcript mapping was used to predict the protein-coding genes in the
Chinese walnut genome, and RNA from eighteen tissues was used to support
gene model prediction (Table S1). The 27,901 predicted protein-coding genes had an
average gene length of 5,735 bp, an average coding sequence (CDS) length
of 1,226 bp, and an average of 6 exons per gene (Table 1). When we
compared the genome structural features of Chinese walnut (J. cathayensis)
with those of Arabidopsis (A. thaliana), we found that the distributions of
CDS lengths and exon lengths were similar in the two species, whereas the
distributions of mRNA lengths and intron lengths differed (Table 1; Figure S3). Among the 27,901
predicted genes, 96.1% could be functionally annotated in at least one
of seven databases (Table S8). A total of 2,014 genes were annotated in
the Nr database only, 23 in InterPro only, and 6 in KEGG only, whereas
no gene was annotated exclusively in SwissProt or COG (Figure
S4). In the GC-density analysis, the average length was 900 bp and the
average GC content was 51.21% (Figure 3b). Gene density throughout the genome was
about 11 genes per 100 kb, with 56,553 genes (94.96%) located on
chromosome-anchored contigs (Figure 3c); this was equivalent to 307
transcripts per 1 Mb of chromosome
(Figure 3d). There were 82 syntenic
blocks in the Chinese walnut genome (Figure 3e). Non-coding RNA made up
only a small portion of the Chinese walnut genome; it included microRNA
(miRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), and small nuclear RNA
(snRNA). In total, 581 tRNAs, 792 snRNAs, and 132 miRNAs were identified
(Table S9).
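The "annotated in one database only" counts reported above (Figure S4) are set differences: a gene counts toward, for example, Nr-only if it appears in the Nr results and in none of the other databases. The following is a minimal Python sketch, assuming each database's annotation output has already been reduced to a set of gene IDs. The function names `exclusive_counts` and `annotated_fraction`, the gene IDs, and the five-database toy input are hypothetical illustrations, not the authors' pipeline or data.

```python
from typing import Dict, Set


def exclusive_counts(annotations: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, for each database, the genes annotated in that database and in no other."""
    counts: Dict[str, int] = {}
    for db, genes in annotations.items():
        # Union of gene IDs annotated by every *other* database.
        others: Set[str] = set().union(
            *(ids for name, ids in annotations.items() if name != db)
        )
        counts[db] = len(genes - others)
    return counts


def annotated_fraction(annotations: Dict[str, Set[str]], total_genes: int) -> float:
    """Fraction of all predicted genes annotated in at least one database."""
    annotated_any: Set[str] = set().union(*annotations.values())
    return len(annotated_any) / total_genes


# Toy input with hypothetical gene IDs (not data from this study).
toy = {
    "Nr": {"g1", "g2", "g3"},
    "InterPro": {"g2", "g4"},
    "KEGG": {"g2", "g5"},
    "SwissProt": {"g2"},
    "COG": {"g3"},
}
print(exclusive_counts(toy))
# → {'Nr': 1, 'InterPro': 1, 'KEGG': 1, 'SwissProt': 0, 'COG': 0}
print(annotated_fraction(toy, total_genes=6))  # 5 of 6 genes annotated
```

Applied to the real per-database gene sets, the same two functions would yield the exclusive counts plotted in Figure S4 and the overall annotation rate in Table S8.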
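Figures 3b and 3c summarize GC content and gene density along the chromosomes. This section does not state the window sizes or coordinate conventions used, so the following is only a minimal Python sketch under stated assumptions: non-overlapping windows (100 kb by default, mirroring the "per 100 kb" unit in the text), 0-based gene start coordinates smaller than the chromosome length, and GC content computed over unambiguous A/C/G/T bases only. The function names and toy coordinates are hypothetical.

```python
from typing import List


def gc_content(seq: str) -> float:
    """GC fraction over unambiguous bases; N and other IUPAC codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0


def windowed_gc(seq: str, window: int) -> List[float]:
    """GC fraction in consecutive non-overlapping windows (the last may be shorter)."""
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq), window)]


def gene_density(gene_starts: List[int], chrom_length: int, window: int = 100_000) -> List[int]:
    """Number of genes whose 0-based start (assumed < chrom_length) falls in each window."""
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_windows
    for start in gene_starts:
        counts[start // window] += 1
    return counts


def per_mb(count: int, length_bp: int) -> float:
    """Rescale a count observed over length_bp to a per-megabase rate."""
    return count * 1_000_000 / length_bp


# Toy example with made-up sequences and coordinates (not data from this study).
print(gc_content("ATGCGCNN"))                                 # 4 GC of 6 A/C/G/T bases
print(windowed_gc("GGGGATAT", window=4))                      # → [1.0, 0.0]
print(gene_density([10, 50_000, 150_000, 250_001], 250_500))  # → [2, 1, 1]
print(per_mb(11, 100_000))                                    # → 110.0
```

Rescaling with `per_mb` shows that 11 genes per 100 kb corresponds to 110 genes per Mb. A transcript density like the one shown in Figure 3d could be computed the same way from transcript start coordinates.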