2.3 ∣ Gene annotation
We used a homology-based method to predict the protein-coding gene
structure. First, we built a non-redundant protein database of WWS and
other species. Protein sequences were then aligned to the genome
using TBlastN with an E-value cutoff of 1E-5. For each BLAST hit,
GeneWise was used to predict the exact gene structure in the
corresponding genomic region. Finally, RNA-seq data were mapped to the
genome using TopHat (version 2.0.8), and Cufflinks (version 2.1.1)
(http://cufflinks.cbcb.umd.edu/) was used to assemble the transcripts
into gene models. We used this transcript information to revise the
gene set.
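The step from TBlastN hits to the genomic regions passed to the gene-structure predictor can be illustrated with a minimal sketch. It assumes the hits are available in the standard 12-column BLAST tabular format (an assumption; the output format is not stated here), applies the 1E-5 cutoff given above, and merges the hits of each protein on each scaffold and strand into one padded span. The function name `candidate_regions` and the 1,000 bp flank are hypothetical choices for illustration, not parameters reported in this study.

```python
from collections import defaultdict

EVALUE_CUTOFF = 1e-5  # cutoff stated in the text
FLANK = 1000          # hypothetical padding in bp; not taken from the study


def candidate_regions(blast_lines, evalue_cutoff=EVALUE_CUTOFF, flank=FLANK):
    """Return {(protein, scaffold, strand): (start, end)} for significant hits.

    Each input line is assumed to follow the 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    spans = defaultdict(lambda: [None, None])
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        evalue = float(fields[10])
        if evalue > evalue_cutoff:
            continue  # hit is not significant at the chosen cutoff
        # Hits on the reverse strand are reported with sstart > send.
        strand = "+" if sstart <= send else "-"
        low, high = min(sstart, send), max(sstart, send)
        span = spans[(qseqid, sseqid, strand)]
        span[0] = low if span[0] is None else min(span[0], low)
        span[1] = high if span[1] is None else max(span[1], high)
    # Pad each span, keeping the 1-based start coordinate positive.
    return {
        key: (max(1, low - flank), high + flank)
        for key, (low, high) in spans.items()
    }


if __name__ == "__main__":
    example = [
        "P1\tscaf1\t80.0\t100\t20\t0\t1\t100\t5000\t5300\t1e-30\t120",
        "P1\tscaf1\t75.0\t50\t12\t0\t101\t150\t6000\t6150\t2e-8\t60",
        "P1\tscaf1\t30.0\t20\t14\t0\t1\t20\t9000\t9060\t0.5\t20",
    ]
    for key, region in candidate_regions(example).items():
        print(key, region)
```

The sketch does not clip spans to the scaffold length or split distant hits of the same protein into separate loci; a production pipeline would need both.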
Functional annotation of protein-coding genes was performed using BLASTP
(E-value cutoff: 1E-05). Protein domains were annotated by searching the
InterPro (V32.0) and Pfam (V27.0) databases using InterProScan (V4.8) and
HMMER (V3.1), respectively. Gene Ontology (GO) terms for each gene were
obtained from the corresponding InterPro or Pfam entry. Pathways
were assigned by BLAST searches against the KEGG database, with an E-value
cutoff of 1E-05. The tRNA genes were identified using tRNAscan-SE. The
rRNA fragments were predicted by BlastN alignment against an rRNA sequence
database with an E-value cutoff of 1E-10. The miRNA and snRNA genes were
predicted by searching the Rfam database with INFERNAL.
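The GO assignment described above, in which each gene inherits the terms attached to its InterPro or Pfam entries, amounts to a lookup through an entry-to-GO mapping. The sketch below assumes the mapping is supplied as lines in the layout used by the publicly distributed interpro2go and pfam2go files (`InterPro:IPR000003 name > GO:term ; GO:0003677`); whether those exact files were used here is not stated, and the function names and example gene identifiers are hypothetical.

```python
import re
from collections import defaultdict

# Assumed mapping-line layout: "<db>:<entry> <name> > GO:<term> ; GO:<id>"
MAPPING_LINE = re.compile(r"^(InterPro|Pfam):(\S+)\s.*;\s*(GO:\d{7})\s*$")


def load_entry_to_go(lines):
    """Parse mapping lines into {entry accession: set of GO identifiers}."""
    mapping = defaultdict(set)
    for line in lines:
        if line.startswith("!"):
            continue  # header/comment lines start with "!"
        match = MAPPING_LINE.match(line)
        if match:
            mapping[match.group(2)].add(match.group(3))
    return mapping


def go_terms_for_genes(gene_to_entries, entry_to_go):
    """Collect the GO identifiers of every entry matched by each gene."""
    result = {}
    for gene, entries in gene_to_entries.items():
        terms = set()
        for entry in entries:
            terms |= entry_to_go.get(entry, set())
        result[gene] = sorted(terms)
    return result


if __name__ == "__main__":
    mapping_lines = [
        "!illustrative header line",
        "InterPro:IPR000003 Retinoid X receptor/HR-2B > GO:DNA binding ; GO:0003677",
        "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930",
    ]
    # Hypothetical domain hits per gene, e.g. collected from domain searches.
    hits = {"gene1": ["IPR000003", "PF00001"], "gene2": ["IPR999999"]}
    print(go_terms_for_genes(hits, load_entry_to_go(mapping_lines)))
```

Genes whose entries have no GO mapping receive an empty list rather than being dropped, which keeps the annotated and unannotated fractions of the gene set easy to count.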