2.3 ∣ Gene annotation
We used a homology-based method to predict protein-coding gene structures. First, we built a non-redundant protein database from WWS and other species. The protein sequences were then aligned to the genome using TBlastN with an E-value cutoff of 1E-5. For each BLAST hit, GeneWise was used to predict the exact gene structure in the corresponding genomic region. Finally, RNA-seq data were mapped to the genome using TopHat (version 2.0.8), and Cufflinks (version 2.1.1) (http://cufflinks.cbcb.umd.edu/) was used to assemble the transcripts into gene models. We used this transcript information to revise the gene set.
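The hit-filtering step between the TBlastN alignment and the GeneWise prediction can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: it assumes the alignments are available in the standard 12-column BLAST tabular format, the example records are invented, and the `flank` padding of 1,000 bp is a hypothetical value chosen only for illustration.

```python
from collections import namedtuple

# One TBlastN hit, reduced to the fields needed to define a genomic region.
Hit = namedtuple("Hit", "query subject start end strand evalue")


def parse_tabular_hits(lines, max_evalue=1e-5):
    """Yield hits from 12-column BLAST tabular lines that pass the E-value cutoff.

    Column order assumed: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        sstart, send = int(cols[8]), int(cols[9])
        evalue = float(cols[10])
        if evalue > max_evalue:
            continue  # hits at or below the cutoff are kept
        # Subject coordinates run high-to-low for minus-strand hits.
        strand = "+" if sstart <= send else "-"
        yield Hit(cols[0], cols[1], min(sstart, send), max(sstart, send),
                  strand, evalue)


def padded_region(hit, flank=1000):
    """Return (scaffold, start, end, strand) extended by a hypothetical flank.

    The start is clipped at 1; the end is not clipped to the scaffold length,
    which a real pipeline would need to do.
    """
    start = max(1, hit.start - flank)
    return hit.subject, start, hit.end + flank, hit.strand


# Invented example records, for illustration only.
example = [
    "protA\tscaffold1\t85.0\t200\t30\t0\t1\t200\t5000\t5600\t1e-50\t180",
    "protB\tscaffold2\t30.0\t50\t35\t0\t1\t50\t900\t751\t0.5\t25",
    "protC\tscaffold3\t70.0\t120\t36\t0\t1\t120\t2360\t2001\t1e-5\t90",
]
hits = list(parse_tabular_hits(example))
regions = [padded_region(h) for h in hits]
```

In this sketch, each retained region would then be passed, together with its query protein, to a gene-structure predictor such as GeneWise.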
Functional annotation of protein-coding genes was performed using BLASTP (E-value: 1E-05). Protein domains were annotated by searching the InterPro (V32.0) and Pfam (V27.0) databases using InterProScan (V4.8) and HMMER (V3.1), respectively. Gene Ontology (GO) terms for each gene were obtained from the corresponding InterPro or Pfam entry. Pathways were assigned by BLAST searches against the KEGG database, with an E-value cutoff of 1E-05. The tRNA genes were identified using tRNAscan-SE. The rRNA fragments were predicted by alignment to an rRNA sequence database using BlastN with an E-value cutoff of 1E-10. The miRNA and snRNA genes were predicted using INFERNAL against the Rfam database.
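The transfer of GO terms from domain entries to genes can be sketched in Python as follows. This is an illustrative outline rather than the procedure used here: the function name `go_terms_for_genes`, the gene names, and all entry and GO identifiers are placeholders, and the two mappings are assumed to have already been loaded into dictionaries.

```python
def go_terms_for_genes(gene_to_entries, entry_to_go):
    """Collect the GO terms attached to each gene's InterPro or Pfam entries.

    gene_to_entries: dict mapping a gene ID to a list of domain entry IDs.
    entry_to_go:     dict mapping a domain entry ID to a list of GO term IDs.
    Returns a dict mapping each gene ID to a sorted, de-duplicated list of terms.
    """
    annotations = {}
    for gene, entries in gene_to_entries.items():
        terms = set()
        for entry in entries:
            # Entries with no GO mapping contribute nothing.
            terms.update(entry_to_go.get(entry, ()))
        annotations[gene] = sorted(terms)
    return annotations


# Placeholder identifiers, for illustration only.
gene_to_entries = {
    "gene1": ["IPR_A", "PF_X"],
    "gene2": ["PF_Y"],
    "gene3": [],
}
entry_to_go = {
    "IPR_A": ["GO:0000001", "GO:0000002"],
    "PF_X": ["GO:0000002", "GO:0000003"],
}
go_annotations = go_terms_for_genes(gene_to_entries, entry_to_go)
```

Genes whose entries carry no GO mapping, or that have no domain hits at all, end up with an empty list and therefore remain without GO annotation in this sketch.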