3.2 Genome Annotation
RepeatMasker and Repbase were used to annotate the repeat sequences. In
total, 34.97% of the M. dirhodum genome was annotated as repeat
sequences. Long terminal repeats (LTRs), long interspersed nuclear
elements (LINEs) and DNA transposons accounted for 9.23%, 2.25% and
10.33% of the whole genome, respectively, and unclassified repeats
accounted for a further 13.16% of the genome (Table S5). A total of 286
tRNAs were predicted by tRNAscan-SE. Using Infernal, we also identified
51 small nucleolar RNAs (snoRNAs), 586 ribosomal RNAs (rRNAs), 73 small
nuclear RNAs (snRNAs), 59 microRNAs (miRNAs), 286 tRNAs and 639 other
types of ncRNAs.
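
The repeat-class figures above can be checked against the reported total with a few lines of arithmetic. The sketch below is only a consistency check on the percentages quoted in the text; the variable names are illustrative, and it assumes all four values are fractions of the whole genome.

```python
# Repeat-class percentages as quoted in the text.
# Assumption: each value is a percentage of the whole genome.
repeat_classes = {
    "LTR": 9.23,
    "LINE": 2.25,
    "DNA transposon": 10.33,
    "unclassified": 13.16,
}

# Sum the classes and round to the two decimals used in the text.
total_repeat_pct = round(sum(repeat_classes.values()), 2)

# Compare with the reported overall repeat content.
reported_total_pct = 34.97
is_consistent = abs(total_repeat_pct - reported_total_pct) < 0.005
print(total_repeat_pct, is_consistent)
```

Under that assumption the four classes add up to the reported overall repeat content.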
After masking repeat sequences, 18,003 protein-coding genes with a mean
CDS length of 1,776 bp were identified from the M. dirhodum genome using
de novo, homology-based and RNA sequencing-based methods. The
number of genes in the M. dirhodum genome is comparable to that
in other insect species (Table 1). Functional annotation showed that
16,548 (91.92%), 9,030 (50.16%) and 12,836 (71.30%) genes had
significant hits to proteins cataloged in NR, SwissProt and eggNOG,
respectively. In addition, 9,260 (51.44%) and 6,254 (34.74%) genes were
annotated with GO terms and KEGG pathways, respectively (Fig. S1).
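
The annotation percentages above follow directly from the gene counts and the total of 18,003 predicted protein-coding genes. The sketch below recomputes them as a sanity check; the helper `pct` and the dictionary names are illustrative and not part of the annotation pipeline.

```python
# Total number of predicted protein-coding genes, from the text.
TOTAL_GENES = 18003

# Genes with a hit or annotation in each resource, from the text.
annotated_counts = {
    "NR": 16548,
    "SwissProt": 9030,
    "eggNOG": 12836,
    "GO": 9260,
    "KEGG": 6254,
}


def pct(count: int, total: int = TOTAL_GENES) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Percentage of all predicted genes annotated by each resource.
annotation_pct = {db: pct(n) for db, n in annotated_counts.items()}

for db, value in annotation_pct.items():
    print(f"{db}: {value:.2f}%")
```

The recomputed values can be compared with the percentages given in the text.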