3.2 Genome Annotation
RepeatMasker and Repbase were used to annotate the repeat sequences. In total, 34.97% of the M. dirhodum genome was annotated as repeat sequences. Long terminal repeats (LTRs), long interspersed nuclear elements (LINEs) and DNA transposons accounted for 9.23%, 2.25% and 10.33% of the whole genome, respectively, and unclassified repeats accounted for a further 13.16% (Table S5). A total of 286 tRNAs were predicted by tRNAscan-SE. Using Infernal, we also identified 51 small nucleolar RNAs (snoRNAs), 586 ribosomal RNAs (rRNAs), 73 small nuclear RNAs (snRNAs), 59 microRNAs (miRNAs), 286 tRNAs and 639 other types of ncRNAs.
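As a quick consistency check on the repeat figures above, the four class-level percentages can be summed and compared with the reported total. The sketch below assumes that all four values, including the unclassified fraction, are expressed relative to the whole genome; the variable names are illustrative and not part of the annotation pipeline.

```python
from decimal import Decimal

# Percentages as reported in the text. Treating "unclassified" as a
# fraction of the whole genome is an assumption made for this check.
repeat_classes = {
    "LTR": Decimal("9.23"),
    "LINE": Decimal("2.25"),
    "DNA transposon": Decimal("10.33"),
    "unclassified": Decimal("13.16"),
}

# Decimal avoids binary floating-point noise when adding two-decimal values.
total_repeats = sum(repeat_classes.values(), Decimal("0"))
print(f"Total repeat content: {total_repeats}%")
```

Under that assumption the classes add up to the 34.97% total repeat content reported for the genome.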
After masking repeat sequences, 18,003 protein-coding genes with a mean CDS length of 1,776 bp were identified from the M. dirhodum genome using de novo, homology-based and RNA sequencing-based methods. The number of genes in the M. dirhodum genome is comparable to that in other insect species (Table 1). Functional annotation showed that 16,548 (91.92%), 9,030 (50.16%) and 12,836 (71.30%) genes had significant hits to proteins cataloged in NR, SwissProt and eggNOG, respectively. In addition, 9,260 (51.44%) and 6,254 (34.74%) genes were annotated with GO terms and KEGG pathways, respectively (Fig. S1).
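The annotation rates quoted above are each database's hit count divided by the 18,003 predicted genes. A minimal sketch of that arithmetic follows; the counts are taken from the text, while the helper name percent_annotated is hypothetical.

```python
TOTAL_GENES = 18_003  # predicted protein-coding genes, from the text

# Genes with a hit or annotation in each resource, from the text.
annotated = {
    "NR": 16_548,
    "SwissProt": 9_030,
    "eggNOG": 12_836,
    "GO": 9_260,
    "KEGG": 6_254,
}


def percent_annotated(hits: int, total: int = TOTAL_GENES) -> float:
    """Return the share of genes with an annotation, as a percentage."""
    return round(100 * hits / total, 2)


percentages = {db: percent_annotated(n) for db, n in annotated.items()}
for db, pct in percentages.items():
    print(f"{db}: {pct:.2f}%")
```

Each computed value matches the percentage given in parentheses in the text to two decimal places.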