3.1 ∣ Genomic sequences
We sequenced the genome of a female WWS using a whole-genome shotgun strategy. Cleaned reads provided 107-fold average coverage across 88.23 Gb of assembled sequence (Supplementary Fig. 3 and Supplementary Tables 1 and 2), with N50 values of 411.09 kb and 1.25 Mb for contigs and scaffolds, respectively, in the final assembly (generated using SOAPdenovo4; Supplementary Figs. 4-5 and Supplementary Tables 3-5). We identified a total of 14,020 protein-coding genes, 82.7% of which were functionally classified (Figure 1a, Supplementary Figs. 6-7 and Supplementary Tables 6-8). The WWS assembly was further refined using high-throughput chromosome conformation capture (Hi-C) data, yielding 9 scaffolds with a scaffold N50 of 69.68 Mb (Table 1). As a result, 98.16% of the assembled contigs (638.30 Mb) were distributed across 9 chromosome-level pseudomolecules (Fig. 2a, b; Table 1 and Supplementary Table 9).
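For readers unfamiliar with the N50 statistic reported above, the following is a minimal sketch of how it is conventionally computed from a list of contig or scaffold lengths. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the assembly pipeline used here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)   # longest sequence first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths (arbitrary units), not real assembly data.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

In this toy case the total is 300, and the two longest sequences (80 + 70) already reach half of it, so the N50 is 70.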
When we compared WWS with other insect species, we found 4,615 gene families in common and 497 gene families unique to WWS (Figure 1a). Using these protein-coding genes, we generated a timescale for insect evolution (Figure 1b). Comparative analyses of gene clusters were carried out among WWS and other representative insect species, with Daphnia pulex (water flea, a crustacean) as a non-insect outgroup (Figure 1b, Supplementary Table 10). A total of 24,923 gene families were identified among these species, of which 553 clusters were common singlets (Supplementary Fig. 7). Mobile elements contributed about 37.3% repeat content to the WWS genome, and long terminal repeats (LTRs) accounted for 28.5%, much higher than the other transposon classes (DNA, LINE and SINE) (Supplementary Fig. 8 and Supplementary Tables 11-12). In addition, we identified noncoding RNA (ncRNA) genes, including 357 miRNA, 188 tRNA, 42 rRNA, and 46 snRNA genes (Supplementary Table 13).
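The distinction between shared families, species-specific families and common singlets can be illustrated with a short sketch. It assumes a hypothetical table of per-species gene counts for each family; the function `summarize_families`, the species labels `sp_A` and `sp_B`, and the family identifiers are all invented for illustration and do not reflect the clustering software or data used in this study.

```python
def summarize_families(counts, species, focal):
    """Classify gene families from per-species gene counts.

    counts  : dict mapping family id -> {species: number of genes}
    species : list of all species in the comparison
    focal   : the species of interest
    Returns (shared, unique, singlets) as sorted lists of family ids.
    """
    shared, unique, singlets = [], [], []
    for fam, per_species in counts.items():
        # Species with at least one gene in this family.
        present = {s for s, n in per_species.items() if n > 0}
        if present == set(species):
            shared.append(fam)
            # Common singlet: exactly one copy in every species.
            if all(per_species[s] == 1 for s in species):
                singlets.append(fam)
        elif present == {focal}:
            unique.append(fam)
    return sorted(shared), sorted(unique), sorted(singlets)


# Hypothetical toy input, not real data.
toy_counts = {
    "fam1": {"WWS": 1, "sp_A": 1, "sp_B": 1},  # shared, single copy
    "fam2": {"WWS": 3, "sp_A": 1, "sp_B": 2},  # shared, multi-copy
    "fam3": {"WWS": 2},                        # found only in WWS
    "fam4": {"sp_A": 1, "sp_B": 1},            # absent from WWS
}
shared, unique, singlets = summarize_families(
    toy_counts, ["WWS", "sp_A", "sp_B"], "WWS"
)
print(shared, unique, singlets)
```

Under these assumptions, `fam1` and `fam2` count as shared, `fam3` as unique to the focal species, and only `fam1` as a common singlet.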