3.1 Chromosome-level de novo genome of M. dirhodum
A total of 41.17 Gb of high-quality paired-end reads was obtained by Illumina genomic sequencing (~92.22× coverage, Table S1). Based on k-mer counting, the genome size of M. dirhodum was estimated to be 457.2 Mb. The k-mer depth distribution peaked at 79.8×, suggesting moderate heterozygosity (0.445%) and a high proportion of repetitive sequences (59.20%) in the genome (Fig. 1A). To obtain a reference genome for M. dirhodum, we generated 161.53 Gb of PacBio long reads in circular consensus sequencing (CCS) mode (Table S1), which yielded 10.34 Gb of HiFi reads after consensus correction. The genome was initially assembled with hifiasm, yielding 296 contigs with a contig N50 of 7.82 Mb and a longest contig of 23.64 Mb (Table 1). The 41.17 Gb of Illumina short reads (NovaSeq 6000 platform) were then mapped back to the assembly, with a mapping rate of 92.18%. BUSCO analysis showed that 96.9% (single-copy: 92.5%; duplicated: 4.4%) of the 1,367 single-copy orthologs in the insecta_odb9 database were complete in the assembled genome, while 0.4% were fragmented and 2.7% were missing.
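The genome-size and coverage figures above rest on simple arithmetic over a k-mer depth histogram. The Python sketch below assumes the common estimator genome size ≈ (total k-mer occurrences) / (homozygous peak depth). The section does not name the k-mer tool or model used, so the functions `find_peak_depth`, `estimate_genome_size` and `sequencing_depth`, and the error cutoff `min_depth`, are hypothetical illustrations. The sketch ignores heterozygosity and repeats, which model-fitting tools account for when reporting values such as 0.445% and 59.20%.

```python
"""Sketch: k-mer-based genome size and sequencing-depth arithmetic.

Assumption: the simple estimator below (total k-mers / peak depth) is an
illustration only; the section does not state which tool or model
produced the 457.2 Mb estimate.
"""
from typing import Dict


def find_peak_depth(hist: Dict[int, int], min_depth: int = 5) -> int:
    """Depth with the most distinct k-mers, skipping the low-depth
    error peak below the hypothetical cutoff `min_depth`."""
    candidates = {d: n for d, n in hist.items() if d >= min_depth}
    if not candidates:
        raise ValueError("no k-mers at or above min_depth")
    return max(candidates, key=candidates.get)


def estimate_genome_size(hist: Dict[int, int], min_depth: int = 5) -> float:
    """Genome size ~= total k-mer occurrences (excluding errors) / peak depth."""
    peak = find_peak_depth(hist, min_depth)
    # hist maps depth -> number of distinct k-mers seen at that depth,
    # so depth * count gives the total k-mer occurrences at that depth.
    total_kmers = sum(d * n for d, n in hist.items() if d >= min_depth)
    return total_kmers / peak


def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Average fold coverage = sequenced bases / genome size."""
    return total_bases / genome_size


if __name__ == "__main__":
    # Toy histogram (not data from this study): depth -> distinct k-mers.
    toy_hist = {1: 900, 2: 100, 10: 20, 20: 90, 21: 10}
    print(find_peak_depth(toy_hist))       # 20
    print(estimate_genome_size(toy_hist))  # 110.5
```

With a real histogram exported by a k-mer counter, the same ratio gives the size in base pairs, and dividing total sequenced bases by that size gives the fold coverage.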
For the chromosome-level assembly, 38.09 Gb of clean reads (150 bp paired-end) were obtained from the Hi-C library (85.31× coverage, Table S1). In total, 118,367,396 reads (86.83%) were mapped to the draft genome, of which 96,331,684 (70.67% of all clean reads) were uniquely mapped. The uniquely mapped reads were used by 3D-DNA to order and orient the contigs. As a result, the assembly comprised 68 scaffolds with a scaffold N50 of 37.54 Mb (Table 1). Finally, 447.8 Mb of genomic sequence, accounting for 98.50% of the total assembled length, was anchored onto 9 chromosomes (Fig. 1B, Table 1, Table S2). Both the contig N50 and the scaffold N50 of the M. dirhodum assembly were substantially higher than those of previously reported aphid genome assemblies (Table 1).
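Contig N50 and scaffold N50 are computed the same way, only over different sequence sets, and the chromosome-anchoring percentage is a simple ratio of base counts. The Python sketch below illustrates both with toy lengths; `n50` and `anchoring_rate` are hypothetical helpers, not functions provided by hifiasm or 3D-DNA.

```python
"""Sketch: N50 and chromosome-anchoring rate from sequence lengths.

The example lengths are toy values, not data from this assembly.
"""
from typing import Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered: List[int] = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequences given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:  # walk from longest to shortest
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: running sum always reaches half")


def anchoring_rate(chromosome_lengths: Iterable[int],
                   unplaced_lengths: Iterable[int]) -> float:
    """Percentage of assembled bases placed on chromosomes."""
    anchored = sum(chromosome_lengths)
    total = anchored + sum(unplaced_lengths)
    return 100.0 * anchored / total


if __name__ == "__main__":
    print(n50([10, 8, 6, 4, 2]))              # 8
    print(anchoring_rate([50, 30], [15, 5]))  # 80.0
```

Passing contig lengths to `n50` gives the contig N50; passing scaffold lengths gives the scaffold N50.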