3.1 Chromosome-level de novo genome of M. dirhodum
A total of 41.17 Gb of high-quality paired-end reads were obtained by
Illumina genomic sequencing (~92.22× coverage, Table S1). The genome
size of M. dirhodum was estimated to be 457.2 Mb based on k-mer
counting. The k-mer distribution showed a main peak at a sequencing
depth of 79.8×, and the analysis indicated a moderate level of
heterozygosity (0.445%) and a high repetitive sequence content
(59.20%) in the genome (Fig. 1A).
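The k-mer based size estimate can be illustrated with a minimal sketch. The text does not name the tool or the exact formula used, so the code below assumes the common simplified approach of dividing the total number of non-error k-mer occurrences by the depth of the main peak. The function name, the `min_depth` cutoff, and the toy histogram are hypothetical and are not data from this study.

```python
def estimate_genome_size(histogram, min_depth):
    """Estimate genome size from a k-mer depth histogram.

    histogram: dict mapping depth -> number of distinct k-mers seen at that depth.
    min_depth: depths below this are treated as sequencing errors and ignored
               (an assumed, user-chosen cutoff).
    Returns (peak_depth, estimated_size_in_kmers).
    """
    kept = {d: c for d, c in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    # Main peak: the depth at which the most distinct k-mers occur.
    peak_depth = max(kept, key=kept.get)
    # Total k-mer occurrences = sum over depths of depth * distinct k-mers.
    total_kmers = sum(d * c for d, c in kept.items())
    return peak_depth, total_kmers / peak_depth


# Toy histogram (hypothetical): an error spike at low depth and a main peak at 10.
toy_histogram = {1: 1000, 2: 200, 8: 50, 9: 120, 10: 200, 11: 120, 12: 50}
peak, size = estimate_genome_size(toy_histogram, min_depth=5)
```

Dedicated tools additionally model heterozygous and repeat peaks, which is how figures such as the heterozygosity and repeat content are usually derived. This sketch covers only the size estimate.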
To obtain a reference genome for M. dirhodum, we generated 161.53 Gb of PacBio long reads in
CCS mode (Table S1), which were subsequently corrected to 10.34 Gb of
HiFi reads. The genome was initially assembled using hifiasm, resulting
in 296 contigs with a contig N50
of 7.82 Mb and a longest contig of 23.64 Mb (Table 1).
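For readers unfamiliar with the N50 statistic reported here, it is the length of the contig at which the cumulative length, summed from the longest contig downward, first reaches half of the total assembly length. The sketch below shows this standard definition. The function name and the toy lengths are illustrative only and are not taken from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the sum to >= 50% of the total.
        if running * 2 >= total:
            return length


# Toy example (hypothetical lengths): total is 30, and 10 + 8 = 18 >= 15.
toy_n50 = n50([10, 8, 6, 4, 2])  # 8
```

The same function applies to scaffold lengths, which gives the scaffold N50.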
The 41.17 Gb of short reads
generated on the Illumina NovaSeq 6000 platform were then mapped against
our assembly, resulting in a mapping rate of 92.18%. BUSCO analysis
showed that 96.9% (single-copy: 92.5%; duplicated: 4.4%) of the
1,367 single-copy genes in the insecta_odb9 database were
identified as complete, 0.4% were fragmented, and 2.7% were
missing in the assembled genome.
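As a quick consistency check on the BUSCO figures above, the complete fraction should equal the sum of the single-copy and duplicated fractions, and all categories should sum to 100%. The short sketch below does this arithmetic with the percentages quoted in the text. The variable names are illustrative.

```python
# BUSCO category percentages as quoted in the text.
busco = {"single": 92.5, "duplicated": 4.4, "fragmented": 0.4, "missing": 2.7}

# Complete = single-copy + duplicated.
complete = busco["single"] + busco["duplicated"]

# All categories together should account for every searched gene.
total = complete + busco["fragmented"] + busco["missing"]

# Rounded to one decimal place to avoid floating-point noise.
complete_pct = round(complete, 1)
total_pct = round(total, 1)
```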
For the chromosome-level assembly,
38.09 Gb of clean reads (150 bp paired-end) were obtained from the Hi-C
library (coverage: 85.31×, Table S1). In total, 118,367,396 reads (86.83%)
were mapped to the draft genome, and 96,331,684 (70.67%)
were uniquely mapped. The uniquely mapped reads were analyzed with
3D-DNA software to assist scaffolding. As a result,
68 scaffolds were assembled with
a scaffold N50 of 37.54 Mb (Table 1). Finally,
447.8 Mb of genomic sequence was
anchored to 9 chromosomes, accounting for 98.50% of the total assembled
length (Fig. 1B, Table 1, Table S2). The contig N50 and scaffold N50 of M. dirhodum were much higher than those of previously reported
aphid genome assemblies (Table 1).
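The Hi-C mapping and anchoring percentages can be cross-checked from the counts given in the text. The sketch below uses only numbers quoted above. The derived quantities (total read count, unique fraction of mapped reads, implied total assembly length) are back-calculations and are not values reported by the authors.

```python
# Values quoted in the text.
mapped_reads = 118_367_396
mapped_rate = 0.8683          # 86.83% of reads mapped
unique_reads = 96_331_684
anchored_mb = 447.8           # sequence placed on 9 chromosomes
anchored_fraction = 0.9850    # 98.50% of the assembled length

# Back-calculated total number of Hi-C reads (about 136.3 million).
total_reads = mapped_reads / mapped_rate

# Uniquely mapped reads as a share of all reads and of mapped reads.
unique_of_total = unique_reads / total_reads
unique_of_mapped = unique_reads / mapped_reads

# Total assembled length implied by the anchoring rate, in Mb.
assembled_mb = anchored_mb / anchored_fraction
```

Under these figures, the quoted 70.67% corresponds to uniquely mapped reads as a share of all Hi-C reads. As a share of mapped reads alone the figure is about 81.4%, and the implied total assembly length is about 454.6 Mb.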