3.1 Genome assembly and quality assessment
The genome size of C. undulatus was estimated to be 1.18-1.27 Gb based on 17-mer frequencies (Fig. S1): the total number of k-mers was approximately 4.2 × 10^10 using findGSE (Sun et al. 2018) and 3.9 × 10^10 using GenomeScope (Vurture et al. 2017), with the k-mer peak at a depth of 33×. We generated approximately 49.9 Gb of data via Illumina short-read sequencing and 90.7 Gb of data via nanopore long-read sequencing, indicating approximately 77-fold coverage of the genome (Table S3). Low-quality reads and adapter sequences were filtered from the raw nanopore data of three 20-kb libraries, yielding 86.4 Gb of clean reads with a read N50 of 31.69 kb for the subsequent genome assembly (Table S4). The resulting assembly of C. undulatus had a total length of 1164.9 Mb and a contig N50 of 16.4 Mb, slightly smaller than the genome size estimated by 17-mer analysis. The nanopore-assembled genome was then polished in two rounds. The final draft genome assembly comprised 1173.4 Mb in 328 contigs and reached a high level of continuity, with a contig N50 of 16.5 Mb (Table S5) and a whole-genome average GC content of 42%. The genome of this species is larger than the known genomes of other marine fishes, which usually range from 366 to approximately 900 Mb (Xiao et al. 2019; Xu et al. 2018). We evaluated the quality of the assembled C. undulatus genome against the BUSCO database, and 96.36% of complete BUSCO genes were found in the assembled genome. Meanwhile, the entire genome was covered by more than 98% of Illumina short reads, and the base accuracy of the genome was more than 99.99% (Table 1). Furthermore, Illumina RNA-seq transcriptome data from multiple tissues showed high read-mapping rates, ranging from 89.74% to 94.98% (Table S6). Therefore, we provide a high-quality genome assembly for C. undulatus.
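The arithmetic behind the figures above can be checked with a short sketch. It is a simplified illustration, not the procedure implemented by findGSE or GenomeScope, which fit statistical models to the full k-mer histogram; here the genome size is approximated as the total k-mer count divided by the peak depth, and fold coverage as sequenced bases divided by genome size. The function names are hypothetical and chosen for this example only. Under these assumptions, dividing the 90.7 Gb of nanopore data by the lower size estimate gives a value close to the reported 77-fold coverage.

```python
from typing import Iterable


def genome_size_from_kmers(total_kmers: float, peak_depth: float) -> float:
    """Approximate genome size (bp) as total k-mer count / k-mer peak depth."""
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    return total_kmers / peak_depth


def fold_coverage(sequenced_bases: float, genome_size: float) -> float:
    """Approximate sequencing depth as sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sequenced_bases / genome_size


def n50(lengths: Iterable[int]) -> int:
    """Return the length L such that sequences of length >= L
    together contain at least half of the total bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Values taken from the text: total 17-mer counts and a peak depth of 33x.
    size_findgse = genome_size_from_kmers(4.2e10, 33)
    size_genomescope = genome_size_from_kmers(3.9e10, 33)
    print(f"findGSE-based estimate:     {size_findgse / 1e9:.2f} Gb")
    print(f"GenomeScope-based estimate: {size_genomescope / 1e9:.2f} Gb")

    # Nanopore data (90.7 Gb) relative to the lower size estimate (1.18 Gb).
    print(f"Nanopore coverage: {fold_coverage(90.7e9, 1.18e9):.0f}x")

    # Toy contig lengths (not from the paper) to illustrate the N50 definition.
    print(f"Toy N50: {n50([2, 3, 4, 5, 6])}")
```

The `n50` helper uses made-up contig lengths only to illustrate the definition of the statistic; reproducing the reported contig N50 of 16.5 Mb would require the actual contig lengths of the assembly.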