2.3 Genome survey and assembly
Genome size, heterozygosity, and repeat content were estimated from the k-mer distribution of the Illumina paired-end reads. K-mers were counted with Jellyfish, and the resulting distribution was analyzed with GenomeScope (Vurture et al., 2017) using a k value of 17.
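The idea behind a k-mer based genome size estimate can be illustrated with a minimal sketch. It is not the model used by GenomeScope, which fits a mixture model to the whole distribution; it is the simpler peak-based approximation, in which the genome size is taken as the total number of k-mers divided by the depth of the main peak. The function names, the `min_depth` error cutoff, and the toy histogram below are assumptions made for illustration only, and the input is assumed to be two-column text (depth, number of distinct k-mers) of the kind a k-mer counter writes.

```python
def parse_histogram(text):
    """Parse two-column 'depth count' lines into a {depth: count} dict."""
    histogram = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        depth, count = int(fields[0]), int(fields[1])
        histogram[depth] = count
    return histogram


def estimate_genome_size(histogram, min_depth=5):
    """Peak-based estimate: total k-mers / depth of the main peak.

    min_depth is a hypothetical cutoff that discards low-depth k-mers,
    which are assumed here to come mostly from sequencing errors.
    """
    kept = {d: n for d, n in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    # Depth at which the largest number of distinct k-mers is observed.
    peak_depth = max(kept, key=kept.get)
    # Each distinct k-mer seen at depth d contributes d k-mer occurrences.
    total_kmers = sum(d * n for d, n in kept.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (invented numbers, not data from this study).
toy = "1 1000\n2 200\n3 50\n28 100\n29 300\n30 500\n31 300\n32 100\n"
size, peak = estimate_genome_size(parse_histogram(toy), min_depth=5)
print(peak, size)  # 30 1300.0
```

In this toy case the low-depth k-mers (depths 1 to 3) are excluded as presumed errors, the main peak lies at depth 30, and the 39,000 remaining k-mer occurrences give an estimate of 1,300 bases. Heterozygosity and repeat content cannot be recovered from this approximation; they require a fitted model of the full distribution.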
PacBio subreads were obtained from the raw polymerase reads by removing adaptor sequences and short or low-quality reads, and were then filtered and corrected using the pbccs pipeline with default parameters (https://github.com/PacificBiosciences/ccs). The resulting high-fidelity (HiFi) reads were assembled de novo with hifiasm (https://github.com/chhylp123/hifiasm). BWA v0.7.15 (https://sourceforge.net/projects/bio-bwa/files/) (Li, 2013) was used for read alignment, and SAMtools v1.4 (https://sourceforge.net/projects/samtools/files/samtools/) (Li et al., 2009) was used for SAM/BAM format conversion. The quality and completeness of the genome assembly were assessed using the conserved genes in BUSCO v3.0.2 (https://busco.ezlab.org/) (Simao et al., 2015).
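As an illustration of how a completeness assessment based on conserved genes is summarized, the following minimal sketch converts counts of conserved genes found as single-copy, duplicated, fragmented, or missing into percentages and a one-line notation in the style of a BUSCO summary. The function names and the counts are hypothetical and are not results from this study; the sketch only shows the arithmetic, in which complete genes are the sum of the single-copy and duplicated categories.

```python
def busco_percentages(single, duplicated, fragmented, missing):
    """Convert category counts into percentages of all genes searched."""
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("no genes were searched")

    def pct(count):
        return 100.0 * count / total

    return {
        "C": pct(single + duplicated),  # complete = single-copy + duplicated
        "S": pct(single),
        "D": pct(duplicated),
        "F": pct(fragmented),
        "M": pct(missing),
        "n": total,
    }


def format_summary(p):
    """Render the percentages in a compact one-line notation."""
    return "C:{C:.1f}%[S:{S:.1f}%,D:{D:.1f}%],F:{F:.1f}%,M:{M:.1f}%,n:{n}".format(**p)


# Hypothetical counts for illustration only.
summary = format_summary(busco_percentages(900, 50, 20, 30))
print(summary)  # C:95.0%[S:90.0%,D:5.0%],F:2.0%,M:3.0%,n:1000
```

A high proportion of complete genes together with low fragmented and missing proportions is generally read as evidence of a complete assembly, while an elevated duplicated proportion may point to uncollapsed haplotypes.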