2.3 Genome survey and assembly
Genome size, heterozygosity, and repeat content were estimated from the
k-mer distribution of the Illumina paired-end reads. K-mers were counted
with Jellyfish at k = 17, and the resulting distribution was analyzed
with GenomeScope (Vurture et al., 2017).
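The idea behind a k-mer based genome size estimate can be illustrated with a minimal Python sketch. This is not how Jellyfish or GenomeScope are implemented: GenomeScope fits a mixture model to the k-mer histogram, whereas the sketch below uses the simpler rule of dividing the total number of k-mers by the depth of the main peak. The function names, the `min_depth` cutoff for discarding likely sequencing errors, and the omission of canonical (strand-collapsed) k-mers are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k=17):
    """Count every k-mer of length k across an iterable of read strings."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def kmer_histogram(counts):
    """Map each depth to the number of distinct k-mers seen at that depth."""
    return Counter(counts.values())


def estimate_genome_size(hist, min_depth=2):
    """Naive estimate: total k-mers divided by the depth of the main peak.

    K-mers below min_depth are treated as sequencing errors and ignored
    (a hypothetical cutoff, not a value taken from the study).
    """
    usable = {d: n for d, n in hist.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Main peak: the depth shared by the largest number of distinct k-mers.
    peak_depth = max(usable, key=lambda d: (usable[d], d))
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth
```

In practice the histogram would come from a dedicated k-mer counter rather than from pure Python, since counting 17-mers over a full sequencing run in this way would be far too slow and memory-hungry.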
PacBio subreads were obtained from the raw polymerase reads by removing
adaptor sequences and short or low-quality reads. The subreads were then
filtered and corrected using the pbccs pipeline with default parameters
(https://github.com/PacificBiosciences/ccs).
The resulting high-fidelity (HiFi) reads were assembled de novo using
hifiasm
(https://github.com/chhylp123/hifiasm).
BWA v0.7.15
(https://sourceforge.net/projects/bio-bwa/files/)
(Li, 2013) and SAMtools v1.4
(https://sourceforge.net/projects/samtools/files/samtools/)
(Li et al., 2009) were used for read alignment and SAM/BAM format
conversion, respectively. The quality and completeness of the genome
assembly were assessed using the conserved genes in BUSCO v3.0.2
(https://busco.ezlab.org/)
(Simao et al., 2015).
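As a rough illustration of how a BUSCO-style completeness score is derived, the sketch below turns the four category counts (complete single-copy, complete duplicated, fragmented, missing) into percentages of the total number of conserved genes searched. The function name and the example counts are hypothetical and are not results from this study; BUSCO itself reports these figures directly.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Convert BUSCO category counts into percentages of all genes searched."""
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("at least one gene must be searched")

    def pct(n):
        return round(100.0 * n / total, 1)

    return {
        # Completeness counts both single-copy and duplicated complete genes.
        "complete": pct(single + duplicated),
        "single": pct(single),
        "duplicated": pct(duplicated),
        "fragmented": pct(fragmented),
        "missing": pct(missing),
        "n": total,
    }


# Hypothetical counts, for illustration only.
summary = busco_summary(single=930, duplicated=20, fragmented=30, missing=20)
```

A high proportion of duplicated genes can point to uncollapsed haplotypes in the assembly, which is why the single-copy and duplicated fractions are kept separate instead of being reported only as their sum.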