2.3 Genome assembly and quality control

We used NextDenovo v2.4.0 (https://github.com/Nextomics/NextDenovo) to assemble the genome de novo from ONT long reads (100×). First, the NextCorrect module was applied to correct the raw reads, and the preliminary genome assembly was then generated by the NextGraph module. Purge Haplotigs (Roach et al., 2018) was used to identify and remove candidate duplicate haplotypes as part of the manual curation of the heterozygous assembly. Racon (Vaser, Sović, Nagarajan, & Šikić, 2017) v1.4.20 was then employed to polish the assembly for two rounds with the corrected ONT long reads (Figures S2 and S3). Finally, we used NextPolish (J. Hu, Fan, Sun, & Liu, 2020) v1.3.1 for two further rounds of polishing based on Illumina short reads (100×), yielding the final genome assembly.
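
As a rough illustration of what a stated sequencing depth such as 100× means, the sketch below computes mean depth as total sequenced bases divided by genome size. The function names and any numbers passed to them are hypothetical and are not values from this study.

```python
def sequencing_depth(total_bases: int, genome_size: int) -> float:
    """Mean depth (fold coverage) = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def bases_needed(target_depth: float, genome_size: int) -> int:
    """Number of sequenced bases required to reach a target mean depth."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return int(round(target_depth * genome_size))
```

For example, under the assumption of a 500 Mb genome, 50 Gb of reads would correspond to a mean depth of 100×.
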
We anchored the genome assembly to the chromosome level using the Hi-C data. HiC-Pro (Servant et al., 2015) was employed with default parameters for quality control of the raw Hi-C data. Bowtie2 (Langmead & Salzberg, 2012) was used to map the Hi-C reads to the assembled genome. Uniquely mapped reads were extracted, with duplicates excluded, by HiC-Pro. Finally, we used LACHESIS (Burton et al., 2013) to cluster, order, and orient the corrected contigs into pseudo-chromosomes based on their Hi-C interaction levels.
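
The read filtering described above can be sketched in simplified form as follows. This is not HiC-Pro's implementation: the `ReadPair` type, the `filter_hic_pairs` function, and the use of a mapping-quality threshold (`min_mapq`) as a proxy for unique mapping are all assumptions made for illustration.

```python
from typing import Iterable, List, NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical minimal record for one aligned Hi-C read pair."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str
    mapq1: int
    mapq2: int


def filter_hic_pairs(pairs: Iterable[ReadPair], min_mapq: int = 10) -> List[ReadPair]:
    """Keep pairs whose two ends both pass the MAPQ threshold, dropping duplicates.

    Two pairs count as duplicates when both ends share the same contig,
    position, and strand, regardless of which end is listed first.
    """
    seen = set()
    kept = []
    for p in pairs:
        # Assumption: low MAPQ is treated as "not uniquely mapped".
        if p.mapq1 < min_mapq or p.mapq2 < min_mapq:
            continue
        end1 = (p.chrom1, p.pos1, p.strand1)
        end2 = (p.chrom2, p.pos2, p.strand2)
        # Order the two ends so that swapped pairs produce the same key.
        key = (end1, end2) if end1 <= end2 else (end2, end1)
        if key in seen:
            continue
        seen.add(key)
        kept.append(p)
    return kept
```

The retained pairs are the kind of contact evidence that a scaffolder can count between contigs when clustering, ordering, and orienting them.
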
To assess the quality of the assembly, whole-genome sequencing (NGS) reads and assembled transcripts were mapped to the genome with BWA (H. Li, 2013) v0.7.17 and HISAT2 (D. Kim, Langmead, & Salzberg, 2015) v2.1.0, respectively. Benchmarking Universal Single-Copy Orthologs (BUSCO) (Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015) was also used to assess the completeness of the assembly with the embryophyta_odb10 dataset.
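
The two kinds of quality metric described above reduce to simple proportions. The following minimal sketch shows how a mapping rate and BUSCO-style completeness percentages could be derived from raw counts. The function names are hypothetical, and no counts from this study are assumed.

```python
def mapping_rate(mapped_reads: int, total_reads: int) -> float:
    """Percentage of reads (or transcripts) that aligned to the assembly."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return round(100 * mapped_reads / total_reads, 1)


def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> dict:
    """Convert BUSCO category counts into percentages of all searched orthologs.

    Complete (C) is the sum of single-copy (S) and duplicated (D) orthologs;
    F is fragmented, M is missing, and n is the total number searched.
    """
    total = single + duplicated + fragmented + missing
    if total <= 0:
        raise ValueError("at least one ortholog must be counted")

    def pct(count: int) -> float:
        return round(100 * count / total, 1)

    return {
        "C": pct(single + duplicated),
        "S": pct(single),
        "D": pct(duplicated),
        "F": pct(fragmented),
        "M": pct(missing),
        "n": total,
    }
```

A high complete percentage with a low duplicated fraction is generally read as indicating a complete assembly with little residual haplotype redundancy.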