2.3 Genome sequencing and assembly
Illumina sequencing was performed to estimate genome size, heterozygosity and duplication rate, and to polish the de novo assembly. A paired-end library (insert size: 350 bp) was constructed and sequenced on the Illumina NovaSeq platform. The raw data were filtered with FASTQC. After filtering, we obtained a total of 112.81 Gb of clean data, corresponding to 176× sequence coverage.
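The relationship between data yield and sequence coverage can be illustrated with a minimal Python sketch. It assumes the usual definition (coverage = total sequenced bases / genome size); the function names are hypothetical, and the genome size printed here is only what the two reported Illumina figures imply together, not the genome size estimate of this study.

```python
def coverage(total_bases_gb: float, genome_size_gb: float) -> float:
    """Return sequencing depth as total bases divided by genome size."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return total_bases_gb / genome_size_gb


def implied_genome_size_gb(total_bases_gb: float, depth: float) -> float:
    """Return the genome size (Gb) implied by a data yield and a depth."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return total_bases_gb / depth


# Figures reported for the Illumina library in this section.
illumina_clean_gb = 112.81
illumina_depth = 176.0

size_gb = implied_genome_size_gb(illumina_clean_gb, illumina_depth)
print(f"Implied genome size: {size_gb:.3f} Gb")
print(f"Depth recomputed from that size: {coverage(illumina_clean_gb, size_gb):.1f}x")
```

The same two functions can be applied to any library, provided the yield and the genome size are expressed in the same unit.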
High-quality genomic DNA was fragmented to construct a PCR-free SMRTbell library. After the library passed quality control with Qubit 3.0 and Agilent 2100, it was sequenced on SMRT cells using the PacBio Sequel II platform (Pacific Biosciences) to a mean depth of 186.17×. After filtering, we obtained a total of 169.37 Gb of clean data comprising 7,960,820 subreads (mean subread length: 21,275.65 bp; subread length N50: 31,540 bp). Raw data generated from PacBio sequencing were corrected with CANU. In the assembly phase, the reads were assembled into contigs and consensus sequences were output by WTDBG v2 with default parameters. PBMM2 (MINIMAP2) was used to map the original data to the reference genome, and ARROW (RACON) was used for polishing. The polished FASTA sequence was indexed with BWA index, and the corrected genome was used as the reference genome. The Illumina FASTQ data were then aligned to this reference with BWA MEM, and PILON was used for a second round of polishing. To remove redundancy remaining after preliminary assembly and error correction, PURGE_HAPLOTIGS was used to identify and remove redundant heterozygous contigs according to read-depth distribution and sequence similarity. The quality of the genome sequence was evaluated with BUSCO v4 using default parameters (Manni et al., 2021).
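The subread statistics reported above (mean length and N50) can be computed from a list of read lengths as in the following minimal Python sketch. It assumes the common definition of N50 as the length of the shortest read in the smallest set of longest reads that together contain at least half of all bases. The function names and the toy lengths are hypothetical and are not the software or data used in this study.

```python
from typing import Sequence


def mean_length(lengths: Sequence[int]) -> float:
    """Return the arithmetic mean of the read lengths."""
    if not lengths:
        raise ValueError("no read lengths given")
    return sum(lengths) / len(lengths)


def n50(lengths: Sequence[int]) -> int:
    """Return N50: walk reads from longest to shortest and report the
    length at which the running total reaches half of all bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise RuntimeError("unreachable for non-empty input")


def total_bases_gb(n_reads: int, mean_len_bp: float) -> float:
    """Return total yield in Gb from a read count and a mean length."""
    return n_reads * mean_len_bp / 1e9


# Toy example with made-up lengths (bp).
toy = [2, 2, 2, 3, 3, 4, 8, 8]
print(mean_length(toy), n50(toy))

# Consistency check on the figures reported in this section.
print(f"{total_bases_gb(7_960_820, 21_275.65):.2f} Gb")
```

The last line multiplies the reported subread count by the reported mean subread length, which reproduces the reported PacBio yield to two decimal places.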