Raw data pre-processing and genome size estimation
The quality of the raw Illumina DNA sequence data was assessed with FastQC v0.11.8 (Andrews et al., 2010). Low-quality reads and adapters were removed using Trimmomatic v0.39 (Bolger et al., 2014). Reads were scanned with a 4-base sliding window and cut where the average Phred quality score within the window fell below 15. Leading and trailing bases with quality scores below 10 were also removed. Reads shorter than 75 bp and reads with an average quality score below 30 were discarded. The same procedure was applied to the RNA-Seq reads.
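The trimming logic described above can be illustrated with a minimal Python sketch. It is a simplified re-statement of the filters, not Trimmomatic's implementation: the function names (`trim_read`, `keep_read`) are hypothetical, and the order in which the steps are applied (sliding window first, then leading and trailing bases) is an assumption, since the order is not stated in the text.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of Phred scores."""
    return sum(values) / len(values)


def trim_read(quals, window=4, window_q=15, end_q=10):
    """Return the half-open interval (start, end) of bases to keep.

    quals is a list of per-base Phred scores for one read.
    Assumed step order: sliding window, then leading, then trailing.
    """
    start, end = 0, len(quals)

    # Sliding window: scan from the 5' end and cut at the first window
    # whose mean quality falls below window_q.
    for i in range(0, len(quals) - window + 1):
        if mean(quals[i:i + window]) < window_q:
            end = i
            break

    # Leading bases: drop from the 5' end while quality is below end_q.
    while start < end and quals[start] < end_q:
        start += 1

    # Trailing bases: drop from the 3' end while quality is below end_q.
    while end > start and quals[end - 1] < end_q:
        end -= 1

    return start, end


def keep_read(quals, start, end, min_len=75, min_avg_q=30):
    """Keep a trimmed read only if it passes both length and mean-quality filters."""
    kept = quals[start:end]
    if not kept:
        return False
    return len(kept) >= min_len and mean(kept) >= min_avg_q
```

A read is retained only if `keep_read` returns `True` for the interval reported by `trim_read`; for paired-end data, the handling of pairs in which only one mate survives would need to be decided separately.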
Adapter trimming and length filtering of the basecalled ONT data were performed using Porechop v0.2.4 (https://github.com/rrwick/Porechop) with default parameters and the option --discard_middle, which discards reads with internal adapters.
Genome size was estimated from the Illumina genomic sequencing data using the k-mer histogram method implemented in KmerGenie v1.7051 (Chikhi and Medvedev 2014).
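The idea behind a k-mer histogram estimate can be sketched in a few lines of Python. This is the simple peak-depth approximation (total k-mer occurrences divided by the depth of the main histogram peak), not the statistical model that KmerGenie fits; the function name `estimate_genome_size` and the `error_cutoff` parameter are hypothetical and chosen only for illustration.

```python
def estimate_genome_size(histogram, error_cutoff=5):
    """Approximate genome size from a k-mer abundance histogram.

    histogram maps a k-mer multiplicity (depth) to the number of
    distinct k-mers observed at that depth. Depths below error_cutoff
    are treated as sequencing errors and ignored (an assumption).
    """
    solid = {depth: n for depth, n in histogram.items() if depth >= error_cutoff}
    if not solid:
        raise ValueError("no k-mers at or above the error cutoff")

    # Depth of the main peak, taken as the most populated bin.
    peak_depth = max(solid, key=solid.get)

    # Total number of k-mer occurrences among the retained bins.
    total_kmers = sum(depth * n for depth, n in solid.items())

    return total_kmers // peak_depth
```

For heterozygous or highly repetitive genomes the histogram has several peaks, and picking the most populated bin may select the wrong one; a model-based estimator is preferable in those cases.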
De novo genome assembly
To build the genome assembly, the long ONT reads were used to construct an initial de novo assembly, and the Illumina reads were then used in the polishing stages (Figure 1). To construct the initial assembly, we used Flye v2.6 (Kolmogorov et al. 2019), a repeat graph assembler. The assembly was evaluated by assessing (1) the contig N50, using QUAST v5.0.2 (Gurevich et al. 2013), and (2) gene completeness, using BUSCO v3.1.0 (Simão et al. 2015) against the Actinopterygii ortholog dataset v9 with default parameters.
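For reference, the contig N50 used as the contiguity metric is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal Python sketch of that definition follows; the function name `n50` is illustrative and this is not QUAST's code.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2

    running = 0
    for length in lengths:
        running += length
        # The first contig that brings the running total to at least
        # half of the assembly length defines the N50.
        if running >= half:
            return length

    raise ValueError("no contigs given")
```

In practice the lengths would be read from the assembly FASTA file; the list-based interface here keeps the sketch independent of any parsing library.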
The resulting assembly was polished with two rounds of Racon v1.4.3 (Vaser et al. 2017), using the preprocessed long reads mapped against the assembly with Minimap2 v2.17 (Li 2018). Further polishing was performed with Medaka v0.9.2 (https://github.com/nanoporetech/medaka), and final polishing was completed with Pilon v1.23 (Walker et al. 2014) after mapping the Illumina reads against the partially polished assembly with Minimap2 v2.17.
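The iterative structure of the Racon stage, in which each round maps the long reads to the current assembly and the polished output becomes the target of the next round, can be sketched as follows. The sketch only builds the command lines and does not run them. The file names, the thread count, and the `map-ont` Minimap2 preset are assumptions made for illustration; the parameters actually used are not given in the text.

```python
def racon_round_commands(reads, draft, rounds=2, threads=8):
    """Plan the mapping and polishing commands for successive Racon rounds.

    Returns a list of (argv, stdout_path) pairs. Both minimap2 and racon
    write their results to standard output, so each command is paired
    with the file that its output should be redirected to.
    """
    commands = []
    target = draft
    for i in range(1, rounds + 1):
        paf = f"round{i}.paf"                 # hypothetical file name
        polished = f"racon_round{i}.fasta"    # hypothetical file name

        # Map the long reads against the current assembly (PAF output).
        commands.append(
            (["minimap2", "-x", "map-ont", "-t", str(threads), target, reads], paf)
        )
        # Polish the current assembly using the reads and their overlaps.
        commands.append(
            (["racon", "-t", str(threads), reads, paf, target], polished)
        )

        # The polished assembly is the target of the next round.
        target = polished
    return commands
```

Each pair could then be executed with `subprocess.run(argv, stdout=open(stdout_path, "w"), check=True)`. The Medaka and Pilon stages would follow the same pattern, taking the output of the last Racon round as input.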