Genome Assembly, Annotation, Structural Comparison and Synteny
The draft long-read CANU assembly had a total size of 189 Mb across 2,098 contigs. However, a proportion of the genome was identified as non-target sequence (Figure S2, Table S3 and Table S4), principally from Proteobacteria and Tenericutes, and removal of these reads together with mitochondrial reads reduced the long-read dataset by 44.6 Mb. After non-target sequence removal, a hybrid assembly was generated using both the long-read and short-read data. Different long-read coverage thresholds were explored (10x, 20x, and 25x), with genome completeness assessed using BUSCO. The 20x coverage assembly performed best according to BUSCO, with 95.0% completeness (complete and single copy = 94.8%, complete and duplicated = 0.2%, fragmented = 2.2%, and missing = 2.8%, out of 2,442 single copy orthologs), and comprised 2,137 contigs summing to 147.3 Mb (average length 69.0 Kb and max length 1.28 Mb). The alternative haslr assemblies had lower BUSCO completeness (93.6% for 10x and 94.8% for 25x), with more missing single copy orthologs. The 20x coverage genome was then scaffolded using RNAseq paired-end read data, which resulted in 1,636 scaffolds and a genome size of 147.4 Mb (average scaffold length 90.1 Kb and max length 1.37 Mb). This final genome assembly had an improved BUSCO score of 95.3% completeness (complete and single copy = 95.1%, complete and duplicated = 0.2%, fragmented = 2.1%, and missing = 2.6%).
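The summary statistics above follow simple arithmetic: BUSCO completeness is the sum of the single-copy and duplicated categories, the four categories should sum to 100%, and average sequence length is total size divided by sequence count. The following is a minimal sketch of those checks, not the pipeline used in this study; the function names `busco_summary` and `mean_length_kb` are hypothetical, and 1 Mb is assumed to equal 1,000 Kb.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Combine BUSCO category percentages.

    Returns the overall completeness (single + duplicated) and the
    sum of all four categories, both rounded to one decimal place.
    """
    complete = round(single + duplicated, 1)
    total = round(single + duplicated + fragmented + missing, 1)
    return {"complete": complete, "total": total}


def mean_length_kb(total_mb, n_sequences):
    """Average sequence length in Kb, assuming 1 Mb = 1,000 Kb."""
    return round(total_mb * 1000 / n_sequences, 1)


if __name__ == "__main__":
    # Values reported for the final scaffolded assembly.
    print(busco_summary(95.1, 0.2, 2.1, 2.6))
    print(mean_length_kb(147.4, 1636))
```

Because the reported totals are themselves rounded, averages recomputed this way may differ from the reported values in the last digit.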
The genome was then annotated using automated prediction, database searches and RNAseq data. This resulted in 17,895 genes and 23,973 proteins (Table 2), as well as 150,583 repeat regions (comprising 201,059 copies in 19 families, Table S5). The official gene set has a 93.3% BUSCO completeness score (complete and single copy = 80.5%, complete and duplicated = 12.8%, fragmented = 2.4%, and missing = 4.3%). The N. riversi genome is smaller and less fragmented than those of the 11 Coleoptera species in RefSeq (Table 1; Table S6; Figure S3), yet the number of genes falls within the range of published gene sets. Similarly, the repeat content of N. riversi falls within the observed range of other Coleoptera species, albeit at the low end (Table 1). One exceptional difference involves the distribution of intron length in N. riversi, which is truncated compared to other beetle species, with a statistically significant reduction in larger introns (> 1,000 bp) and consequently a smaller total size of intronic regions (Figure S4). Finally, an analysis of collinear genes across N. riversi and six other beetle genomes (Figure S5) shows that estimates of both the number and the proportion of collinear genes are affected by genome fragmentation. Focusing only on comparisons involving T. castaneum (the most contiguous assembly), modest synteny is found across Coleoptera, with roughly 2,000 genes (6% of the total) showing collinearity in N. riversi.
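To illustrate the intron length comparison, the sketch below derives intron lengths from the exon coordinates of a single transcript and reports the fraction exceeding 1,000 bp. It is an illustrative assumption rather than the method used here: the function names are hypothetical, and exon coordinates are assumed to be 1-based and inclusive, as in GFF annotation files.

```python
def intron_lengths(exons):
    """Return intron lengths for one transcript.

    exons: list of (start, end) tuples, assumed 1-based and inclusive.
    An intron is the gap between two consecutive exons.
    """
    ordered = sorted(exons)
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def fraction_longer_than(lengths, threshold=1000):
    """Fraction of introns strictly longer than threshold (in bp)."""
    if not lengths:
        return 0.0
    return sum(length > threshold for length in lengths) / len(lengths)


if __name__ == "__main__":
    # Hypothetical transcript with three exons, hence two introns.
    example_exons = [(100, 200), (301, 400), (2000, 2100)]
    lengths = intron_lengths(example_exons)
    print(lengths)
    print(fraction_longer_than(lengths))
```

Pooling such lengths across all transcripts of each species would give the distributions being compared; the statistical test applied to them is not specified here.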