Sequencing and assembling Macquarie perch and golden perch genomes
For Macquarie perch, we used fin, tail bone and muscle tissues of a
2-month-old hatchery-produced juvenile of unknown sex of Yarra River
origin, born in November 2012 (sample ID MP_SCH12). For the golden
perch genome, we used fin tissue collected non-lethally from an adult of
unknown sex (aged as 3+ years based on size), captured in the Broken
River, Victoria, in May 2017 (sample ID GOP001). For both species, DNA
samples were preserved in ethanol and kept at −20 °C.
DNA was extracted using Qiagen DNeasy Blood & Tissue kits. For
short-read sequencing, 100 ng of gDNA was fragmented to 350 bp using a
QSonica sonicator and processed with a NEB Ultra Illumina Library Preparation Kit.
The libraries were pooled with libraries for other projects and
sequenced on all four lanes of an S4 flowcell of a NovaSeq 6000 at the
Deakin Genomics Centre using a 2 × 151 bp run configuration, with the aim
of obtaining 100 Gb of data per sample (Appendix A). To obtain long-read
data, 1 µg of gDNA was fragmented to 8 kb using a Covaris G-Tube and
processed with an LSK108 library preparation kit according to the
manufacturer’s instructions (Oxford Nanopore, UK). The library was
subsequently sequenced on a Nanopore R9.4 flowcell. Base-calling of the
Nanopore signal used Albacore v2.0.1 (Oxford Nanopore, UK).
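As a rough check on the short-read target, the number of read pairs implied by 100 Gb at 2 × 151 bp can be worked out directly. The sketch below is illustrative only: the function name is hypothetical, and 1 Gb is assumed to mean 10^9 bases.

```python
def read_pairs_for_target(target_bases: int, read_length: int = 151) -> int:
    """Return the minimum number of read pairs needed to reach target_bases.

    Assumes every read is full length (no trimming losses), so the real
    number of pairs required would be somewhat higher.
    """
    bases_per_pair = 2 * read_length
    # Ceiling division without floating point.
    return -(-target_bases // bases_per_pair)


# 100 Gb target, assuming 1 Gb = 10**9 bases (an assumption, not from the text).
pairs_needed = read_pairs_for_target(100 * 10**9)
print(f"{pairs_needed:,} read pairs")
```

Under these assumptions the target corresponds to roughly 331 million read pairs per sample.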
Illumina reads, adapter-trimmed using fastp v0.19.5 (Chen, Zhou, Chen,
& Gu, 2018), and Nanopore long reads were hybrid-assembled de
novo using MaSuRCA v3.2.4 (Zimin et al., 2017). The short Illumina
reads were first error-corrected with QuORUM as implemented in the
MaSuRCA pipeline and subsequently used to construct contigs by the de
Bruijn graph approach. These contigs were used to error-correct the
Nanopore long reads, generating “mega read” contigs for
Overlap-Layout-Consensus assembly. Genome completeness was assessed
using BUSCO v4 (Seppey, Manni, & Zdobnov, 2019) with default settings,
based on the actinopterygii_odb10 database.
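The de Bruijn graph idea behind contig construction from short reads can be illustrated with a toy example. This is a minimal sketch of the general technique, not MaSuRCA's implementation: the function names, the two example reads and the choice of k = 4 are all hypothetical, and real assemblers additionally handle sequencing errors, reverse complements and repeats.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: each k-mer links its (k-1)-mer prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from start while the path through the graph is unbranched."""
    # Count incoming edges so that merge points can be detected.
    indegree = defaultdict(int)
    for successors in graph.values():
        for successor in successors:
            indegree[successor] += 1

    contig, node, visited = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        # Stop at merge points and at cycles.
        if indegree[next_node] != 1 or next_node in visited:
            break
        contig += next_node[-1]
        visited.add(next_node)
        node = next_node
    return contig


# Two hypothetical overlapping reads; k = 4 is chosen only for illustration.
toy_reads = ["ATGGCGT", "GGCGTGC"]
toy_graph = build_de_bruijn(toy_reads, k=4)
toy_contig = walk_unambiguous(toy_graph, "ATG")
print(toy_contig)
```

In this toy case the two reads share the k-mers GGCG and GCGT, so the unbranched walk joins them into a single contig spanning both reads.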