Genome assembly, polishing, and draft quality checks
Whole-genome DNA was obtained from two male Tawny owls; DNA extraction (from nucleated red blood cells), library preparation, and whole-genome sequencing were outsourced to BGI. Sequencing used PacBio circular libraries, in which both ends of each insert are ligated to SMRT adapters. Read polishing at this stage included removal of SMRT adapters and clustering of redundant subreads sequenced from the same circular molecule into single reads of insert (ROI). Genome assembly was performed with Flye (Kolmogorov, Yuan, Lin, & Pevzner, 2019). Flye uses a repeat graph as its core data structure, as opposed to the de Bruijn graphs most commonly used in short-read and hybrid assemblies. Repeat graphs do not require exact k-mer matches because they are built from approximate sequence matches, which allows them to tolerate the high error rate of single-molecule sequencing reads such as PacBio reads. Flye's major parameters were left at their defaults, including a minimum overlap of 5,000 base pairs (bp) between reads, while the initial disjointig assembly was run at a reduced coverage of 20x, i.e., using only the longest reads up to a combined coverage of 20x. To explore how the enforced overlap affects assembly quality, we performed one additional assembly with the minimum overlap between reads set to 1,000 bp. Lastly, we replicated each assembly to check the consistency of the algorithm and the variance of the assembly statistics. Although Flye has a built-in polishing step, we additionally polished the assemblies with PacBio's pipeline, aligning reads with pbmm2 and polishing with gcpp (https://github.com/PacificBiosciences/pbbioconda). All assemblies were compared with QUAST (Gurevich, Saveliev, Vyahhi, & Tesler, 2013), and the most contiguous and complete assembly with the highest coverage was chosen as the reference genome for subsequent analyses.
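Contiguity in QUAST-style comparisons is commonly summarized by N50, the contig length such that contigs of that length or longer contain at least half of the assembled bases. The Python sketch below computes N50 and ranks assembly runs by it. The run names and contig lengths are hypothetical, and ranking on N50 alone (with fewer contigs as a tie-breaker) is an assumed simplification of the selection described above, which also weighed completeness and coverage.

```python
from typing import Dict, List


def n50(contig_lengths: List[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not contig_lengths:
        return 0
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # not reached for non-empty input


def most_contiguous(assemblies: Dict[str, List[int]]) -> str:
    """Pick the run with the highest N50, breaking ties by fewer contigs.
    Hypothetical selection rule for illustration only."""
    return max(
        assemblies,
        key=lambda name: (n50(assemblies[name]), -len(assemblies[name])),
    )


# Hypothetical contig lengths (bp) for two runs; not results from this study.
runs = {
    "default_overlap": [5_000_000, 3_000_000, 1_000_000, 500_000],
    "min_overlap_1000": [4_000_000, 2_000_000, 2_000_000, 1_500_000],
}
for name, lengths in runs.items():
    print(name, "N50 =", n50(lengths))
print("selected:", most_contiguous(runs))
```

Sorting contigs by decreasing length and stopping at the first one where the running total reaches half of the assembly size is the standard N50 definition; QUAST also reports NG50, which uses an expected genome size in place of the assembly size.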
Taxon-specific completeness of the chosen draft assembly was assessed with BUSCO (Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015) using the aves_odb dataset of conserved single-copy orthologs, with the northern spotted owl (Strix occidentalis caurina), burrowing owl (Athene cunicularia), and barn owl (Tyto alba) assemblies analyzed for comparison. Repetitive elements were identified and masked with RepeatMasker version 4.1.2-p1 using the HMM-Dfam_3.3 database (updated November 2020) (Chen, 2004). The genome versions used in this analysis are listed in the supplemental information document.
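BUSCO reports completeness as a one-line short summary giving the percentages of complete (C), complete single-copy (S), complete duplicated (D), fragmented (F), and missing (M) orthologs out of the n orthologs searched. A minimal Python sketch for tabulating the draft against the comparison owl assemblies is shown below, assuming that summary format. The summary strings and assembly names are invented placeholders, not results from this study.

```python
import re
from typing import Dict, Union

# Pattern for BUSCO's one-line short summary, assumed to look like
# "C:95.0%[S:94.0%,D:1.0%],F:2.0%,M:3.0%,n:1000".
BUSCO_SUMMARY = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line: str) -> Dict[str, Union[float, int]]:
    """Extract the C/S/D/F/M percentages and the ortholog count n."""
    match = BUSCO_SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO short summary: {line!r}")
    fields = match.groupdict()
    parsed: Dict[str, Union[float, int]] = {
        key: float(fields[key]) for key in ("C", "S", "D", "F", "M")
    }
    parsed["n"] = int(fields["n"])
    # Sanity checks with rounding tolerance: C = S + D and C + F + M = 100%.
    if abs(parsed["C"] - (parsed["S"] + parsed["D"])) > 0.2:
        raise ValueError("C does not equal S + D")
    if abs(parsed["C"] + parsed["F"] + parsed["M"] - 100.0) > 0.5:
        raise ValueError("C + F + M does not sum to 100%")
    return parsed


# Invented placeholder summaries; real values come from each BUSCO run.
summaries = {
    "tawny_owl_draft": "C:95.0%[S:94.0%,D:1.0%],F:2.0%,M:3.0%,n:1000",
    "comparison_owl": "C:96.5%[S:96.0%,D:0.5%],F:1.5%,M:2.0%,n:1000",
}
table = {name: parse_busco_summary(line) for name, line in summaries.items()}

# Print assemblies from most to least complete.
for name, row in sorted(table.items(), key=lambda item: -item[1]["C"]):
    print(f"{name}\tC={row['C']}%\tD={row['D']}%\tF={row['F']}%\tM={row['M']}%")
```

Using `re.search` rather than a full-line match lets the parser tolerate surrounding whitespace or extra trailing fields in the summary line.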