Scaffolding the assembly with HiRise
A schematic of the assembly process is shown in Fig. 1. We used the pre-existing male (NCBI BioProject: PRJNA162621) and female (NCBI BioProject: PRJNA179493) genome assemblies (Keeling et al., 2013c) as the input de novo assemblies for scaffolding, along with their underlying shotgun reads (NCBI SRA: SRR546181 and SRR546185 for the male assembly, and SRR546193 for the female assembly).
Separately for each sex, the de novo assembly, shotgun reads, Chicago library reads, and Dovetail Hi-C library reads were used as input data for HiRise, Dovetail’s proprietary software pipeline designed specifically for using proximity ligation data to scaffold genome assemblies (Putnam et al., 2016). First, shotgun and Chicago library sequences were aligned to the draft input assembly using the SNAP v1.0beta read mapper (Zaharia et al., 2011). HiRise analyzed the separations of Chicago read pairs mapped within draft scaffolds to produce a likelihood model for the genomic distance between read pairs; this model was used to identify and break putative mis-joins, to score prospective joins, and to make joins. Second, Hi-C library sequences were aligned and scaffolded following the same method. Third, shotgun sequences were used to close gaps between contigs.
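
To illustrate the idea of scoring a prospective join with a separation likelihood model, the following is a minimal sketch in Python. It is not HiRise's implementation, which is proprietary and more elaborate; it only shows one simple way such a score could be computed. The function names (fit_separation_model, log_density, score_join), the log-spaced binning, the pseudocount, the uniform noise model, and the gap and genome_size parameters are all assumptions made for illustration.

```python
import math
from bisect import bisect_right


def fit_separation_model(separations, n_bins=20, max_sep=1_000_000, pseudocount=1.0):
    """Fit a log-binned empirical density (per bp) of read-pair separations.

    `separations` are distances (bp) between mates that map within the same
    draft scaffold. Binning scheme and pseudocount are illustrative choices.
    """
    step = math.log10(max_sep) / n_bins
    edges = [10 ** (i * step) for i in range(n_bins + 1)]
    counts = [pseudocount] * n_bins
    for sep in separations:
        if 1 <= sep < max_sep:
            idx = min(max(bisect_right(edges, sep) - 1, 0), n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    # Convert bin probabilities to a density per base pair.
    density = [counts[i] / total / (edges[i + 1] - edges[i]) for i in range(n_bins)]
    return edges, density


def log_density(model, sep):
    """Log density of a separation under the fitted model (clamped to the bin range)."""
    edges, density = model
    idx = min(max(bisect_right(edges, sep) - 1, 0), len(density) - 1)
    return math.log(density[idx])


def score_join(model, links, gap, genome_size):
    """Log-likelihood ratio for joining two contigs end to end.

    `links` holds one (dist_a, dist_b) tuple per read pair spanning the join:
    the distance of each mate from the joined end of its contig. The
    alternative hypothesis (assumed here) is that a mate lands uniformly at
    random in a genome of `genome_size` bp.
    """
    log_noise = -math.log(genome_size)
    score = 0.0
    for dist_a, dist_b in links:
        implied_sep = dist_a + gap + dist_b
        score += log_density(model, implied_sep) - log_noise
    return score


if __name__ == "__main__":
    # Toy within-scaffold separations; real data would come from aligned read pairs.
    model = fit_separation_model([150, 300, 700, 1500, 3000, 7000] * 50)
    near = score_join(model, [(500, 500)], gap=100, genome_size=1_000_000_000)
    far = score_join(model, [(200_000, 200_000)], gap=100, genome_size=1_000_000_000)
    print(f"links near the joined ends: {near:.2f}")
    print(f"links far from the joined ends: {far:.2f}")
```

Under these assumptions, a candidate join supported by read pairs whose implied separations are typical of the library receives a higher score than one whose implied separations are rarely observed, which is the intuition behind using the model both to break putative mis-joins and to rank prospective joins.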