Scaffolding the assembly with HiRise
A schematic for the assembly process is shown in Fig. 1. We used the
pre-existing male (NCBI BioProject: PRJNA162621) and female (NCBI
BioProject: PRJNA179493) genome assemblies (Keeling et al., 2013c) as
the input de novo assemblies for scaffolding, along with their
underlying shotgun reads (NCBI SRA: SRR546181 and SRR546185 for the male
assembly, and SRR546193 for the female assembly).
Separately for each sex, the de novo assembly, shotgun reads,
Chicago library reads, and Dovetail Hi-C library reads were used as
input data for HiRise, Dovetail’s proprietary software pipeline designed
specifically for using proximity ligation data to scaffold genome
assemblies (Putnam et al., 2016). First, shotgun and Chicago library
sequences were aligned to the draft input assembly using the SNAP
v1.0beta read mapper (Zaharia et al., 2011). Separations of Chicago read
pairs mapped within draft scaffolds were analyzed by HiRise to produce a
likelihood model for genomic distance between read pairs; the model was
used to identify and break putative mis-joins, to score prospective
joins, and to make joins. Second, Hi-C library sequences were aligned and
scaffolded following the same method. Third, shotgun sequences were used
to close gaps between contigs.
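
To illustrate the idea of scoring joins with a separation-based likelihood model, a minimal Python sketch is given below. It is not the HiRise implementation, which is proprietary. The function names, the histogram-based model, the bin size, the pseudocount, and the uniform background model are all assumptions made for illustration. The sketch fits an empirical distribution to the separations of read pairs mapped within scaffolds, then scores a candidate join as the log-likelihood ratio of the separations implied by that join against a uniform background.

```python
import math
from collections import Counter


def fit_separation_model(separations, bin_size=1000, max_sep=100_000, pseudocount=1.0):
    """Return per-bin log-probabilities of read-pair separation (hypothetical model).

    separations: non-negative distances (bp) between read pairs that map
    within the same draft scaffold.
    """
    n_bins = max_sep // bin_size
    # Separations beyond max_sep are clamped into the last bin.
    counts = Counter(min(int(s) // bin_size, n_bins - 1) for s in separations)
    total = len(separations) + pseudocount * n_bins
    return [math.log((counts.get(b, 0) + pseudocount) / total) for b in range(n_bins)]


def join_score(log_probs, implied_separations, bin_size=1000):
    """Log-likelihood ratio of a candidate join against a uniform background.

    implied_separations: the separation each spanning read pair would have
    if the two scaffolds were joined in the proposed order and orientation.
    A positive score means the join is better supported than the background.
    """
    n_bins = len(log_probs)
    background = math.log(1.0 / n_bins)  # assumed uniform noise model
    score = 0.0
    for s in implied_separations:
        b = min(int(s) // bin_size, n_bins - 1)
        score += log_probs[b] - background
    return score


# Toy data: most within-scaffold pairs are close together.
model = fit_separation_model([500] * 80 + [1500] * 15 + [50_500] * 5)
plausible = join_score(model, [700, 1200])          # short implied separations
implausible = join_score(model, [90_000, 95_000])   # long implied separations
```

In this toy example the join implying short separations receives a positive score and the join implying very long separations receives a negative one, which mirrors how such a model can both support prospective joins and flag putative mis-joins.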