Methods & Materials
Plant material and sequencing
Leaf tissue of mature G. raimondii (accession D5-4) and G. turneri (accession D10-3) plants was collected at the Brigham Young University (BYU) Greenhouse, and DNA was extracted using a CTAB method (Kidwell et al. 1992). DNA concentration was measured with a Qubit Fluorometer (ThermoFisher, Inc.). A sequencing library for each species was constructed according to PacBio recommendations at the BYU DNA Sequencing Center (DNASC). Fragments >18 kb were selected for sequencing using a BluePippin (Sage Science, LLC). Prior to sequencing, the size distribution of fragments in the libraries was evaluated using a Fragment Analyzer (Advanced Analytical Technologies, Inc). For each species, a single library was sequenced on the Pacific Biosciences Sequel system, using eight PacBio cells for G. raimondii and eleven for G. turneri. For both genomes, the raw PacBio sequencing reads were assembled using Canu v1.6 with default parameters (Koren et al. 2017).
Hi-C libraries were constructed from G. raimondii leaf tissue at Northeast Normal University, China, and sequenced at Annoroad Gene Technology Co., Ltd (Beijing, China). The G. raimondii Hi-C data were mapped to the previous genome sequence of G. raimondii using ___, and to the newly assembled Canu contigs of PacBio reads by PhaseGenomics. The Hi-C interactions were used as evidence for contig proximity and for scaffolding contig sequences. An initial draft genome sequence of pseudochromosomes (PGA assembly) was created using a custom script from PhaseGenomics.
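The custom script itself is not described here. As a rough illustration only, building a pseudochromosome from an ordered, oriented list of contigs can be sketched as below. The ordering format (contig name plus "+" or "-"), the gap length, and all function and contig names are assumptions made for this example, not details of the PhaseGenomics workflow.

```python
# Illustrative sketch only: the ordering format, the gap length, and the
# names used here are assumptions, not the actual PhaseGenomics script.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_pseudochromosome(contigs, ordering, gap_len=100):
    """Concatenate contigs in the given order and orientation.

    contigs  : dict mapping contig name -> sequence
    ordering : list of (contig name, "+" or "-") tuples
    gap_len  : number of N characters placed between adjacent contigs
    """
    pieces = []
    for name, orientation in ordering:
        seq = contigs[name]
        if orientation == "-":
            seq = reverse_complement(seq)
        elif orientation != "+":
            raise ValueError(f"unknown orientation {orientation!r} for {name}")
        pieces.append(seq)
    return ("N" * gap_len).join(pieces)


# Toy example with two hypothetical contigs.
toy_contigs = {"tig1": "AACG", "tig2": "GGTA"}
toy_ordering = [("tig2", "-"), ("tig1", "+")]
print(build_pseudochromosome(toy_contigs, toy_ordering, gap_len=3))
# prints TACCNNNAACG
```

Keeping the order and orientation in a separate list, apart from the contig sequences, means a contig can be moved or flipped by editing one entry and rebuilding the sequence.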
DNA was also extracted from young G. raimondii leaves following the Bionano Plant protocol for high-molecular-weight DNA. DNA was purified, nicked, labeled, and repaired according to Bionano standard operating procedures for the Irys platform. Two optical maps, one for each nicking enzyme (BspQI and BssSI), were assembled using the IrysSolve pipeline on the BYU Fulton SuperComputing cluster (http://fsl.byu.edu). The optical maps were combined into a two-enzyme composite optical map, which was aligned to the PGA assembly using an in silico labeled reference sequence. Conflicts between the Bionano maps and the PGA assembly were manually identified in the Bionano Access software by comparing the mapped Bionano contigs to the Canu contigs along the draft genome sequence. Conflicts between datasets were resolved by repositioning and reorienting Canu contigs in the PGA ordering files, followed by reconstruction of the FASTA sequence, provided the optical map either supported the change or showed no conflicting evidence (Durand 2016, Supp. Figure 1). Multiple iterations of mapping, conflict resolution, and draft sequence construction resulted in the final, new genome sequence of G. raimondii.
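To illustrate what an in silico labeled reference is, the sketch below finds the positions in a sequence where each nicking enzyme's recognition motif occurs on either strand. It is a simplified stand-in for the Bionano software, not a reproduction of it. The function names and the toy sequence are hypothetical, and the reported positions are motif start coordinates rather than exact nick sites.

```python
# Simplified sketch of in silico labeling; not the Bionano implementation.
# Recognition motifs of the two nicking enzymes used for the optical maps.
NICKING_MOTIFS = {"BspQI": "GCTCTTC", "BssSI": "CACGAG"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def label_sites(sequence, motif):
    """Return sorted 0-based start positions of motif on either strand."""
    sequence = sequence.upper()
    sites = set()
    # A motif on the reverse strand appears as its reverse complement
    # when reading the forward strand.
    for target in {motif, reverse_complement(motif)}:
        start = sequence.find(target)
        while start != -1:
            sites.add(start)
            start = sequence.find(target, start + 1)
    return sorted(sites)


# Toy sequence (hypothetical) with one forward and one reverse BspQI site.
toy_sequence = "AAGCTCTTCAAGAAGAGCTT"
for enzyme, motif in NICKING_MOTIFS.items():
    print(enzyme, label_sites(toy_sequence, motif))
# prints:
# BspQI [2, 11]
# BssSI []
```

The distances between successive label positions form the pattern that is compared against the label spacing observed in the optical maps.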