Results and Discussion

Genome assemblies

In this work, we report two de novo genome sequences for the genus Gossypium: a new and corrected assembly for G. raimondii (D5) and a new reference-quality assembly for the closely related G. turneri (D10). These new genomes integrate multiple sequencing technologies to provide a more accurate representation of each cotton genome. The original de novo G. raimondii genome, a tour de force in 2012, was the first reference-quality genome for cotton and was constructed using Sanger sequencing of shotgun clones and BAC-ends (Paterson et al. 2012). It was selected as the first chromosome-scale genome sequence of the genus Gossypium because of its small genome size and its close sequence similarity to the genomes within the tetraploid nucleus of domesticated cotton (Chen et al. Plant Physiol. 2007 Dec; 145(4): 1303–1310). The genome has been widely used by the cotton research community, as evidenced by its ~500 citations in other journal articles. Despite its revolutionary importance to cotton research, the originally published sequence assembly contained some minor assembly errors. The genome sequence reported here represents an entirely independent effort to provide an improved D5 genome sequence using PacBio long-read sequencing technology and to correct errors in the original genome sequence that have subsequently been discovered.
The G. raimondii genome was assembled from 43.7x PacBio coverage of raw sequence reads. The assembly consisted of 187 contigs with an N50 of 6.3 Mb (Table 1). The contigs were scaffolded using HiC by PhaseGenomics and the pseudomolecules were manually adjusted using JuiceBox (Durand et al. 2016). The final scaffolded assembly was independently verified using a composite optical map of two different enzymes. A comparison of assembly metrics between the previous genome sequence and our new genome sequence of D5 shows a 45x improvement in contig length and a 97x reduction in the number of gaps. The cumulative gap length of the new assembly (17.6 kb) was reduced by 647x compared to that of the previous genome sequence (11,391 kb). The final genome assembly was 14.9 Mb smaller than the previous assembly, representing 98% of the length of the previously assembled genome sequence.
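For readers less familiar with the assembly metrics above, the following is a minimal Python sketch of how a contig N50 and a fold reduction can be computed. It is illustrative only and is not the pipeline used in this study: the function names `n50` and `fold_reduction` and the toy contig lengths are assumptions introduced for the example, and only the two cumulative gap lengths (11,391 kb and 17.6 kb) are taken from the text.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def fold_reduction(before, after):
    """Return how many times smaller `after` is than `before`."""
    return before / after


# Toy contig lengths (hypothetical values, arbitrary units).
toy_contigs = [2, 3, 4, 5, 6]
print("toy N50:", n50(toy_contigs))

# Cumulative gap lengths in kb, as reported in the text.
previous_gap_kb = 11_391
new_gap_kb = 17.6
print("gap fold reduction:", round(fold_reduction(previous_gap_kb, new_gap_kb)))
```

With the gap lengths reported above, the ratio 11,391 / 17.6 rounds to 647, which matches the stated 647x reduction.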
This is the first de novo genome sequence for G. turneri. The D10 genome was assembled from 73.2x PacBio coverage of raw sequence reads. The assembly consisted of 220 contigs with an N50 of 7.9 Mb (Table 1). As with the D5 sequence, these contigs were scaffolded (by Dovetail Genomics) and the pseudomolecules were manually adjusted using JuiceBox. Bionano data were not collected for G. turneri, and the G. raimondii Bionano data were uninformative when aligned to the G. turneri genome sequence because the distances between labeled recognition sites were too different. After creation of the sequence assembly, the D5 HiC reads were also mapped to the D10 genome sequence (and vice versa, data not shown). Although the number of mapped reads was significantly reduced (29.90% and 12.67%, respectively), no additional association anomalies were detected between genomes.
The assembled genome sequences were also verified by alignment to the Dt-genome of G. hirsutum (Wang et al. 2018) and to the previous genome assembly of D5 (Figure 1). The chromosomes showed very good general agreement in their alignments among the four independently assembled sequences (previous D5, current D5, D10, and Dt). Such collinearity has also been identified previously between cotton genomes. For example, genetic maps of G. hirsutum (e.g. Byers et al. 2012) were previously used to verify, and sometimes to establish, proper scaffolding between contigs (Paterson et al. 2012).