Discussion
Chinese flowering cabbage (B. rapa var. parachinensis) is an important leafy and bolting-stem vegetable with high nutritional value that has been widely grown in Asia (Tan et al., 2019). Among the many ecological types of Brassica rapa planted as vegetables in China, Chinese flowering cabbage is the one that is well adapted to the hot and humid climate of southern China. It can be planted all year round to produce tender flower stalks without the need for a strict vernalization process. In this study, we report the first chromosome-level genome assembly of this important ecological type of B. rapa, which provides a valuable genomic resource for evolutionary studies of B. rapa and related Brassica species. The present study is also the first to report the genome size, heterozygosity, and repeat content of the Chinese flowering cabbage genome.
A highly continuous genome assembly is critical for genome-wide marker development and gene model prediction. Numerous studies have demonstrated that recent long-read sequencing technologies can greatly improve the continuity of genome assemblies (Song et al., 2020; Wang et al., 2019; Belser et al., 2018; Zhang et al., 2018). In this study, we used PacBio long reads to assemble the B. rapa var. parachinensis genome. Owing to the low heterozygosity (0.16%) of the plants used for sequencing, we obtained a contig N50 of 7.26 Mb, which is longer than those of the two B. rapa genomes recently sequenced with PacBio and Nanopore technologies (Belser et al., 2018; Zhang et al., 2018), and much longer than those of the B. rapa and B. oleracea genomes sequenced using Illumina technology (Liu et al., 2014; Wang et al., 2019). We applied the Hi-C technique to scaffold more than 545 Mb of contigs onto 10 chromosomes. The scaffold N50 of the final assembly reached 32.3 Mb, with a maximum scaffold size of 47.4 Mb, which is similar to the B. rapa Z1 genome sequenced with Nanopore technology (Belser et al., 2018) (Table S5). The completeness of the genome (97.8%) was validated using BUSCO analysis and surpassed that of most genomes of related Brassica species sequenced thus far, including B. oleracea HDEM (Belser et al., 2018), B. oleracea var. botrytis (Sun et al., 2019) and B. rapa Z1 (Belser et al., 2018) (Table S5).
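For readers less familiar with the N50 metric used above, the following minimal sketch shows how it is conventionally computed: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name and the toy contig lengths are illustrative assumptions and are not taken from the assembly reported here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are sorted from longest to shortest and accumulated until
    the running sum reaches at least half of the total length; the length
    of the sequence that crosses this threshold is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths (in Mb), for illustration only.
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))  # total = 20, half = 10; 6 + 5 = 11 >= 10, so N50 = 5
```

The same function applies to contigs and scaffolds alike; only the input list of lengths differs.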
In the present study, the assembly of the Chinese flowering cabbage genome resolved most of the pericentromeric regions of the B. rapa genome. Among them, the pericentromeric regions of chromosomes 5 (A05) and 6 (A06) were found to be significantly expanded in comparison with the other pericentromeric regions, and very few genes were annotated in these regions (Fig. 2B; Fig. 6). This observation is further supported by the Hi-C contact map, in which the pericentromeric regions of chromosomes 5 and 6 show a clearly sparse Hi-C contact signal that is mostly caused by repetitive sequences (Fig. 3). Strikingly, this expansion seems to be lineage specific, since we do not observe a similar pattern in the two other Brassica genome types, i.e. chromosomes C05 and C06 in B. oleracea and B. napus (Belser et al., 2018; Song et al., 2020), and chromosomes B05 and B06 in B. nigra (Fig. 6A). This lineage-specific expansion may play a role in the evolutionary divergence of the Brassica AA, BB and CC genomes. It is worth noting that such large repetitive regions can only be resolved by long-read sequencing technology. For example, the previously published B. rapa Z1 and B. napus AA genome assemblies present a similar but relatively weaker pattern than the current assembly (Belser et al., 2018; Song et al., 2020; Zhang et al., 2018) (Fig. S1). However, the B. rapa assembly sequenced by PacBio Sequel, with an N50 of 1.45 Mb (Belser et al., 2018; Song et al., 2020; Zhang et al., 2018), does not present these large repetitive regions (Fig. S1E).
The genus Brassica contains three basic genomes, B. rapa (AA genome), B. nigra (BB genome), and B. oleracea (CC genome), which hybridized to give rise to three allopolyploid species, B. napus (AACC genome), B. juncea (AABB genome), and B. carinata (BBCC genome) (Cheng et al., 2016; Sun et al., 2019). In the present study, a phylogenetic tree was constructed to analyze the evolution of the Brassica species. Interestingly, Chinese flowering cabbage shows the closest relationship with the B. juncea AA genome rather than with the two B. rapa genomes (Chinese cabbage and yellow sarson) (Fig. 4) (Belser et al., 2018; Zhang et al., 2018). The B. rapa species can be further subdivided into six populations: turnips (Chinese and European turnips), sarsons (sarson, rapid cycling and spring/winter oilseed), turnip rapes, taicai and mixed Japanese morphotypes, pak choi (pak choi, wutacai, Chinese flowering cabbage and zicaitai varieties) and heading Chinese cabbages (Cheng et al., 2016). Our results suggest that the donor of the AA genome in B. juncea most likely came from the pak choi group (Chinese flowering cabbage) rather than from other B. rapa varieties, such as sarsons and turnips (Belser et al., 2018; Cai et al., 2017). Meanwhile, we found that B. rapa Z1 (sarson) clustered first with the B. napus AA genome and then with the other AA genomes, implying that it is evolutionarily the closest to the donor of the AA genome in B. napus. Similarly, B. oleracea can be subdivided into seven populations: kohlrabies, Chinese kale, cauliflower, broccoli, Brussels sprouts, kale and cabbages (Cheng et al., 2016). Interestingly, B. oleracea var. capitata (cabbages) clustered first with the two B. napus CC genomes and then with B. oleracea var. italica (broccoli), implying that the donor of the CC genome in B. napus probably evolved from B. oleracea var. capitata (cabbages) (Fig. 4).
Thus, we demonstrated that high-continuity genome assemblies can aid in the interpretation of evolutionary relationships among Brassica species.
Numerous studies have found that structural variations can impact larger genomic regions than SNPs. Structural variant (SV) discovery not only helps us understand the landscape of genomic variation within and between species but also reveals the functional significance of SVs (Fuentes et al., 2019). Compared with SV detection methods based on Illumina short reads, whole-assembly-based methods can in theory fully recover SVs, but they still depend on assembly quality. SV studies in human (Audano et al., 2019; Huang et al., 2010) and in a wide range of plant species, such as rice (Fuentes et al., 2019), maize (Mahmoud et al., 2020), tomato (Voichek & Weigel, 2020), and Arabidopsis (Voichek & Weigel, 2020), indicate that SVs can affect a large proportion of coding genes. In the current study, we detected SVs between the genome assemblies of two Brassica rapa lines and identified a total of 27,190 insertions, 26,002 deletions, 1,368 duplications and 46 medium-sized inversions ranging in size from 5.2 kb to 1,431.6 kb, as well as 8,565 complex SVs with imprecise breakpoints (Fig. 7). This is the first report of SVs detected between Brassica genomes using high-contiguity genome assemblies. These SVs may affect coding genes and thereby contribute to phenotypic variation, such as differences in morphological and phytochemical characteristics.
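To put the reported SV counts in proportion, the short sketch below tallies the five categories listed above and computes the share of each. The counts are those stated in the text; the dictionary keys and the helper name `summarize_svs` are illustrative assumptions and do not correspond to any tool used in this study.

```python
def summarize_svs(counts):
    """Return the total number of SVs and the fraction contributed by each type."""
    total = sum(counts.values())
    shares = {sv_type: n / total for sv_type, n in counts.items()}
    return total, shares


# SV counts as reported in the text (two B. rapa lines compared).
sv_counts = {
    "insertion": 27190,
    "deletion": 26002,
    "duplication": 1368,
    "inversion": 46,
    "complex": 8565,
}

total, shares = summarize_svs(sv_counts)
print(f"total SVs: {total}")
for sv_type, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{sv_type}: {share:.1%}")
```

Summed across the five categories, the counts give 63,171 SVs, with insertions and deletions together accounting for the large majority.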
In summary, we report a chromosome-level genome assembly of Chinese flowering cabbage together with its accurate gene and TE annotation. The phylogenetic analysis indicates that this genome has a closer evolutionary relationship with the AA diploid progenitor of B. juncea. We also found lineage-specific pericentromeric expansion events on chromosomes 5 and 6 of the Brassica AA genome compared with the orthologous genomic regions in the Brassica BB and CC genomes. Finally, we report a large number of structural variations (SVs) between two B. rapa lines (Z1 and parachinensis) using high-continuity genome assemblies. Overall, our high-quality genome assembly of Chinese flowering cabbage provides a valuable genetic resource for deciphering the genome evolution of Brassica species, and it can serve as a reference genome to guide the molecular breeding of B. rapa crops.