Introduction
Brassica, a genus in the family Brassicaceae, is among the most economically important plant genera, since it contains a wide range of staple vegetables and oilseed crops. Over the course of its evolution, Brassica experienced an additional whole-genome triplication (WGT) event after it diverged from the common ancestor it shares with Arabidopsis (Cheng et al., 2016; Lysak, Koch, Pecinka, & Schubert, 2005). As a result, species in the Brassica genus display great diversity not only in morphology and phytochemistry but also in karyotype (Cheng et al., 2016; Wang et al., 2019). The most agriculturally important Brassica species comprise three diploids, Brassica rapa (AA), Brassica nigra (BB) and Brassica oleracea (CC), and three allopolyploids that arose from pairwise combinations of these diploids, Brassica napus (AACC), Brassica juncea (AABB) and Brassica carinata (BBCC). The evolutionary origins of these six species and their relationships with one another are well described by the ‘triangle of U’ model (Wang et al., 2019; Yang et al., 2016).
Owing to rapid recent advances in sequencing technology, especially next-generation sequencing (NGS), a large number of Brassica species have been sequenced, but most assemblies remain at draft quality. Genomes sequenced with Illumina or Roche 454 technology, including B. rapa var. pekinensis Chiifu (Wang et al., 2011), B. oleracea 02-12 (Liu et al., 2014), B. oleracea TO1000DH (Parkin et al., 2014), B. nigra YZ12151 (Yang et al., 2016), B. napus (Bayer et al., 2017; Chalhoub et al., 2014; Sun et al., 2017), and B. juncea (Wang et al., 2019; Yang et al., 2016), have relatively low continuity, which may impede genomic analysis, especially in complex regions such as pericentromeres and centromeres. Only recently has the application of long-read sequencing technologies, including Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio), to genome assembly greatly improved the continuity of assembled contigs. At least four Brassica genomes have been reported to be sequenced with long-read technology, with contig N50 values reaching the megabase scale: B. oleracea cultivar HDEM and B. rapa Z1 (yellow sarson) (Belser et al., 2018), B. oleracea var. botrytis (Sun et al., 2019) and B. napus (Song et al., 2020). These studies demonstrated that long-read technology can produce highly continuous Brassica genome assemblies (i.e. N50 > 5 Mb) (Belser et al., 2018). Given the great morphological and phytochemical diversity within Brassica, genome information from a wide range of representative species and cultivars is needed to decipher the genomic variants that may underlie this diversity, in both phenotype and karyotype.
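For readers less familiar with the continuity metric used throughout, N50 is the length L such that contigs of length L or longer together contain at least half of the total assembled bases. The following is a minimal Python sketch of that calculation, not code from this study or from any of the cited assemblies; the function name `n50` and the example contig lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bases)."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    # Walk through contigs from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running sum to half of the
        # assembly defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths, chosen only to illustrate the calculation.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

Because a few long contigs can account for half of the bases, N50 rises sharply when long reads span repeats that would otherwise fragment the assembly, which is why it is the usual headline measure of continuity.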
Chinese flowering cabbage (Brassica rapa var. parachinensis), locally known as Caixin, Tsai Tai, Choy Sum, bok choy, or Tsai Hsin (Tan, Fan, Kuang, Lu, & Reiter, 2019; Xiao et al., 2019), is an important leafy and bolting-stem vegetable widely grown in Asia, particularly in China, Japan, and Korea (Kamran et al., 2020). This vegetable has high nutritional value and is rich in vitamins, minerals, secondary metabolites and dietary fiber, which confer health-promoting effects (Xiao et al., 2019). Unlike other B. rapa vegetables, Chinese flowering cabbage bolts and flowers readily without a strict low-temperature vernalization requirement. Sequencing and assembling its genome is therefore an important step toward uncovering the genomic information and molecular mechanisms underlying the distinctive morphological and phytochemical characteristics of this cultivar.
In this study, we report a highly continuous (N50 = 7.2 Mb), chromosome-level genome assembly for Chinese flowering cabbage (Brassica rapa var. parachinensis). It was assembled with an integrated approach combining Illumina sequencing, PacBio sequencing and high-throughput chromosome conformation capture (Hi-C) technology. The assembly resolved a large part of the pericentromeric regions of this species. In addition, we conducted genome comparison and evolutionary analyses of this genome and other representative Brassica species. The results provide novel insights into the evolution of Brassica genome structure.