Structural variants in Brassica genomes
Structural variants (SVs) are generally defined as genomic alterations of 50 bp or larger, typically including insertions (INSs), deletions (DELs), duplications (DUPs), inversions (INVs) and translocations (TRAs). SVs greatly impact the genes encoded in the genome and are responsible for diverse agronomically important traits. Compared with single nucleotide polymorphisms (SNPs) and short insertions and deletions (InDels), SVs are less commonly explored because they are difficult to identify fully with short reads. De novo genome assemblies, especially highly contiguous ones, facilitate in-depth, genome-wide identification of all forms of structural variation. To the best of our knowledge, no work so far has identified SVs in Brassica genomes on the basis of highly contiguous genome assemblies.
To close this knowledge gap and obtain a first glimpse of the SVs that differ among Brassica rapa genomes, we identified SVs using the genomes of B. rapa Z1 (Belser et al., 2018) and B. rapa var. parachinensis (this study), which have contig N50 values of 5.51 Mb and 7.26 Mb, respectively. As shown in Fig. 5A, the two genomes differ by only a single translocation and otherwise show no large chromosomal rearrangements. Using a whole-genome alignment approach, we identified a total of 27,190 insertions, 26,002 deletions, 1,374 duplications in the parachinensis assembly, 1,368 duplications in the Z1 assembly, 46 medium-sized inversions ranging from 5.2 Kb to 1,431.6 Kb, and 8,565 complex SVs with imprecise breakpoints between Z1 and parachinensis (Fig. 7A). Of the insertion events, 845 and 847 are newly arisen LTR insertions specific to the parachinensis and Z1 assemblies, respectively, which is consistent with their relatively recent estimated insertion times (Fig. 7B). According to the gene annotation, a large proportion of the detected insertions and deletions overlap genic regions. Fig. 7C shows two cases of local tandem duplication that overlap gene fragments or full genes.
Comparative genomic analysis can also provide insights into the mutational mechanisms of structural variation. For the 46 inversions identified, we found that repeat sequences, especially inverted repeats, are prevalent in the flanking regions, highlighting a causal role of these sequence features in the formation of small inversions (Fig. 7D). Taken together, our analysis of genomic structural variation based on these highly contiguous genome assemblies provides the first glimpse of SVs in Brassica genomes, of their functional significance for gene structure, and thus of their potential effect on phenotype.
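
The 50 bp size threshold used above to separate SVs from short InDels can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Variant` record, the function names and the example calls are hypothetical and serve only to show how alignment-derived calls might be filtered by size and tallied by type.

```python
from collections import Counter
from typing import Iterable, NamedTuple

# Size threshold stated in the text: alterations of 50 bp or larger count as SVs.
SV_MIN_LENGTH = 50


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a real file format)."""
    kind: str    # "INS", "DEL", "DUP" or "INV"
    length: int  # number of base pairs affected


def is_structural(variant: Variant, min_length: int = SV_MIN_LENGTH) -> bool:
    """Return True if the variant meets the SV size threshold."""
    return variant.length >= min_length


def count_structural(variants: Iterable[Variant],
                     min_length: int = SV_MIN_LENGTH) -> Counter:
    """Tally variants that qualify as SVs, grouped by type."""
    return Counter(v.kind for v in variants if is_structural(v, min_length))


# Made-up example calls, chosen to exercise the threshold boundary.
example_calls = [
    Variant("INS", 12),    # short InDel, below the threshold
    Variant("INS", 50),    # exactly at the threshold, counted as an SV
    Variant("DEL", 49),    # short InDel, one base below the threshold
    Variant("DEL", 3200),
    Variant("INV", 5200),
    Variant("DUP", 800),
]

sv_counts = count_structural(example_calls)
print(dict(sv_counts))
```

Translocations are left out of the example because they are not naturally described by a single length; a real workflow would handle them separately.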