Abstract: Sophora tonkinensis (shandougen) is a woody leguminous plant
widely known in China for its medicinal value. The genomes of various
legumes, used as reference genetic maps for pseudomolecule assembly,
have been published. However, the genome of Sophora has not yet been mapped.
In this study, we report a chromosome-scale draft genome of S.
tonkinensis assembled using PacBio single-molecule real-time sequencing
reads and the Hi-C technique. A high-quality draft S. tonkinensis genome of
899 Mb in size was obtained, which is larger than the genomes of some other
legumes, and BUSCO analysis revealed 95.9% completeness
of the genome. We annotated 78.3% of the genome as repetitive elements, with
transposable elements accounting for 73%. A total of 36,410 protein-coding
genes were identified in the S. tonkinensis genome. A comparative
analysis of genome size and repetitive sequences in S. tonkinensis and
four other legumes (Lupinus albus, Lupinus angustifolius, Glycyrrhiza
uralensis and Medicago truncatula) revealed that the transposable
elements (TEs) in S. tonkinensis were inserted after the whole-genome
duplication and after its divergence from the other legumes. This suggests
that the size of the S. tonkinensis genome may be related to
repetitive sequence insertion. We also analyzed matrine and
flavonoids, which are important compounds in S. tonkinensis. We further
analyzed lignin and nitrogen-fixing genes, which play important roles
in the adaptation of S. tonkinensis to the environment. In conclusion,
the high-quality genome of S. tonkinensis obtained in this study lays
a foundation for genetic and molecular biology studies of legumes.