Phylogenomic and syntenic data demonstrate complex evolutionary processes in early radiation of the rosids
Luxian Liu1,2†, Mengzhen Chen1†, Ryan A. Folk3†, Meizhen Wang2, Tao Zhao4, Fude Shang1,5, Douglas E. Soltis6,7, and Pan Li2*
1Laboratory of Plant Germplasm and Genetic Engineering, School of Life Sciences, Henan University, Kaifeng, Henan, 475001, China
2Key Laboratory of Biosystems Homeostasis and Protection (Zhejiang University), Ministry of Education, Hangzhou, Zhejiang, 310058, China
3Department of Biological Sciences, Mississippi State University, Starkville, MS, United States
4State Key Laboratory of Crop Stress Biology for Arid Areas/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, Shaanxi, 712100, China
5Henan Engineering Research Center for Osmanthus Germplasm Innovation and Resource Utilization, Henan Agricultural University, Zhengzhou, Henan, 450002, China
6Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611 United States
7Department of Biology, University of Florida, Gainesville, FL, 32611 United States
†These authors contributed equally to this work
*Corresponding author:
Pan Li (Email: panli_zju@126.com, Phone: +8613757152017)
Abstract
Some of the most vexing problems of deep-level relationships in
angiosperms involve superrosids. The superrosid clade contains a quarter
of all angiosperm species, with 18 orders in three subclades (Vitales,
Saxifragales, and core rosids) exhibiting remarkable morphological and
ecological diversity. To help resolve deep-level relationships, we
constructed a high-quality chromosome-level genome assembly for Tiarella polyphylla (Saxifragaceae), thereby providing broader
genomic representation of Saxifragales. Whole-genome microsynteny analysis
of superrosids showed that Saxifragales shared more synteny clusters
with core rosids than Vitales did, further supporting a closer
relationship between Saxifragales and core rosids. To resolve the ordinal
phylogeny of superrosids, we screened 122 single-copy nuclear genes from
the genomes of 36 species representing all 18 superrosid orders. Vitales
were recovered as sister to all other superrosids (Saxifragales + core
rosids). Our data suggest relationships within core rosids that differ
dramatically from those recovered in earlier studies. Fabids should be
restricted to the nitrogen-fixing clade, whereas Picramniales, the
Celastrales-Malpighiales (CM) clade, Huerteales, Oxalidales, Sapindales,
Malvales, and Brassicales formed an “expanded” malvid clade. The
Celastrales-Oxalidales-Malpighiales (COM) clade (sensu APG IV) was not
monophyletic. Crossosomatales, Geraniales, Myrtales, and Zygophyllales
belonged to neither our well-supported malvids nor fabids.
There is strong discordance between nuclear and plastid phylogenetic
hypotheses for superrosid relationships, which can best be explained by
a combination of incomplete lineage sorting and ancient reticulation.
Key words: genome assembly, Tiarella polyphylla, Angiosperm-mega 353, phylogeny, superrosids, ancient reticulation.
Introduction
The core eudicots consist of Gunnerales, Dilleniales, superrosids, and
superasterids, with the latter two containing the vast majority of
flowering plant diversity (Drinnan et al., 1994; Soltis et al., 2018).
Superrosids, comprising core rosids (eurosids), Saxifragales, and
Vitales, contain more than 90,000 species and thus represent more than a
quarter of all angiosperms (Wang et al., 2009; Sun et al., 2020).
Superrosid species exhibit remarkable morphological and ecological
diversity and include herbs, shrubs, trees, vines, aquatics, succulents,
and parasites (Zhao et al., 2016). Many important crops and forest trees
are superrosids (Wang et al., 2009), including members of Rosales (e.g.,
apple, jujube, and mulberry), Vitales (grape), Cucurbitales (watermelon
and cucumber), Fabales (peanut and soybean), Fagales (walnut, waxberry,
and oak), and Brassicales (radish, mustard, and cabbage). Several
superrosid orders, such as Malvales, Myrtales, Cucurbitales, Fabales,
Rosales, and Saxifragales, exhibit exceptionally high diversification
rates among angiosperms (Magallon & Sanderson, 2001; Folk et al., 2019;
Sun et al., 2021). The enormous diversity and the ecological and
economic importance of superrosids highlight the need for greater
resolution of superrosid phylogeny.
The monophyly of superrosids has been recovered repeatedly in previous
studies using organellar genes (Moore et al., 2010; Sun et al., 2015; Li
et al., 2019a), nuclear genes (Zhang et al., 2012; One Thousand Plant
Transcriptomes Initiative, 2019; Sun et al., 2021), and combined
datasets (Wang et al., 2009; Sun et al., 2020). However, relationships
within superrosids have proven more problematic. In APG IV (2016),
Saxifragales were sister to Vitales plus core rosids, a topology found
in multiple phylogenetic studies based mostly on plastid genes (e.g.,
Wang et al., 2009; Soltis et al., 2011; Li et al., 2019a). The core
rosid clade, in turn, consisted of fabid and malvid subclades. The
fabids contained the COM clade (Celastrales, Oxalidales, and
Malpighiales), the nitrogen-fixing clade (Fabales, Rosales,
Cucurbitales, and Fagales), and Zygophyllales, whereas the malvids
included Geraniales, Myrtales, Crossosomatales, Picramniales,
Sapindales, Huerteales, Malvales, and Brassicales.
Although superrosids have long been the focus of phylogenetic research
(Wang et al., 2009; Soltis et al., 2011; Zhang et al., 2012; Li et al.,
2019a; Sun et al., 2020), relationships remain problematic, in part
because of rapid radiation (Wang et al., 2009) and in part because of
substantial incongruence between nuclear and plastid topologies
documented in recent studies (Zhang et al., 2012; Li et al., 2019a; Sun
et al., 2020). Key problems in our understanding of superrosid
relationships remain: (1) Are Saxifragales or Vitales the sister lineage
of core rosids? (2) What are the major subclades within core rosids, and
which orders should be included in fabids vs. malvids? (3) What are the
relationships among COM clade members, and is the COM clade actually
monophyletic? An improved nuclear-based phylogeny of superrosids and
core rosids would provide a better understanding of the evolutionary
history of this enormous clade.
Previous phylogenetic studies of superrosids were primarily based on
plastid and mitochondrial genes or relied on a small number of nuclear
genes (Wang et al., 2009; Moore et al., 2010; Zhang et al., 2012; Sun et
al., 2016; Li et al., 2019a; Sun et al., 2020), with one recent
exception that analyzed numerous nuclear genes derived from
transcriptomes (One Thousand Plant Transcriptomes Initiative, 2019).
Organellar genomes (mitochondrial genomes and plastomes) are generally
inherited uniparentally. The mitochondrial genome is slowly evolving and
sometimes affected by horizontal gene transfer, which introduces biases
and errors in phylogenetic reconstruction (Birky, 2001; Davis et al.,
2014); likewise, the plastome is frequently transferred between lineages
through introgression (Okuyama et al., 2005; Stegemann et al., 2012). In
contrast, nuclear genes are inherited biparentally and show higher
substitution rates than organellar genes, thereby avoiding many of these
issues (Springer et al., 2001; Davis et al., 2014). In particular, low-
or single-copy nuclear genes provide a crucial line of evidence for
resolving angiosperm phylogeny (Zeng et al., 2014; Zhang et al., 2020),
and the importance of using these genes for phylogenetic reconstruction
has long been recognized (Strand et al., 1997; Duarte et al., 2010;
Zhang et al., 2012). Therefore, the use of a sufficient number of
single- or low-copy nuclear genes coupled with broad taxon sampling is a
promising approach to elucidate angiosperm phylogeny (Duarte et al.,
2010; Soltis et al., 2018; One Thousand Plant Transcriptomes Initiative,
2019). In green plants, however, identifying orthologous loci has proven
difficult because of frequent whole-genome duplication events,
especially in angiosperms (Blanc & Wolfe, 2004; Barker et al., 2009).
The growing body of genomic resources in public repositories, together
with many newly developed bioinformatic pipelines for identifying low-
or single-copy genes, has enabled the design of bait kits targeting
orthologous genes across a wide range of flowering plant groups
(Campana, 2018; Vatanparast et al., 2018; McLay et al., 2021).
Universal bait kits, such as the Angiosperms353 set used in this study,
aim to capture the same set of loci from samples spanning considerable
phylogenetic breadth and evolutionary time (Bossert & Danforth, 2018;
Johnson et al., 2019; Breinholt et al., 2021). The Angiosperms353 probe
set has since been widely used to study relationships in diverse groups
(Maurin et al., 2021; Thomas et al., 2021; Zuntini et al., 2021; Acha &
Majure, 2022).
Increasingly large genomic datasets have been applied to resolve rapid
radiations in both green plant (Carlsen et al., 2018; Rouard et al.,
2018) and animal (Malinsky et al., 2018; Jensen et al., 2021) lineages.
Much of this work has used large numbers of coding regions extracted
from genomes; however, chromosome-level genomes offer an additional path
to assessing phylogenetic relationships via microsynteny (conserved
local gene order), which is particularly valuable for resolving
recalcitrant phylogenetic nodes (Zhao et al., 2021). Genome assemblies
have been published for Vitales (Massonnet et al., 2020; Minio et al.,
2022) and for diverse families and orders of the core rosids (Wang et
al., 2021b; Wang et al., 2022a), including Rosales (Jiao et al., 2020;
Cao et al., 2022), but few high-quality genomic resources have been
obtained for Saxifragales, limiting the use of genomic data to resolve
phylogeny and understand genome evolution in the earliest radiation of
the superrosids. Although relatively small, Saxifragales are an ancient
and morphologically diverse group (Jian et al., 2008; Soltis et al.,
2018) whose early and rapid radiation (~89.5 to 110 Ma) has made
resolving phylogenetic relationships challenging (Fishbein et al., 2001;
Jian et al., 2008; Wang et al., 2009; Dong et al., 2018; Folk et al.,
2019). Across the 15 families of Saxifragales, only seven whole-genome
assemblies from four families are available: two Paeoniaceae species
(Paeonia ostii T. Hong and J. X. Zhang, Yuan et al., 2022; Paeonia
suffruticosa Andrews, Lv et al., 2020), Hamamelis virginiana L.
(Hamamelidaceae, Korgaonkar et al., 2021), Cercidiphyllum japonicum
Siebold et Zucc. (Cercidiphyllaceae, Zhu et al., 2020), and three
Crassulaceae species (Kalanchoe fedtschenkoi Raym.-Hamet et H. Perrier,
Yang et al., 2017; Rhodiola crenulata (Hook. f. et Thoms.) H. Ohba, Fu
et al., 2017; Sedum album L., Wai et al., 2019). However, of these
genomes, only those of C. japonicum and P. ostii are assembled at the
chromosome level. To improve genome resources for Saxifragales and
provide the genome-scale data needed for our analyses of relationships,
we produced a chromosome-level genome assembly for Tiarella polyphylla
D. Don (Saxifragaceae) (Fig. 1-A). This widely distributed species (Wu &
Raven, 2003) is a suitable model for future biogeographic studies and
for investigating features of Saxifragaceae, such as its use in
traditional medicine (Lee et al., 2012; Kim et al., 2021).
In this study we (1) used gene sequence data for numerous nuclear loci
representing all orders of superrosids to resolve relationships and
evolutionary history; (2) constructed a high-quality chromosome-level
reference genome assembly for T. polyphylla to help elucidate
evolutionary history; and (3) combined our newly generated genome
assembly with published nuclear genome sequences to conduct
microsynteny analyses of superrosids and further resolve relationships.
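The kind of comparison a microsynteny analysis supports, such as counting the synteny clusters that Saxifragales share with core rosids versus those Vitales share with core rosids, can be illustrated with a minimal Python sketch. It assumes synteny clusters have already been inferred upstream and that each cluster is summarized only by the set of lineages in which it was detected. The function names, lineage labels, and toy clusters are hypothetical and are not data or code from this study. The sketch encodes clusters as a binary presence/absence matrix, a simplified analogue of the synteny-cluster matrices used in microsynteny-based phylogenetics, and counts the clusters shared by each pair of lineages.

```python
from itertools import combinations

# Hypothetical toy input: cluster ID -> lineages in which the cluster was detected.
# These values are invented for illustration and are NOT results of this study.
TOY_CLUSTERS = {
    "c1": {"Saxifragales", "core_rosids"},
    "c2": {"Vitales", "Saxifragales", "core_rosids"},
    "c3": {"Vitales", "core_rosids"},
    "c4": {"Saxifragales", "core_rosids"},
    "c5": {"Vitales"},
}
TAXA = ["Vitales", "Saxifragales", "core_rosids"]


def presence_absence_matrix(clusters, taxa):
    """Encode each taxon as a 0/1 string with one character per cluster.

    Clusters are ordered by sorted ID so every row uses the same column order.
    """
    ids = sorted(clusters)
    return {t: "".join("1" if t in clusters[c] else "0" for c in ids) for t in taxa}


def shared_cluster_counts(clusters, taxa):
    """Count, for each unordered pair of taxa, the clusters present in both."""
    return {
        (a, b): sum(1 for members in clusters.values() if a in members and b in members)
        for a, b in combinations(taxa, 2)
    }


if __name__ == "__main__":
    for taxon, row in presence_absence_matrix(TOY_CLUSTERS, TAXA).items():
        print(f"{taxon:<14}{row}")
    for pair, n in shared_cluster_counts(TOY_CLUSTERS, TAXA).items():
        print(pair, n)
```

Pairwise shared-cluster counts are only a coarse summary; in a fuller analysis the matrix rows would more likely be treated as binary characters for tree inference, a step this sketch does not attempt.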