2.5 Phylogenetic analysis and divergence time estimation
We used OrthoMCL (L. Li, Stoeckert, & Roos, 2003) to identify gene
families (orthologous and paralogous groups) in S. tetraptera and
eleven other species: O. sativa, C. gigantea, V. vinifera,
C. canephora, C. roseus, Gelsemium sempervirens (Franke et
al., 2019), Gardenia jasminoides (Xu et al., 2020),
Eucommia ulmoides (Wuyun et al., 2018), Capsicum annuum,
Aquilegia coerulea (Filiault et al., 2018), and Camellia sinensis
(Xia et al., 2020). A total of 485 single-copy
gene groups were identified and extracted. For each gene, the protein
sequences were aligned with MAFFT v7.467 (Katoh & Standley, 2013), and
the coding sequences (CDS) were then aligned with PAL2NAL v14
(Suyama, Torrents, & Bork, 2006), guided by the corresponding
protein alignments. For all CDS alignments, the conserved sites were
extracted to generate the concatenated sequences for each species.
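
The filtering and concatenation step can be sketched as follows. The text does not name the tool or the criterion used to select conserved sites, so this minimal Python sketch assumes one simple rule: a codon column is kept only if every species has an unambiguous A/C/G/T triplet there. The function names and the in-memory dictionary layout (species name to aligned CDS string) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def conserved_codon_columns(aln: Dict[str, str]) -> Dict[str, str]:
    """Keep codon columns in which every species has an unambiguous triplet.

    Assumption: "conserved" is approximated here as "no gap and no
    ambiguous base in any species"; the original filter may differ.
    """
    length = len(next(iter(aln.values())))
    if length % 3 or any(len(seq) != length for seq in aln.values()):
        raise ValueError("codon alignment must be rectangular and a multiple of 3")
    kept: Dict[str, List[str]] = {sp: [] for sp in aln}
    for i in range(0, length, 3):
        codons = {sp: seq[i:i + 3].upper() for sp, seq in aln.items()}
        if all(set(codon) <= set("ACGT") for codon in codons.values()):
            for sp, codon in codons.items():
                kept[sp].append(codon)
    return {sp: "".join(parts) for sp, parts in kept.items()}


def concatenate(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Filter each gene alignment, then join the genes into one sequence per species."""
    species = sorted(alignments[0])
    supermatrix: Dict[str, List[str]] = {sp: [] for sp in species}
    for aln in alignments:
        if sorted(aln) != species:
            raise ValueError("every single-copy gene must contain the same species")
        filtered = conserved_codon_columns(aln)
        for sp in species:
            supermatrix[sp].append(filtered[sp])
    return {sp: "".join(parts) for sp, parts in supermatrix.items()}
```

Filtering whole codons rather than single nucleotides keeps the reading frame intact, which matters if the concatenated matrix is later partitioned by codon position.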
The phylogenetic tree was then constructed with IQ-TREE v1.6.9 (Nguyen,
Schmidt, Von Haeseler, & Minh, 2015), using the best-fitting
substitution model selected by ModelFinder (Kalyaanamoorthy, Minh, Wong,
Von Haeseler, & Jermiin, 2017) and 1,000 ultrafast bootstrap replicates (-bb 1000 -m MFP).
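
For reproducibility, the tree inference call might be assembled as in the sketch below. Only `-bb 1000` and `-m MFP` come from the text; the alignment file name, the output prefix, and the thread setting are placeholders assumed for illustration, and the executable is assumed to be installed as `iqtree` (the name used by the 1.6.x series).

```python
import shlex
from typing import List


def iqtree_command(alignment: str,
                   prefix: str = "species_tree",
                   threads: str = "AUTO") -> List[str]:
    """Build an IQ-TREE 1.6.x command line for ModelFinder plus ultrafast bootstrap."""
    return [
        "iqtree",
        "-s", alignment,   # concatenated CDS alignment (hypothetical file name)
        "-m", "MFP",       # ModelFinder selection, then tree search with the chosen model
        "-bb", "1000",     # 1,000 ultrafast bootstrap replicates
        "-pre", prefix,    # output file prefix (assumed, not from the text)
        "-nt", threads,    # thread count (assumed, not from the text)
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before running it.
    print(shlex.join(iqtree_command("concatenated_cds.fasta")))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the paths have been adapted.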
MCMCTree in the PAML v4.9 package (Z. Yang, 2007) was used to estimate
divergence times. Two fossil constraints and a soft-bound maximum
were applied at the split nodes of (1) monocots–eudicots (130–190 million
years ago [Ma]) (H. T. Li et al., 2019); (2) asterids–rosids
(116–126 Ma) (H. T. Li et al., 2019); and (3) C. annuum–C.
canephora (85–91 Ma) (Hedges, Marin, Suleski, Paymer, & Kumar, 2015).
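
To show how such ranges are typically passed to MCMCTree, the sketch below converts the three intervals into node labels for the input tree. It rests on two assumptions not stated in the text: that each range is encoded as a pair of soft bounds in MCMCTree's `B(min,max)` notation, and that one time unit equals 100 million years. The dictionary keys are descriptive labels only.

```python
from typing import Dict, Tuple

# Calibration ranges in Ma, taken from the text.
CALIBRATIONS_MA: Dict[str, Tuple[float, float]] = {
    "monocots-eudicots": (130.0, 190.0),
    "asterids-rosids": (116.0, 126.0),
    "C. annuum-C. canephora": (85.0, 91.0),
}


def mcmctree_bound(min_ma: float, max_ma: float, unit_ma: float = 100.0) -> str:
    """Format a min/max age pair as a quoted MCMCTree B(min,max) label.

    Assumption: ages are rescaled so that one time unit is `unit_ma` Ma.
    """
    if min_ma >= max_ma:
        raise ValueError("minimum age must be smaller than maximum age")
    return f"'B({min_ma / unit_ma:.2f},{max_ma / unit_ma:.2f})'"


if __name__ == "__main__":
    for node, (lo, hi) in CALIBRATIONS_MA.items():
        print(node, mcmctree_bound(lo, hi))
```

Each label is placed directly after the closing parenthesis of the corresponding clade in the Newick tree supplied to MCMCTree.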
Finally, we used CAFE (De Bie, Cristianini, Demuth, & Hahn, 2006) to
identify expanded and contracted gene families.
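
Both the single-copy gene selection and the gene family analysis start from the OrthoMCL groups. The sketch below parses group lines of the form `group_id: species|gene species|gene ...`, lists the groups with exactly one gene per species, and writes a tab-separated count table. The `Desc` and `Family ID` header columns follow the input layout of later CAFE releases; the text does not state which CAFE version was used, so that layout is an assumption, as are the function names.

```python
from collections import Counter
from typing import Dict, Iterable, List


def parse_groups(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count genes per species for each OrthoMCL-style group line."""
    families: Dict[str, Counter] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, members = line.split(":", 1)
        # Each member looks like "species|gene"; keep only the species tag.
        families[group_id.strip()] = Counter(
            member.split("|", 1)[0] for member in members.split()
        )
    return families


def single_copy(families: Dict[str, Counter], species: List[str]) -> List[str]:
    """Return groups with exactly one gene in every listed species and no others."""
    return [
        gid for gid, counts in families.items()
        if sum(counts.values()) == len(species)
        and all(counts.get(sp, 0) == 1 for sp in species)
    ]


def cafe_table(families: Dict[str, Counter], species: List[str]) -> str:
    """Render family sizes as a tab-separated table (assumed CAFE input layout)."""
    rows = ["\t".join(["Desc", "Family ID", *species])]
    for gid, counts in families.items():
        sizes = [str(counts.get(sp, 0)) for sp in species]
        rows.append("\t".join(["(null)", gid, *sizes]))
    return "\n".join(rows) + "\n"
```

With twelve species, `single_copy` returns the groups that would feed the alignment step, while `cafe_table` summarizes all families for the expansion and contraction analysis.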