2.5 Phylogenetic analysis and divergence time estimation

We used OrthoMCL (L. Li, Stoeckert, & Roos, 2003) to identify gene families (orthologous and paralogous groups) in S. tetraptera and eleven other species: O. sativa, C. gigantea, V. vinifera, C. canephora, C. roseus, Gelsemium sempervirens (Franke et al., 2019), Gardenia jasminoides (Xu et al., 2020), Eucommia ulmoides (Wuyun et al., 2018), Capsicum annuum, Aquilegia coerulea (Filiault et al., 2018), and Camellia sinensis (Xia et al., 2020). A total of 485 single-copy gene groups were identified and extracted. For each group, the protein sequences were aligned with MAFFT v7.467 (Katoh & Standley, 2013), and the coding sequences (CDS) were then aligned with PAL2NAL v14 (Suyama, Torrents, & Bork, 2006), guided by the corresponding protein alignments. Conserved sites were extracted from all CDS alignments and concatenated into a single sequence per species. Finally, the phylogenetic tree was constructed with IQ-TREE v1.6.9 (Nguyen, Schmidt, Von Haeseler, & Minh, 2015), using the best-fit substitution model selected by ModelFinder (Kalyaanamoorthy, Minh, Wong, Von Haeseler, & Jermiin, 2017) and 1,000 ultrafast bootstrap replicates (-bb 1000 -m MFP).
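The selection of single-copy groups and the concatenation step can be illustrated with a minimal Python sketch. It is not the script used in this study. It assumes group lines of the form "group_id: species|gene species|gene ...", and the species codes, gene identifiers, and function names are hypothetical.

```python
from collections import Counter


def single_copy_groups(group_lines, species):
    """Return {group_id: {species: gene}} for groups in which every
    species in `species` is represented by exactly one gene.

    Assumed line format (hypothetical example data):
        OG1: osa|g1 vvi|g7 ste|g3
    """
    wanted = set(species)
    result = {}
    for line in group_lines:
        line = line.strip()
        if not line:
            continue
        group_id, members = line.split(":", 1)
        # Each member is "species|gene"; split once on the separator.
        genes = [m.split("|", 1) for m in members.split()]
        counts = Counter(sp for sp, _ in genes)
        # Keep the group only if all species are present, each once.
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            result[group_id.strip()] = {sp: gene for sp, gene in genes}
    return result


def concatenate(alignments, species):
    """Join per-gene aligned sequences into one sequence per species.

    `alignments` is a list of {species: aligned_sequence} dicts, one per
    gene, in which all sequences of a gene have the same length.
    """
    return {sp: "".join(aln[sp] for aln in alignments) for sp in species}


# Hypothetical toy input: only OG1 is single-copy in all three species.
example = [
    "OG1: osa|g1 vvi|g7 ste|g3",
    "OG2: osa|g2 osa|g9 vvi|g8 ste|g4",
    "OG3: osa|g5 vvi|g6",
]
groups = single_copy_groups(example, ["osa", "vvi", "ste"])
supermatrix = concatenate(
    [{"osa": "ATG---", "vvi": "ATGAAA"}, {"osa": "CCC", "vvi": "CCT"}],
    ["osa", "vvi"],
)
```

In this toy input, OG2 is rejected because one species contributes two genes, and OG3 because one species is missing.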
MCMCTREE in the PAML v4.9 package (Z. Yang, 2007) was used to estimate divergence times. Two fossil constraints and a soft-bound maximum were applied at the split nodes of (1) monocots–eudicots (130–190 million years ago [Ma]) (H. T. Li et al., 2019); (2) asterids–rosids (116–126 Ma) (H. T. Li et al., 2019); and (3) C. annuum–C. canephora (85–91 Ma) (Hedges, Marin, Suleski, Paymer, & Kumar, 2015). Finally, we used CAFE (De Bie, Cristianini, Demuth, & Hahn, 2006) to identify expanded and contracted gene families.
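As an illustration of how such age intervals might be written as node labels in an MCMCTREE tree file, the following Python sketch converts each interval to the B(min,max) notation. Two points are assumptions rather than details from this study: a time unit of 100 million years, and the encoding of all three calibrations as joint minimum and maximum bounds. The helper name and dictionary keys are hypothetical.

```python
def bound_label(min_ma, max_ma, unit=100.0):
    """Format an age interval (in Ma) as an MCMCTREE-style bound label.

    `unit` is the number of million years per time unit; 100 is an
    assumption here, not a value reported in the text.
    """
    if min_ma >= max_ma:
        raise ValueError("minimum age must be younger than maximum age")
    return "'B({:g},{:g})'".format(min_ma / unit, max_ma / unit)


# Intervals (Ma) as listed in the text; keys are hypothetical labels.
calibrations = {
    "monocots-eudicots": (130, 190),
    "asterids-rosids": (116, 126),
    "C. annuum-C. canephora": (85, 91),
}

labels = {node: bound_label(lo, hi) for node, (lo, hi) in calibrations.items()}
for node, label in labels.items():
    print(node, label)
```

Each label would be attached to the corresponding internal node of the Newick tree supplied to MCMCTREE. A one-sided bound would need a different notation.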