2.5 Molecular basis of the transition to homostyly
Data generation - To determine whether homostylous phenotypes are associated with mutations in CYPᵀ, we Sanger-sequenced all five exons of CYPᵀ from 38 individuals (27 HO-individuals from 10 populations and 11 SS-individuals from five populations, corresponding to 2–3 individuals per population). Additionally, we obtained the CYPᵀ sequences of six SS-individuals from two populations (three individuals per population) from whole genome sequencing data (EMC, unpublished data). Finally, we complemented our data set with previously published sequences from two HO-individuals (Li et al., 2016). In total, we thus analyzed CYPᵀ sequences from 46 individuals (29 HO- and 17 SS-morphs).
For Sanger sequencing, we used previously designed primers for all five exons of CYPᵀ (Huu et al., 2016). Reverse primers for exons 1, 2 and 3 were newly designed to obtain longer sequences (Supplementary Material Table S1). Detailed PCR conditions for the amplification of all CYPᵀ exons are included in the Supplementary Material. Sanger sequencing was performed on an ABI Prism 3130 genetic analyzer (Applied Biosystems). Forward and reverse sequences were visually inspected and aligned with MUSCLE, as implemented in MEGA X (Kumar, Stecher, Li, Knyaz, & Tamura, 2018). All exon sequences were concatenated with Mesquite v3.61 (Maddison & Maddison, 2019). Library preparation and high-throughput sequencing for the whole genome sequence data were performed by RAPiD GENOMICS (Gainesville, Florida, USA) using paired-end sequencing (~150 bp reads) on a NovaSeq 6000 (Illumina). We used HybPiper v1.3.1 (Johnson et al., 2016) with default parameters, except that the coverage cutoff for assemblies was set to 4, to target and extract the CYPᵀ sequence from the whole genome sequencing data. To identify synonymous and nonsynonymous mutations, we used the open reading frame (ORF) of the P. vulgaris CYPᵀ sequence deposited in GenBank (KT257665.1; Li et al., 2016).
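As an illustration of the ORF-based classification of synonymous and nonsynonymous mutations mentioned above, the following minimal Python sketch translates the affected codon before and after a single nucleotide substitution using the standard genetic code. It is not the procedure used in this study. The function name `classify_substitution`, the 0-based position convention and the toy ORF are assumptions made for the example only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(orf, pos, alt):
    """Classify a single-base substitution in an in-frame ORF.

    orf: coding sequence starting at the first codon position (hypothetical input)
    pos: 0-based position of the substituted base within the ORF
    alt: the alternative base observed at that position
    Returns (reference amino acid, alternative amino acid, class).
    """
    codon_start = pos - pos % 3          # first base of the affected codon
    offset = pos % 3                     # position of the change within the codon
    ref_codon = orf[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return ref_aa, alt_aa, kind


# Toy ORF (not CYPT): ATG GCT TAA
toy_orf = "ATGGCTTAA"
third_position_change = classify_substitution(toy_orf, 5, "C")   # GCT -> GCC
second_position_change = classify_substitution(toy_orf, 4, "A")  # GCT -> GAT
```

In this toy case the third-position change leaves alanine unchanged, whereas the second-position change replaces alanine with aspartate. Indels and substitutions creating stop codons would need separate handling, which this sketch does not attempt.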
Analyses - To determine the relationships among CYPᵀ sequences from HO- and SS-individuals, we first estimated a haplotype network with the R package ‘pegas’ v0.12 (Paradis, 2010). In addition to single nucleotide substitutions, we also scored each insertion or deletion (i.e., indel) as a character following the “simple indel coding” guidelines of Simmons and Ochoterena (2000). Briefly, each indel of any length was scored as a presence/absence character in every individual. The resulting presence/absence matrix was coded as pseudo-nucleotides using ‘A’ for absence, ‘T’ for presence and ‘-’ for unknown, and was concatenated to the sequence alignment of CYPᵀ. Second, we estimated the CYPᵀ phylogeny using a partitioned Maximum Likelihood (ML) analysis in RAxML v8.01 (Kozlov, Darriba, Flouri, Morel, & Stamatakis, 2019) with a GTR-GAMMA substitution model for nucleotide substitutions and a binary (BIN) model for indels. Branch support was estimated with 1000 standard bootstrap replicates. A publicly available CYPᵀ sequence from P. veris (KX589238; Huu et al., 2016) was used as an outgroup to root the CYPᵀ tree.
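The indel scoring described above can be sketched in Python as follows. This is a minimal illustration and not the script used in this study. It makes three assumptions: an indel is identified by the start and end coordinates of a gap run in the alignment, ‘T’ marks sequences that carry exactly that gap, and ‘-’ (unknown) is assigned when a sequence carries a longer gap that spans the indel in question. The function names and the toy alignment are hypothetical.

```python
import re


def gap_runs(seq):
    """Return the (start, end) span of every maximal run of '-' in an aligned sequence."""
    return [m.span() for m in re.finditer(r"-+", seq)]


def code_indels(alignment):
    """Score each distinct gap run as a presence/absence pseudo-nucleotide.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths)
    Returns the sorted list of indel spans and a dict of sequences with the
    coded indel characters appended to the original alignment.
    """
    runs = {name: gap_runs(seq) for name, seq in alignment.items()}
    # Every distinct (start, end) pair is treated as one indel character.
    indels = sorted({span for spans in runs.values() for span in spans})
    coded = {}
    for name, spans in runs.items():
        states = []
        for start, end in indels:
            if (start, end) in spans:
                states.append("T")   # this exact gap is present
            elif any(s <= start and end <= e for s, e in spans):
                states.append("-")   # spanned by a longer gap: unknown
            else:
                states.append("A")   # gap absent
        coded[name] = alignment[name] + "".join(states)
    return indels, coded


# Toy alignment (not CYPT data)
toy_alignment = {
    "ind1": "ACGTACGT",
    "ind2": "AC--ACGT",
    "ind3": "A----CGT",
}
toy_indels, toy_coded = code_indels(toy_alignment)
```

Here two indel characters are found, and each sequence gains two pseudo-nucleotide columns at its end, which can then be assigned to a separate binary partition. Terminal gaps that reflect missing data rather than true indels are not distinguished in this sketch and would need to be masked beforehand.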