2.5 Molecular basis of the transition to homostyly
Data generation - To determine if homostylous phenotypes are
associated with mutations in CYPᵀ, we Sanger-sequenced all five
exons of CYPᵀ from 38 individuals (27 HO-individuals from 10
populations and 11 SS-individuals from five populations, corresponding
to 2–3 individuals per population). Additionally, we obtained the
CYPᵀ sequences from whole genome sequencing data of six
SS-individuals from two populations (three individuals per population;
EMC, unpublished data). Finally, we complemented our data set with
previously published sequences from two HO-individuals (Li et al., 2016).
In total, we thus analyzed CYPᵀ sequences from 46 individuals (29 HO- and
17 SS-morphs).
For Sanger sequencing, we used previously designed primers for all
five exons of CYPᵀ (Huu et al., 2016). Reverse primers for exons
1, 2 and 3 were newly designed to obtain longer sequences (Supplementary
Material Table S1). Detailed PCR conditions for the amplification of all
CYPᵀ exons are included in the Supplementary Material. Sanger
sequencing was performed on an ABI Prism 3130 genetic analyzer (Applied
Biosystems). Forward and reverse sequences were visually inspected and
aligned with MUSCLE, as implemented in MEGA X (Kumar, Stecher, Li, Knyaz,
& Tamura, 2018). All exon sequences were
concatenated with Mesquite v3.61 (Maddison & Maddison, 2019). Library
preparation and high-throughput sequencing for the whole genome sequence
data were performed by RAPiD GENOMICS (Gainesville, Florida, USA) using
paired-end sequencing (~150 bp reads) on a NovaSeq 6000 (Illumina). We
used HybPiper v1.3.1 (Johnson et al., 2016) with
default parameters (except for the coverage-cutoff level for assemblies
set to 4) to target and extract the sequence of CYPᵀ from whole
genome sequencing data. To identify synonymous and nonsynonymous
mutations, we used the open reading frame (ORF) of the P. vulgaris
CYPᵀ sequence deposited in GenBank (KT257665.1; Li et al., 2016).
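As an illustration of how substitutions can be classified against a reference ORF, the following minimal sketch compares two aligned, gap-free coding sequences codon by codon using the standard genetic code. The function name `classify_substitutions` and the toy sequences are hypothetical and are not the CYPᵀ data or the procedure used in this study; indels and frameshifts are deliberately not handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_substitutions(ref_orf, sample_orf):
    """Classify codon differences between two aligned, gap-free ORFs.

    Returns a list of tuples:
    (codon position, reference codon, sample codon,
     reference amino acid, sample amino acid, class).
    """
    if len(ref_orf) != len(sample_orf) or len(ref_orf) % 3 != 0:
        raise ValueError("ORFs must have equal length, a multiple of three")
    results = []
    for i in range(0, len(ref_orf), 3):
        ref_codon = ref_orf[i:i + 3].upper()
        alt_codon = sample_orf[i:i + 3].upper()
        if ref_codon == alt_codon:
            continue
        ref_aa = CODON_TABLE[ref_codon]
        alt_aa = CODON_TABLE[alt_codon]
        kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
        results.append((i // 3 + 1, ref_codon, alt_codon, ref_aa, alt_aa, kind))
    return results


# Toy example (hypothetical sequences, not CYPᵀ):
# codon 2 changes GCT to GCC (both alanine), codon 3 changes TGG to a stop codon.
calls = classify_substitutions("ATGGCTTGG", "ATGGCCTGA")
for call in calls:
    print(call)
```

A substitution that introduces a stop codon (`*`) is reported here as nonsynonymous; a fuller analysis would flag premature stops and frameshifting indels separately.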
Analyses - To determine the relationships among CYPᵀ sequences from HO- and
SS-individuals, we first estimated a haplotype network with the R package
‘pegas’ v0.12 (Paradis, 2010). In addition to single nucleotide substitutions, we
also scored each insertion or deletion (i.e., indel) as a
character following the “simple indel coding” guidelines by Simmons
and Ochoterena (2000). Briefly, each indel of any length was scored as
a presence/absence character in every individual. The resulting
presence/absence matrix was coded as pseudo-nucleotides using ‘A’ as
absence, ‘T’ as presence and ‘-’ as unknown, and was concatenated to the
sequence alignment of CYPᵀ. Secondly, we estimated the CYPᵀ phylogeny
using a partitioned Maximum Likelihood (ML)
analysis in RAxML v8.01 (Kozlov, Darriba, Flouri, Morel, & Stamatakis,
2019) with a GTR-GAMMA substitution model for nucleotide substitutions
and a binary (BIN) model for indels. Branch support was estimated with
1000 standard bootstrap re-samplings. A publicly available CYPᵀ sequence
from P. veris (KX589238; Huu et al., 2016) was used as outgroup to root the
CYPᵀ tree.
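The indel scoring described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the script used in this study: each distinct gap (defined by its start and end columns in the alignment) becomes one character, coded ‘T’ where an individual carries exactly that gap, ‘A’ where it does not, and ‘-’ where the state cannot be scored (missing data, assumed here to be written as ‘?’, or a longer gap spanning the region). The function names and the toy alignment are hypothetical.

```python
import re


def gap_runs(seq):
    """Return the set of (start, end) column ranges of gap runs in a sequence."""
    return {(m.start(), m.end()) for m in re.finditer(r"-+", seq)}


def code_indels(alignment):
    """Score indels as presence/absence pseudo-nucleotides.

    alignment: dict mapping individual name -> aligned sequence.
    Returns (characters, coded), where characters is the sorted list of
    distinct gaps and coded maps each name to its string of states.
    """
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    runs = {name: gap_runs(seq) for name, seq in alignment.items()}
    characters = sorted(set().union(*runs.values()))
    coded = {}
    for name, seq in alignment.items():
        states = []
        for start, end in characters:
            if (start, end) in runs[name]:
                states.append("T")  # this exact indel is present
            elif "?" in seq[start:end] or any(
                s <= start and end <= e for s, e in runs[name]
            ):
                states.append("-")  # missing data or spanned by a longer gap
            else:
                states.append("A")  # indel absent
        coded[name] = "".join(states)
    return characters, coded


# Toy alignment (hypothetical sequences, not CYPᵀ).
toy = {
    "ind1": "ATGCCCTGA",
    "ind2": "ATG---TGA",
    "ind3": "AT-----GA",
    "ind4": "ATG???TGA",
}
characters, coded = code_indels(toy)
# Append the indel characters to each sequence, as in the concatenated matrix.
combined = {name: toy[name] + coded[name] for name in toy}
```

In a partitioned analysis, the appended columns would form a separate partition from the nucleotide alignment so that a different model can be applied to them.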