2.1 Study species
The ancestrally heterostylous
Primula vulgaris is a perennial, rosette-forming diploid (2n = 22) blooming in early spring (February–April) with pale-yellow fused corollas reaching up to 40 mm in width formed by broad, overlapping, v-notched lobes (Richards, 2003).
Primula vulgaris has dimorphic pollen grains and stigma papillae (Mast, Kelso, & Conti, 2006). Pollen grains are small in L-individuals but large in S-individuals, whereas stigma papillae are long in L-individuals but short in S-individuals (Richards, 2003). The species reproduces mainly sexually; clonal reproduction, uncommon in the wild, has been reported under cultivation (Boyd, Silvertown, & Tucker, 1990). Pollination in natural populations of
P. vulgaris is mostly performed by long-tongued bees, bumblebees, and bee-flies, but butterflies, pollen-gathering bees and moths have been occasionally reported (Boyd et al., 1990; Keller et al., 2020; Woodell, 1960). Seeds, produced in capsules, are dispersed by ants and rodents (Valverde & Silvertown, 1995).
Primula vulgaris has a patchy distribution with populations ranging from dozens to hundreds of individuals mostly found in woodlands and hedgerows and less frequently in grasslands (Jacquemyn, Endels, Brys, Hermy, & Woodell, 2009). Previous studies indicated that long-homostyles of
P. vulgaris are self-compatible and produce seeds in the absence of pollinators (Piper et al., 1986). Analyses of the mating system of homostyles estimated a range of outcrossing rates from 0.05 to 0.8 (Bodmer, 1958; Crosby, 1959; Piper, Charlesworth, & Charlesworth, 1984). It has been suggested that reduced pollinator service due to intensive pastoral and agricultural activities could be the selective pressure favoring the increase of self-fertilizing homostyles in Somerset, England (Piper et al., 1986). Recently, long-homostyles have been reported in one population of an agriculturally fragmented area in the Netherlands (Barmentlo, Meirmans, Luijten, Triest, & Oostermeijer, 2017).
2.2 Frequency variation of floral morphs in natural populations of Primula vulgaris
To test whether homostyles increase concomitantly with the decrease of S-individuals, while L-individuals are maintained at low frequencies, we quantified floral morph composition in 22 natural populations of the Somerset region (England) in the spring of 2019 (Figure 2A). The approximate location of these populations was based on Curtis and Curtis (1985). The frequency of floral morphs was estimated by randomly scoring 100 individuals whenever possible (Table 1). Flowering plants were visually assigned to S-, L- and H-morphs (Figure 1). To determine whether S- and L-morphs segregated at equal frequencies in both dimorphic heterostylous (i.e., populations consisting of S- and L-individuals) and trimorphic populations (i.e., populations consisting of S-, L- and H-individuals), we tested whether morph ratios deviated significantly from isoplethy (i.e., equal frequencies of S- and L-morphs) using G-tests of goodness-of-fit as implemented in the R package ‘DescTools’ v0.99.38 (Signorell et al., 2020). We estimated a ternary plot using the R package ‘ggtern’ v3.1.0 (Hamilton & Ferry, 2018) to assess whether the increase in the frequency of H-morphs occurred at the cost of only one or both S- and L-morphs.
2.3 Sampling for genetic analyses
In the spring of 2019, we randomly collected leaf tissue from 11 to 30 individuals in each of the 22 natural populations described above for a total of 613 individuals. Leaf tissue was dried in silica gel. We sampled the same number of each floral morph where possible; sampled individuals were at least 1–2 meters apart from each other. We additionally collected leaf tissue and 2–3 capsules with seeds from nine individuals (four pins and five thrums) of one dimorphic heterostylous population (D07) and seven individuals from the fully homostylous population (M01). Collected seeds from each individual were sowed in a single pot; seedlings from a single individual formed a family. All pots were placed in water containing gibberellic acid (70 ppm) for 24 hours and later placed in growth chambers at 18 °C with a 12:12 h light-dark period. The proportion of germinated seeds per family was recorded after eight weeks, and leaf tissue from a total of ten seedlings per family was collected whenever possible and dried in silica gel. DNA of individuals from natural populations and progeny arrays was extracted with a modified CTAB protocol (Doyle & Doyle, 1987) and used for subsequent genetic analyses.
2.4 Molecular basis of the transition to homostyly
To determine whether the transition to homostyly is associated with a single or multiple molecular changes in CYPᵀ and which ones, we sequenced CYPᵀ from multiple S-individuals and homostyles of P. vulgaris from different populations.
Data generation - We Sanger-sequenced all five exons of
CYPᵀ from 38 individuals (27 H-individuals from 10 populations and 11 S-individuals from five populations, corresponding to 2–3 individuals per population). Additionally, we obtained
CYPᵀ sequences
from whole genome resequencing data of six S-individuals from two populations (three individuals per population; EMC, unpublished data). Finally, we complemented our data set with previously published sequences from two H-individuals (
CYPᵀ SLH1 and
SLH2, respectively, of Li et al.,
2016). In total, we thus analyzed
CYPᵀ sequences from 46 individuals (29 H- and 17 S-morphs).
For Sanger sequencing
, we used previously designed primers based on all five
CYPᵀ exons (Huu et al., 2016). Reverse primers for exons 1, 2, and 3 were newly designed to obtain longer sequences (Supplementary material Table S1). Detailed PCR conditions for the amplification of all
CYPᵀ exons are included in Supplementary Material. Sanger sequencing was performed in an ABI Prism 3130 genetic analyzer (Applied Biosystems). Forward and reverse sequences were visually inspected and aligned with MUSCLE, as implemented in MEGA X (Kumar, Stecher, Li, Knyaz, & Tamura, 2018). All exon sequences were concatenated with Mesquite v3.61 (Maddison & Maddison, 2019). Library preparation and high throughput sequencing for the whole genome resequencing data were performed by RAPiD GENOMICS (Gainesville, Florida, USA) using paired-end sequencing (~150bp sequence reads) in NovaSeq 6000 (Illumina). We used HybPiper v1.3.1 (Johnson et al., 2016) with default parameters (except for the coverage-cutoff level for assemblies set to 4) to target and extract the sequence of
CYPᵀ from whole genome resequencing data. To identify synonymous and nonsynonymous mutations, we used the Open Reading Frame (ORF) from
P. vulgaris CYPᵀ sequence deposited in GenBank (KT257665.1; Li et al., 2016).
Analyses – To determine the extent of
CYPᵀ variation in S-individuals and homostyles, we estimated nucleotide diversity at synonymous (πS) and nonsynonymous sites (πNS) using DNAsp v6.12.03 (Rozas et al., 2017). To determine the relationships among
CYPᵀ sequences from H- and S-individuals, we first estimated a haplotype network with the R package ‘pegas’ v0.12 (Paradis, 2010). Such analysis is appropriate because
CYPᵀ is located in the hemizygous, non-recombining S-locus. In addition to single nucleotide substitutions, we scored each insertion or deletion (i.e., indel) as a character following the “simple indel contig” guidelines by Simmons and Ochoterena (2000). Briefly, each indel of any length was scored as a presence/absence character in every individual. The resulting presence/absence matrix was coded
as pseudo-nucleotides using ‘A’ as absence, ‘T’ as presence and ‘-’ as unknown, and was concatenated to the sequence alignment of
CYPᵀ. Secondly, we estimated the
CYPᵀ phylogeny using a partitioned Maximum Likelihood (ML) analysis in RAxML v8.01 (Kozlov, Darriba, Flouri, Morel, & Stamatakis, 2019) with a GTR-GAMMA substitution model for nucleotide substitutions and a binary (BIN) model for indels. Branch support was estimated with 1000 standard bootstrap re-samplings. A publicly available
CYPᵀ sequence from
P. veris (KX589238; Huu et al., 2016) was used as an outgroup to root the
CYPᵀ tree.
2.5 Evolutionary consequences of transitions to homostyly
We tested whether outcrossing rates from progeny arrays are higher in heterostyles than homostyles with microsatellites. We estimated the strength of inbreeding depression at the seed-germination stage. We tested whether increasing frequencies of homostyles are associated with higher population-level selfing rates, inbreeding coefficients within populations, and genetic differentiation among populations.
2.5.1 Outcrossing rates from progeny arrays
Outcrossing rates were estimated from nine heterostylous families (stemming from the dimorphic heterostylous population D07) with a mean number of 10 seedlings per family and seven homostylous families (stemming from the fully homostylous population M01) with 7.86 ± 1.06 (mean ± SE) seedlings per family. Both maternal and seedling plants were genotyped with the same microsatellites as in section 2.5.2. Multi-locus (
tm) and single-locus (
ts) outcrossing rates and their associated standard errors were estimated with MLTR v.3.4 (Ritland, 2002), using 10000 bootstrap resamplings of maternal families. We used a one-sided Wilcoxon signed-rank test to assess whether homostylous individuals have a lower outcrossing rate than distylous individuals. Moreover, the germination rate of heterostylous and homostylous families was used as a fitness proxy to estimate the strength of inbreeding depression (δ) as 1 – (
WHO /
WHE) (Goodwillie, Kalisz, & Eckert, 2005) where
WHO and
WHE are the mean germination rates of homostyles and heterostyles, respectively.
2.5.2 Population-level estimates of selfing rates
Data generation - We genotyped all 613 individuals collected from the 22 natural populations using 12 microsatellites previously developed for
P. vulgaris (Triest, Sierens, & Van Rossum, 2015). Briefly, the loci were amplified in two multiplex reactions with a QIAGEN Multiplex PCR kit. Detailed conditions for the amplification of microsatellites are included in Supplementary Material. Fragment size estimations were performed using an ABI Prism 3130xl genetic analyzer (Applied Biosystems) with GeneScan 500 LIZ (Applied Biosystems) as internal standard. Alleles were scored using GeneMapper v4.1 (Applied Biosystems). The presence of null alleles was estimated with Micro-checker v2.2.3 (Van Oosterhout, Hutchinson, Wills, & Shipley, 2004).
Analyses – Population-level selfing rates were inferred using model-based Bayesian estimation (BES v0.1.3; Redelings et al., 2015) from microsatellite data of 613 individuals from 22 natural populations (see section 2.3). The main advantage of this method compared to estimates generated by other programs such as RMES (David, Pujol, Viard, & Goudet, 2007) is that BES makes use of all loci regardless of whether they are homozygous or heterozygous. Selfing rates and 95% CIs were inferred using 10000 bootstrap iterations. To test whether higher frequencies of homostyles in natural populations are associated with higher selfing rates, we used generalized additive models where the frequency of homostyles was used as an independent variable and by specifying a beta distribution for the estimates of selfing rates as implemented in the R package ‘gamlss’
v5.1.6 (Rigby, Stasinopoulos, & Lane, 2005).
2.5.3 Population genetic analyses
Allelic richness (
Ar), observed (
Ho) and expected heterozygosity (
He), and inbreeding coefficient (
Fis) were calculated with the R package ‘diveRsity’ v9.90 (Keenan, Mcginnity, Cross, Crozier, & Prodöhl, 2013) from microsatellite data of 613 individuals from 22 populations (see section 2.3). Each locus and population were tested for Hardy-Weinberg equilibrium using the R package ‘genepop’ v1.1.7 (Rousset, 2008). To assess whether an increase in the frequency of homostyly is associated with an increase in
Fis, we used a standard linear model with homostyle frequency as the independent variable. Given that population size can influence
Fis, we included census size as covariable with no interaction among independent variables.
Pairwise genetic differentiation measured as
Fst (Weir & Cockerham, 1984) and its normalized counterpart
G’st (Meirmans & Hedrick, 2011) were estimated with the R package ‘diveRsity’ v9.90 (Keenan et al., 2013). To discover whether higher homostyle frequencies increase genetic differentiation, we tested whether pairwise differences in homostyle frequencies between populations are positively correlated with pairwise
G’st values. To test the significance of association between these two matrices, we used a Mantel test as implemented in the R package ‘adegenet’ v2.1.3 (Jombart, 2008).
2.6 Genetic structure and gene flow
To estimate genetic structure among
P. vulgaris populations, we used InStruct v3.2.09 (Gao, Williamson, & Bustamante, 2007), which extends the Bayesian model implemented in Structure (Pritchard, Stephens, & Donnelly, 2000) to situations of partial self-fertilization and recurrent inbreeding. We ran the analyses with 100000 burn-in iterations and 20 independent MCMC chains per run from 2 to 22
k, as suggested by Gilbert et al. (2012). The most likely number of clusters (
k) was detected with the
Δk method (Evanno, Regnaut, & Goudet, 2005) implemented in CLUMPAK (Kopelman, Mayzel, Jakobsson, Rosenberg, & Mayrose, 2015). Additionally, a Discriminant Analysis of Principal Components (DAPC) was performed with the R package ‘adegenet’ v2.1.1 (Jombart, 2008). To determine whether there is Isolation By Distance (
IBD), we performed a Mantel test implemented in ‘adegenet’
using both
Fst and
G’st values of genetic differentiation.
Finally, to determine whether patterns of genetic structure are explained by gene flow, we estimated the size-corrected effective number of migrants per generation (
Nm) using the frequency of private alleles with the R package ‘genepop’ v1.1.7 (Rousset, 2008). Moreover, we used Approximate Bayesian Computations (ABC), as implemented in popABC v1.0 (Lopes, Balding, & Beaumont, 2009), to compare models of genetic divergence with and without gene flow using the coalescent framework developed by Nielsen and Wakely (2001) and Hey and Nielsen (2004). Specifically, we compared a scenario of isolation without gene flow (i.e., proportion of migrants per generation,
m = 0) with seven scenarios of increasing migration rates (
m = 0.001, 0.01, 0.1, 0.2, 0.3, 0.4 and 0.5). Priors used for this analysis are specified in Supplementary Material (Table S2). Each model was simulated with 100000 iterations and tolerance for the rejection step was set to 0.01. Model selection was based on the posterior probability estimated by categorical regression (Beaumont, Zhang, & Balding, 2002) using custom R scripts included in the popABC documentation. A major assumption of the models implemented in popABC is that there is no genetic structure within populations. Given that H-individuals may cause within-population structure due to selfing, we decided to use only the ten dimorphic heterostylous populations for popABC analyses because no genetic substructure is expected in populations consisting entirely of outcrossing individuals. No differences in pollen and seed dispersal are known for H-, S- and L-individuals, thus excluding populations with H-individuals from popABC analyses should not affect estimates of migration between populations.