Comparative analyses
To calculate the phylogenetic signal of the Crabtree effect, a categorical trait (presence/absence), we calculated the minimum number of character-state transitions across the nodes of the phylogeny that accounts for the observed distribution of the character at the tips (Maddison & Maddison, 2000, Paleo-Lopez et al., 2016). This number was then compared with the median of a null distribution obtained by randomizing the character assignment (1,000 randomizations were used). This analysis tests whether the phylogenetic signal of a categorical trait departs from zero: a significant phylogenetic signal is inferred when the observed number of transitions falls within the lower 5% tail of the randomized distribution. A significant result implies that the innovation (i.e., Crabtree-positive yeasts) appeared at some point in a given lineage and was inherited by the derived lineages. If the result is not significant, it is concluded that Crabtree-positive species arose randomly across the phylogeny. We also computed phylogenetic signal for continuous traits using Blomberg's K statistic. This index varies from zero to infinity, with K=1 being the expectation under a Brownian Motion model of evolution (Blomberg et al., 2003).
To identify adaptive shifts in fermentative traits, we applied an algorithm based on the Ornstein-Uhlenbeck (OU) process. This approach was originally proposed by Hansen (1996), who modeled the OU process as a statistical formalization of the “common descent” assumption of evolution and its deviations (see Fig 1 in Hansen & Martins, 1996). Here we explain the OU model, briefly.
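As an aside on the randomization test for the categorical trait described above, the logic can be sketched in a few lines of Python. This is only a minimal illustration, not the software used in the study: the eight-tip tree, the species labels, the trait assignment and the function names are all hypothetical, and the minimum number of transitions is counted with Fitch parsimony on a fully bifurcating tree, which is an assumption about how the count is obtained.

```python
import random

# Hypothetical toy phylogeny as nested tuples; leaves are species labels.
TREE = ((("sp1", "sp2"), ("sp3", "sp4")), (("sp5", "sp6"), ("sp7", "sp8")))

# Hypothetical presence (1) / absence (0) of the trait at the tips.
STATES = {"sp1": 1, "sp2": 1, "sp3": 1, "sp4": 1,
          "sp5": 0, "sp6": 0, "sp7": 0, "sp8": 0}


def fitch_steps(node, states):
    """Return (candidate state set, minimum number of transitions) below node."""
    if isinstance(node, str):
        return {states[node]}, 0
    left_set, left_steps = fitch_steps(node[0], states)
    right_set, right_steps = fitch_steps(node[1], states)
    common = left_set & right_set
    if common:
        # The two descendants can share a state: no extra transition needed.
        return common, left_steps + right_steps
    # Disjoint state sets: one transition must have occurred here.
    return left_set | right_set, left_steps + right_steps + 1


def leaves(node):
    """List the tip labels of the tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def signal_test(tree, states, n_rand=1000, seed=0):
    """Compare observed transitions with a null from shuffled tip states."""
    rng = random.Random(seed)
    tips = leaves(tree)
    observed = fitch_steps(tree, states)[1]
    values = [states[tip] for tip in tips]
    null = []
    for _ in range(n_rand):
        shuffled = values[:]
        rng.shuffle(shuffled)
        null.append(fitch_steps(tree, dict(zip(tips, shuffled)))[1])
    # Lower-tail p-value: share of randomizations needing as few or fewer
    # transitions than the observed assignment.
    p_value = sum(steps <= observed for steps in null) / n_rand
    return observed, null, p_value


observed, null, p_value = signal_test(TREE, STATES)
```

In this toy case the trait is confined to one clade, so a single transition explains the tips, and the signal would be called significant if `p_value` is below 0.05.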
The rate of change of mean trait values of a lineage is given by:
dX(t) = α[θ-X(t)]dt + σdB(t) (1)
This equation expresses the change in trait X over an infinitesimal increment of time. The term dB(t) is “white noise”, a random variable that is normally distributed with mean 0 and variance dt, and σ represents the intensity of these random fluctuations. The deterministic part of the model is given by the term α[θ-X(t)]dt, in which α represents the strength with which selection “pulls” lineages towards a phenotypic optimum, represented by θ. With α=0, this model collapses to:
dX(t) = σdB(t) (2)
the Brownian Motion model for trait evolution (Felsenstein, 1973, Felsenstein, 1985). This model uses the basic assumption of comparative studies as a null hypothesis for any pair of lineages: that the expected phenotypic divergence between them increases in proportion to the time elapsed since their last common ancestor (Felsenstein, 1973).
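To make equations (1) and (2) concrete, the following sketch simulates a single lineage with a simple Euler-Maruyama discretization. It is an illustration only: the function name, parameter values, step size and seed are assumptions chosen for the example and are not part of the analysis reported here.

```python
import math
import random


def simulate_ou(x0, alpha, theta, sigma, t_total=1.0, n_steps=1000, seed=0):
    """Simulate dX = alpha * (theta - X) dt + sigma dB along one lineage."""
    rng = random.Random(seed)
    dt = t_total / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        # dB is normally distributed with mean 0 and variance dt.
        dB = rng.gauss(0.0, math.sqrt(dt))
        x += alpha * (theta - x) * dt + sigma * dB
        path.append(x)
    return path


# OU: the lineage is pulled from x0 = 0 towards the optimum theta = 1.
ou_path = simulate_ou(x0=0.0, alpha=2.0, theta=1.0, sigma=0.1, t_total=5.0)

# Brownian Motion: alpha = 0 removes the pull, so theta has no effect.
bm_path = simulate_ou(x0=0.0, alpha=0.0, theta=1.0, sigma=0.1, t_total=5.0)
```

With `sigma=0` the trajectory is purely deterministic and approaches θ at a rate set by α. With `alpha=0` only the random term remains, which is the Brownian Motion case of equation (2).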
We applied the OU model combined with an algorithm for the automatic detection of adaptive shifts in the phylogeny, the “lasso-OU” algorithm, implemented in the R package l1ou (Khabbazian et al., 2016). This procedure assumes that a shift can occur at the beginning of any given branch, and tests the validity of each candidate shift as an explanation of the whole dataset using information criteria. The algorithm is implemented as a linear model (see Khabbazian et al., 2016; eq. 1), and incorporates the lasso procedure for estimating the models (Tibshirani, 1996). We used the Bayesian information criterion (BIC, Wagenmakers & Farrell, 2004) to rank models assuming either a fixed shift, by default located where the WGD is described (i.e., at the common ancestor of the Vanderwaltozyma-Saccharomyces clade, see Fig 1a), or shifts that are searched automatically by the algorithm. The program allows the maximum number of shifts to be set, which in our case was three. This analysis was performed for the four metric traits we considered here: ethanol yield, respiratory quotient, glycerol production and growth rate.
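The model-ranking step can be illustrated with the standard definition of BIC, k ln(n) - 2 ln(L), where lower values indicate a better-supported model. The sketch below does not use or reproduce l1ou. The log-likelihoods, parameter counts and number of species are invented purely for illustration, and the way parameters are counted for shift models is an assumption.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def rank_by_bic(models, n_obs):
    """Return (name, BIC) pairs sorted from best (lowest BIC) to worst."""
    scored = [(name, bic(loglik, k, n_obs))
              for name, (loglik, k) in models.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical (log-likelihood, number of parameters) pairs, NOT results
# from this study; they only show how the ranking works.
candidates = {
    "no shift": (-60.0, 3),
    "fixed shift at WGD": (-52.0, 4),
    "three automatic shifts": (-50.0, 9),
}

# n_obs is the number of species in the phylogeny (hypothetical value).
ranking = rank_by_bic(candidates, n_obs=40)
best_model = ranking[0][0]
```

In this made-up example the extra parameters of the three-shift model are not repaid by its small gain in likelihood, so the fixed-shift model ranks first.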