Regression procedures to predict ploidy occurrence
Boosted regression trees (BRTs), which are largely unaffected by the
distribution of the data (De’ath, 2007), were used to predict the
percentage of occurrence of each of the three ploidy states for
macrophyte species present per gridcell, using 16 variables (LAT, ET0,
AI, TYR, TMX, TRG, TDRY, PCP, PCPDR, PCPS, CCV, grAH, ALT, CROP,
Stot and Send). The tree analysis was
conducted following the guidelines of Elith et al. (2008). Because
regression trees tend to overfit the data, the initial tree was
simplified using a cost-complexity based procedure, iteratively
dropping each variable and assessing the loss in predictive power. The
predictors that are retained in the simplified model can therefore be
thought of as significant predictors. Tree complexity was set at three,
the learning rate at 0.0005 and the bag fraction at 0.75;
only results from the simplified trees are presented. Partial
dependence plots of the fitted diversity function versus observed values for
variables significantly predicting the response variables were prepared;
these seek to present the influence uniquely attributable to a single
predictor. To support interpretation of the outcomes, two approaches were
used: BRT and simple linear regression biplots. In addition, ‘standard’
regression trees were constructed using the same response and set of
predictor variables as in the BRT analysis. These provide a better
visualisation of the relationship between % ploidy status and the key
predictor variables, along with key cut-off values of these environmental
drivers in the data set. BRT analysis was carried out using the gbm package
(Greenwell et al., 2018) and standard regression trees were constructed
using the rpart package (Therneau et al., 2019). These analyses were
performed in R v. 4.0.2 (R Core Team, 2020). Linear regression plots
were constructed using Excel.
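
For readers unfamiliar with the roles of the learning rate and bag fraction, the boosting procedure described above can be illustrated with a minimal sketch. This is not the gbm implementation used in the analysis: it is a pure-Python illustration that assumes single-split trees (stumps) rather than a tree complexity of three, a learning rate of 0.1 rather than 0.0005 so that a few hundred trees suffice (a rate as small as 0.0005 would need far more trees), and a small hypothetical data set. The names fit_stump, fit_brt, predict and mse are illustrative and do not come from any package.

```python
import random


def fit_stump(X, residuals, rows):
    """Return (feature, threshold, left_mean, right_mean) minimising squared error on `rows`."""
    best = None
    for j in range(len(X[0])):
        values = sorted({X[i][j] for i in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [residuals[i] for i in rows if X[i][j] <= thr]
            right = [residuals[i] for i in rows if X[i][j] > thr]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, ml, mr)
    if best is None:
        # No valid split: fall back to predicting the mean residual everywhere.
        m = sum(residuals[i] for i in rows) / len(rows)
        return (0, float("inf"), m, m)
    return best[1:]


def stump_predict(stump, x):
    j, thr, ml, mr = stump
    return ml if x[j] <= thr else mr


def fit_brt(X, y, n_trees=200, learning_rate=0.1, bag_fraction=0.75, seed=0):
    """Stochastic gradient boosting for squared-error loss with stumps as base learners."""
    rng = random.Random(seed)
    n = len(y)
    base = sum(y) / n                      # initial prediction: the overall mean
    pred = [base] * n
    stumps = []
    bag_size = max(2, int(round(bag_fraction * n)))
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        rows = rng.sample(range(n), bag_size)   # random subsample (the "bag")
        stump = fit_stump(X, residuals, rows)
        stumps.append(stump)
        # Shrink each tree's contribution by the learning rate.
        pred = [p + learning_rate * stump_predict(stump, x) for p, x in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, x):
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, x) for s in stumps)


def mse(model, X, y):
    return sum((yi - predict(model, x)) ** 2 for x, yi in zip(X, y)) / len(y)


# Hypothetical data: the response (a percentage) steps down along predictor 0;
# predictor 1 is unrelated to the response.
X = [[float(i), float((7 * i) % 11)] for i in range(40)]
y = [60.0 if i < 20 else 20.0 for i in range(40)]

model = fit_brt(X, y, n_trees=200, learning_rate=0.1, bag_fraction=0.75, seed=1)
print("training MSE:", mse(model, X, y))
print("prediction at low vs high predictor 0:",
      predict(model, [5.0, 3.0]), predict(model, [35.0, 3.0]))
```

Each tree is fitted to the residuals of the current model on a random 75% subsample, and its contribution is shrunk by the learning rate before being added. Smaller learning rates therefore require more trees but generally give a smoother fit.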