Regression procedures to predict ploidy occurrence
Boosted regression trees (BRTs), which are largely unaffected by the distribution of the data (De’ath, 2007), were used to predict the percentage occurrence of each of the three ploidy states among the macrophyte species present in each grid cell. Sixteen predictor variables were used: LAT, ET0, AI, TYR, TMX, TRG, TDRY, PCP, PCPDR, PCPS, CCV, grAH, ALT, CROP, Stot and Send. The analysis followed the guidelines of Elith et al. (2008). Regression tree models tend to overfit the data, so the initial model was simplified by a cost-complexity based procedure that iteratively drops each variable and assesses the resulting loss in predictive power. The predictors retained in the simplified model can therefore be regarded as significant predictors. Tree complexity was set at three, the learning rate at 0.0005 and the bag fraction at 0.75. Only results from the simplified models are presented. For the variables that significantly predicted the response variables, partial dependence plots of the fitted function against the observed predictor values were prepared; these show the influence uniquely attributable to a single predictor.

Two approaches were used to support interpretation of the outcomes: BRT and simple linear regression biplots. In addition, ‘standard’ regression trees were constructed using the same response and predictor variables as in the BRT analysis. These trees give a clearer visualisation of the relationship between % ploidy status and the key predictor variables, together with the cut-off values of these environmental drivers in the data set. BRT analysis was carried out using the gbm package (Greenwell et al., 2018) and standard regression trees were constructed using the rpart package (Therneau et al., 2019); both analyses were performed in R v. 4.0.2 (R Core Team, 2020). Linear regression plots were constructed in Excel.
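The stated settings (tree complexity 3, learning rate 0.0005, bag fraction 0.75) correspond to gbm's `interaction.depth`, `shrinkage` and `bag.fraction` arguments. The standard-library Python sketch below is illustrative only and is not the authors' code. It shows how these three settings act in stochastic gradient boosting. Several choices are assumptions made here: a squared-error (Gaussian) loss, trees grown best-first to a fixed number of splits as a stand-in for tree complexity, the number of trees, and the names `fit_brt` and `predict_brt`, which are hypothetical. The sketch omits the predictor-dropping simplification step.

```python
import random
from typing import List

Row = List[float]


def _sse(vals: List[float]) -> float:
    """Sum of squared deviations from the mean."""
    if not vals:
        return 0.0
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)


def _best_split(X: List[Row], r: List[float], idx: List[int]):
    """Return (gain, feature, threshold) for the split of rows `idx` that most reduces residual SSE."""
    parent = _sse([r[i] for i in idx])
    best = None
    for j in range(len(X[0])):
        values = sorted({X[i][j] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[i] for i in idx if X[i][j] <= t]
            right = [r[i] for i in idx if X[i][j] > t]
            gain = parent - _sse(left) - _sse(right)
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best


def _fit_tree(X: List[Row], r: List[float], idx: List[int], n_splits: int) -> dict:
    """Grow a regression tree best-first with at most `n_splits` splits
    (assumed stand-in for tree complexity / gbm's interaction.depth)."""
    root = {"idx": idx}
    leaves = [root]
    for _ in range(n_splits):
        cands = [(_best_split(X, r, leaf["idx"]), leaf) for leaf in leaves]
        cands = [(s, leaf) for s, leaf in cands if s is not None and s[0] > 0]
        if not cands:
            break
        (_, j, t), leaf = max(cands, key=lambda c: c[0][0])
        left = {"idx": [i for i in leaf["idx"] if X[i][j] <= t]}
        right = {"idx": [i for i in leaf["idx"] if X[i][j] > t]}
        leaves = [l for l in leaves if l is not leaf] + [left, right]
        leaf.clear()  # turn the chosen leaf into an internal node in place
        leaf.update(feat=j, thr=t, left=left, right=right)
    for leaf in leaves:  # each leaf predicts the mean residual of its rows
        vals = [r[i] for i in leaf.pop("idx")]
        leaf["value"] = sum(vals) / len(vals)
    return root


def _predict_tree(node: dict, x: Row) -> float:
    while "value" not in node:
        node = node["left"] if x[node["feat"]] <= node["thr"] else node["right"]
    return node["value"]


def fit_brt(X, y, tree_complexity=3, learning_rate=0.0005,
            bag_fraction=0.75, n_trees=1000, seed=1):
    """Stochastic gradient boosting with squared-error loss (assumed Gaussian response)."""
    rng = random.Random(seed)
    n = len(y)
    f0 = sum(y) / n
    pred = [f0] * n
    trees = []
    for _ in range(n_trees):
        resid = [y[i] - pred[i] for i in range(n)]  # negative gradient of squared error
        bag = rng.sample(range(n), max(1, int(bag_fraction * n)))  # subsample without replacement
        tree = _fit_tree(X, resid, bag, tree_complexity)
        trees.append(tree)
        for i in range(n):  # shrink each tree's contribution by the learning rate
            pred[i] += learning_rate * _predict_tree(tree, X[i])
    return f0, learning_rate, trees


def predict_brt(model, x: Row) -> float:
    f0, lr, trees = model
    return f0 + lr * sum(_predict_tree(t, x) for t in trees)
```

A very small learning rate such as 0.0005 needs many trees before predictions move away from the mean of the response. This is why Elith et al. (2008) pair learning rate with the number of trees and tree complexity rather than tuning them separately.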