Role of diversity and abiotic factors
To control for extrinsic anthropogenic variables that affect native and non-native plant richness, we used, as a second and complementary analytical step, a combination of Boosted Regression Tree analysis (BRT; Elith et al. 2008) and Random Forest Analysis (RFA; Breiman 2001). These two analyses address more specifically the extent to which diversity and abiotic factors alone determine variation in E. canadensis abundance (Feld et al. 2016a). BRT was used to partition the variation in E. canadensis abundance explained by diversity (hypothesis 1) and abiotic descriptors (hypothesis 2) alone, and to assess how the two together might reflect habitat heterogeneity (hypothesis 3) at the landscape scale (Feld et al. 2016b). BRT is a machine-learning method that combines classical regression tree analysis with boosting (Elith et al. 2008). It was well suited to our study because it can accommodate collinear data (e.g. latitude and longitude) and handle linear and non-linear descriptors with missing values (Elith et al. 2008). BRT partitioning (pBRT) was assessed through an additive partial regression scheme following Feld et al. (2016b). This analysis decomposed the variation in each BRT into four fractions: (i) pure diversity, (ii) pure abiotic, (iii) shared diversity/abiotic, and (iv) unexplained variation. The shared fraction (iii) represents the variation that may be attributed to diversity and abiotic descriptors together and is obtained additively in partial regression.
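The arithmetic behind the four fractions can be sketched as follows. This is a minimal illustration in Python, not the R workflow used in the study, and it assumes the standard two-set additive partitioning, in which the fractions are derived from the variation explained by a diversity-only model, an abiotic-only model, and a full model; the exact computation in Feld et al. (2016b) may differ. The function name and the input values are hypothetical.

```python
def partition_variation(r2_diversity, r2_abiotic, r2_full):
    """Split total variation into four additive fractions.

    Assumed inputs (hypothetical names): the proportion of variation
    explained by a diversity-only model, an abiotic-only model, and a
    model containing both predictor sets.
    """
    return {
        # (i) explained by diversity after accounting for abiotic descriptors
        "pure_diversity": r2_full - r2_abiotic,
        # (ii) explained by abiotic descriptors after accounting for diversity
        "pure_abiotic": r2_full - r2_diversity,
        # (iii) explained by both sets and not separable between them
        "shared": r2_diversity + r2_abiotic - r2_full,
        # (iv) not explained by the full model
        "unexplained": 1.0 - r2_full,
    }


# Hypothetical values, for illustration only (not results from the study).
fractions = partition_variation(r2_diversity=0.30, r2_abiotic=0.40, r2_full=0.55)
for name, value in fractions.items():
    print(f"{name}: {value:.2f}")
```

With these made-up inputs the fractions come out at roughly 0.15, 0.25, 0.15 and 0.45, and by construction they always sum to one. A negative shared fraction is possible under this scheme and would indicate that the two predictor sets explain more together than the sum of their separate contributions.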
To reduce any spatial autocorrelation in the data arising from the underlying hydrological network, and to evaluate whether the importance of diversity and abiotic predictors in explaining E. canadensis abundance shifted with the degree of lake connectivity and eutrophication (hypothesis 3), we ran independent pBRTs for each lake group using the “dismo” (Hijmans et al. 2017) and “gbm” (Greenwell et al. 2019) packages in R (R Core Team 2019). For each pBRT we used a Gaussian error distribution, a tree complexity of 2, a learning rate between 0.001 and 0.005, and a bag fraction of 0.5 (Elith et al. 2008). The random seed was fixed with set.seed(123) in R before each BRT to provide a numerical starting point for the random number generator. Between 145 and 250 observations per lake group were analysed for each pBRT in order to deliver stable and reliable results (Feld et al. 2016a).
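To illustrate what the learning rate, bag fraction and seed control, the following is a simplified sketch of stochastic gradient boosting with a Gaussian (squared-error) loss, written in plain Python. It is not the “dismo”/“gbm” implementation used here: it fits single-split trees (stumps) on one predictor rather than trees of complexity 2, and the number of trees (500), the function names and the toy data are assumptions made for illustration.

```python
import random


def fit_stump(xs, residuals):
    """Find the single split on one predictor that minimises squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x in the subsample is identical: no split possible
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1], best[2], best[3]


def boost(xs, ys, n_trees=500, learning_rate=0.005, bag_fraction=0.5, seed=123):
    """Boosting sketch: n_trees is an assumed value, not taken from the study."""
    rng = random.Random(seed)          # fixed seed makes the subsampling repeatable
    baseline = sum(ys) / len(ys)       # initial prediction: the mean response
    preds = [baseline] * len(ys)
    stumps = []
    n_bag = max(2, int(bag_fraction * len(ys)))
    for _ in range(n_trees):
        # Each tree sees only a random fraction of the observations (the "bag").
        bag = rng.sample(range(len(ys)), n_bag)
        bag_x = [xs[i] for i in bag]
        bag_res = [ys[i] - preds[i] for i in bag]
        threshold, left_mean, right_mean = fit_stump(bag_x, bag_res)
        stumps.append((threshold, left_mean, right_mean))
        # The learning rate shrinks the contribution of each new tree.
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return baseline, stumps, preds


# Toy data (hypothetical): a step change in the response along one gradient.
xs = list(range(20))
ys = [0.0] * 10 + [1.0] * 10
baseline, stumps, preds = boost(xs, ys)
```

A smaller learning rate means each tree contributes less, so more trees are needed to reach the same fit, while a bag fraction of 0.5 introduces the randomness that makes a fixed seed necessary for repeatable results.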
RFAs were then used to assess the extent to which diversity predictors explain E. canadensis abundances through time. Like BRT, RFA is suited to analysing non-linear relationships: it fits a number of models (regression trees) to bootstrapped subsets of the data, and it has the advantage of handling datasets with a low number of observations and predictors, as is the case for our palaeo-data (Elith et al. 2008). RFAs were run using the rfsrc function of the “randomForestSRC” package (Ishwaran & Kogalur 2016).