Role of diversity and abiotic factors
As a second, complementary analytical step, and in an attempt to
control for extrinsic anthropogenic variables that affect native and
non-native plant richness, we used a combination of Boosted Regression
Tree analysis (BRT; Elith et al. 2008) and Random Forest Analysis (RFA;
Breiman 2001). These two analyses address more specifically the extent
to which diversity and abiotic factors alone determine E. canadensis
abundance variation (Feld et al. 2016a). BRT was used to
partition the variation in E. canadensis abundance explained by
diversity (hypothesis 1) and abiotic descriptors (hypothesis 2) alone,
and to assess how they might together reflect habitat heterogeneity
(hypothesis 3) at the landscape scale (Feld et al. 2016b). BRT
constitutes a machine-learning method that combines
classical regression tree analysis with boosting (Elith et al. 2008).
BRT was ideal for our study as it can accommodate collinear data (e.g.
latitude and longitude) and handle linear and non-linear descriptors
with missing values (Elith et al. 2008). BRT partitioning (pBRT)
was assessed through an additive partial regression scheme following
Feld et al. (2016b). For each BRT, this analysis decomposed the
variation in abundance into four fractions: (i) pure diversity, (ii)
pure abiotic, (iii) shared diversity/abiotic, and (iv) unexplained
variation. The shared fraction (iii) represents the variation that is
attributable jointly to diversity and abiotic descriptors and cannot be
assigned to either group alone; it is obtained additively in partial
regression.
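The additive scheme can be illustrated with a minimal sketch. It is written in Python purely for illustration (the analysis itself was run in R) and assumes the standard variation-partitioning arithmetic, in which the fractions are derived from the variation explained by a full model and by the two single-group models. The function name, argument names and input values are hypothetical and are not results from this study.

```python
def partition_explained_variation(r2_full, r2_diversity, r2_abiotic):
    """Split variation into four additive fractions (proportions of the total).

    Assumes the standard partial-regression arithmetic:
    r2_full      -- variation explained by diversity + abiotic descriptors
    r2_diversity -- variation explained by diversity descriptors only
    r2_abiotic   -- variation explained by abiotic descriptors only
    """
    return {
        # (i) explained by diversity after accounting for abiotic descriptors
        "pure_diversity": r2_full - r2_abiotic,
        # (ii) explained by abiotic descriptors after accounting for diversity
        "pure_abiotic": r2_full - r2_diversity,
        # (iii) overlap that cannot be assigned to either group alone
        "shared": r2_diversity + r2_abiotic - r2_full,
        # (iv) variation left unexplained by the full model
        "unexplained": 1.0 - r2_full,
    }


# Hypothetical input values, chosen only to show the arithmetic.
fractions = partition_explained_variation(0.60, 0.45, 0.35)
for name, value in fractions.items():
    print(f"{name}: {value:.2f}")
```

By construction the four fractions sum to one. Under this arithmetic the shared fraction can be negative when the two descriptor groups together explain more than the sum of their separate contributions.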
To reduce any spatial autocorrelation in the data arising from the
underlying hydrological network, and to evaluate whether the importance
of diversity and abiotic predictors in explaining E. canadensis
abundance shifted with the degree of lake connectivity and
eutrophication (hypothesis 3), we ran independent pBRTs for each lake
group using the “dismo” (Hijmans et al. 2017) and “gbm” (Greenwell et
al. 2019) packages in R (R Core Team 2019). For each pBRT we used a
Gaussian error distribution, a tree complexity of 2, a learning rate
between 0.001 and 0.005, and a bag fraction of 0.5 (Elith et al. 2008).
The random seed was fixed with set.seed(123) in R before each BRT as a
numerical starting point for the random number generator. Between 145
and 250 observations per lake group were analysed for each pBRT in
order to deliver stable and reliable results (Feld et al. 2016a).
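To make the roles of the bag fraction and the fixed seed concrete, the following minimal sketch draws the random subset of observations used to fit a single tree. It is written in Python for illustration only: the function name is hypothetical, the rounding rule is an assumption, and Python's random number generator does not reproduce the draws that set.seed(123) yields in R. It assumes that each tree is fitted to a fraction of the observations sampled without replacement.

```python
import random


def draw_bag(n_observations, bag_fraction, rng):
    """Return the sorted indices of the observations used to fit one tree.

    Assumes sampling without replacement; truncating the bag size to an
    integer is an illustrative choice, not a documented rule.
    """
    bag_size = int(n_observations * bag_fraction)
    return sorted(rng.sample(range(n_observations), bag_size))


# 145 observations is the lower end of the lake-group sizes reported above.
rng = random.Random(123)  # fixed seed, so the same subsets are drawn on re-runs
first_bag = draw_bag(145, 0.5, rng)
second_bag = draw_bag(145, 0.5, rng)

print(len(first_bag))           # 72 observations in each bag
print(first_bag != second_bag)  # successive trees see different subsets
```

With a bag fraction of 0.5, each tree sees roughly half of a lake group's observations, which introduces the randomness that a fixed seed then makes repeatable.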
RFAs were then used to assess the extent to which diversity predictors
explain E. canadensis abundances through time. Similar to BRT, RFA is
suited to analysing non-linear relationships, which it does by fitting
a number of models (regression trees) to bootstrapped data subsets. It
has the additional advantage of handling datasets with a low number of
observations and predictors, such as our palaeo-data (Elith et al.
2008). RFAs were run using the function rfsrc of the package
“randomForestSRC” (Ishwaran & Kogalur 2016).