2.3. Statistical and models analyses
The descriptive statistics and correlation analyses
were
performed using SPSS v.21.0 (IBM 2012). Because the species richness
values abnormally distributed, and often over-dispersed where
variance-mean ratio greater than 1 (Crawley 2012), we used generalized
linear models with negative binomial residuals (GLM-NB) to analyse the
relationship between species richness and environmental-human variables.
GLM-NB could compensate for overdispersed data compared to Poissonian
residuals, and this method widely used for ecological count data
(Crawley 2012). For each predictor, we performed GLM to evaluate
explanatory power (R2, %) of each predictor. This
explanatory power was calculated as
(null deviance - residual deviance/
null deviance) × 100 (Panda et al. 2017). The glm.nb() function
used for GLMs in the ‘MASS’ R package (Venables & Ripley 2013). We used
stepwise regression to explore the combined effect of environmental and
human factors on conifer species richness and evaluate the best group of
predictors with the highest explanatory power. One variable from each
predictor category was selected to build a model with seven predictors
to reduce the multicollinearity (Faraway 2016) because most of the
predictors in each explanatory category were highly correlated to each
other (Table S4, Appendix A). We made all the possible combinations of
predictors using four energy, three water, two seasonality, two
topographic heterogeneity, one human influence, four soil types, and
seven soil nutrient-fertility variables, which yielded 4 × 3 × 2 × 2 × 1
× 4 × 7 = 1,344 models for each richness category. We selected the best
model that had the lowest Akaike information criterion (AIC) for each
richness group. Within each best model, we calculated the variance
inflation factor (VIF) for all predictors using ‘car’ package in R 3.6.1
to evaluate the significance of multicollinearity (Legendre & Legendre
2012; Fox et al. 2016), where the collinearity is considered to
be significant when VIF > 5. Similarly, to compare the
proportion of variance explained by each predictor category in addition
to differentiate their independent and joint effects on conifer
richness, we applied hierarchical partitioning analysis using
‘hier.part’ R package (Walsh & Mac Nally 2013). We first conducted a
principal component analysis (PCA) with the varimax rotation in SPSS
v.21.0 (IBM 2012) within each of the predictor categories and then
extracted the first axis to represent each category (hypothesis). We
standardized the human influence variable instead because PCA cannot be
applied to a group containing only one variable.