2.3. Statistical and models analyses
The descriptive statistics and correlation analyses were performed using SPSS v.21.0 (IBM 2012). Because the species richness values abnormally distributed, and often over-dispersed where variance-mean ratio greater than 1 (Crawley 2012), we used generalized linear models with negative binomial residuals (GLM-NB) to analyse the relationship between species richness and environmental-human variables. GLM-NB could compensate for overdispersed data compared to Poissonian residuals, and this method widely used for ecological count data (Crawley 2012). For each predictor, we performed GLM to evaluate explanatory power (R2, %) of each predictor. This explanatory power was calculated as (null deviance - residual deviance/ null deviance) × 100 (Panda et al. 2017). The glm.nb() function used for GLMs in the ‘MASS’ R package (Venables & Ripley 2013). We used stepwise regression to explore the combined effect of environmental and human factors on conifer species richness and evaluate the best group of predictors with the highest explanatory power. One variable from each predictor category was selected to build a model with seven predictors to reduce the multicollinearity (Faraway 2016) because most of the predictors in each explanatory category were highly correlated to each other (Table S4, Appendix A). We made all the possible combinations of predictors using four energy, three water, two seasonality, two topographic heterogeneity, one human influence, four soil types, and seven soil nutrient-fertility variables, which yielded 4 × 3 × 2 × 2 × 1 × 4 × 7 = 1,344 models for each richness category. We selected the best model that had the lowest Akaike information criterion (AIC) for each richness group. Within each best model, we calculated the variance inflation factor (VIF) for all predictors using ‘car’ package in R 3.6.1 to evaluate the significance of multicollinearity (Legendre & Legendre 2012; Fox et al. 2016), where the collinearity is considered to be significant when VIF > 5. Similarly, to compare the proportion of variance explained by each predictor category in addition to differentiate their independent and joint effects on conifer richness, we applied hierarchical partitioning analysis using ‘hier.part’ R package (Walsh & Mac Nally 2013). We first conducted a principal component analysis (PCA) with the varimax rotation in SPSS v.21.0 (IBM 2012) within each of the predictor categories and then extracted the first axis to represent each category (hypothesis). We standardized the human influence variable instead because PCA cannot be applied to a group containing only one variable.