Statistical analyses
Except when specified, we performed the following statistical analyses at sampling site level, after integrating the community tables of the two soil layer samples (leaf litter and deep soil) for all three taxonomic groups (Acari, Collembola and Coleoptera) into a single matrix. Each of the analyses was performed at both ASV and OTU levels.
α diversity and community uniqueness:
We used ANOVAs to test for significant differences in α diversity (RICH) and community uniqueness (LCBD, local contribution to β diversity) between forest habitats and soil layers. We used generalized linear mixed models (GLMMs) to analyse the relationship between RICH or LCBD per site and the topoclimatic variables (ENVPC1 and ENVPC2) as predictors, with latitude and longitude as covariates. We built GLMMs fitting forest habitat type as a random effect in order to account for non-independence among samples from the same forest habitat (see Supplemental Information).
β diversity:
The clustering of sampling sites according to their community composition was visualised using non-metric multidimensional scaling (NMDS) assuming two dimensions (k =2). Differences among forest habitat types were tested for significance using permutational multivariate analyses of variance (PERMANOVA). We used symmetric Procrustes analyses to statistically assess non-randomness among NMDS ordinations calculated from different community dissimilarity matrices (βSOR vs. βSIM and ASVs vs. OTUs; see Peres-Neto & Jackson, 2001). These analyses were performed using the vegan (Oksanen et al., 2020) and pairwiseAdonis (Martinez Arbizu, 2020) R packages.
The effect of forest habitat type and of the spatial and topoclimatic predictors on β diversity patterns was tested using distance-based redundancy analyses (dbRDA; Legendre & Anderson, 1999), at both across-habitats and within-habitat scales. Specifically, the response variables were the community dissimilarity matrices based on the Simpson dissimilarity index (βSIM), and the explanatory variables were forward selected from full models containing the following sets of predictors: (i) forest habitat type (HAB); (ii) spatial variables (SPAPCNMi) derived from the transformation of the topographic weighted distance matrix using Principal Coordinates of Neighbour Matrices (PCNM, see Supplemental Information), and (iii) topoclimatic variables (ENVPC1and ENVPC2). The best-fit model was selected after ensuring there were no issues of multicollinearity (Variance Inflation Factors, VIF <10; e.g., Tonkin, Stoll, Jähnig, & Haase, 2016). Finally, the best-fit models for the across-habitat analyses were used to partition the variance explained exclusively by each variable group (forest habitat type, space and topoclimate) and their intersections using the adjusted coefficient of determination (R 2ADJ) (e.g., Zinger et al., 2019). These analyses were performed using the BiodiversityR (Kindt & Coe, 2005) and vegan R packages.
In order to validate the dbRDA inferences (Jupke & Schäfer, 2020), we also applied multivariate generalized linear models (mvGLMs) as implemented in the mvabund R package (Wang, Naumann, Wright, & Warton, 2012). Unlike dbRDA which is a distance-based approach, this model-based method fits a separate GLM per species and performs resampling-based hypothesis testing for community-level effects of predictors. Here, incidence (presence/absence) datasets were used as input and tested against the same sets of predictors as in dbRDA. Models were fitted using a binomial error distribution and a log link function. We first assessed the predictor significance using single-term models and only those predictors showing a significant effect were used to assemble a full model. The best-fit model was built following a backward stepwise selection approach (e.g., How, Cowan, Teale, & Schmitt, 2019). Term significance was assessed using likelihood ratio tests, PIT-trap resampling and 999 bootstrapping iterations (Wang et al., 2012).
Finally, we assessed the effect of habitat connectivity on the community composition of the Qa sampling sites by applying matrix regressions with randomization (MRR; Wang, 2013). Specifically, the community dissimilarity matrices based on the Simpson dissimilarity index (βSIM) were used as response variables, while the explanatory variables were selected among: (i) resistance due to habitat fragmentation (FRAIBR), (ii) resistance due to topographic complexity (TRIIBR), (iii) resistance due to a “flat landscape” (NULLIBR), (iv) weighted topographic (SPATWD) distances and (v) topoclimatic (ENVPC1-2) distances. We assembled a full model including all explanatory matrices and built the best-fit model following a backward stepwise selection approach, using 999 permutations for the significance tests (e.g., Ortego, Gugger, & Sork, 2015). The unique contribution of each predictor to the total variance explained by the best-fit model was quantified using hierarchical variance partitioning analysis in the hier.part R package (Walsh & Mac Nally, 2003). Once the best-fit model was selected, we visualised the relationship between community similarity (1-βSIM) and the explanatory distance/resistance matrices and fitted a distance-decay of community similarity curve (Gómez-Rodríguez & Baselga, 2018; Nekola & White, 1999). Specifically, we fitted a negative exponential function to univariate GLMs assuming a Gaussian error distribution and a log link function as implemented in the betapart R package.