Statistical analyses
For all taxonomic groups, we used Generalized Linear Mixed Models (GLMMs) to test if the different treatment lead to differences in the observed MOTU richness. In GLMMs, the number of MOTUs per sample was calculated and used as a dependent effect, the five treatments were used as predictors, and sample identity was used as a random factor. The model was performed with the generalized poisson distribution error using the R package glmmTMB (Brooks et al., 2017), in order to take into account overdispersion (Consul & Famoye, 1992). If GLMM detected significant differences among treatments, we used treatment contrasts to test if each treatment led to communities significantly different from those unraveled by the “control” condition. Treatment contrasts are standard non-orthogonal contrasts, in which each category (treatment) is compared to a user-defined reference category, and are appropriate to compare multiple treatments against one single control category (in this case, immediate extraction; (Field, Miles, & Field, 2015). The uncorrected number of MOTUs tends to overestimate the actual taxonomic richness (Calderón‐Sanou et al., 2020). Therefore, we repeated this analysis twice: considering all the observed MOTUs, and excluding the rare MOTUs (i.e. MOTUs with frequency < 1% in each sample).
Subsequently, we used multivariate analyses to assess the variation of bacteria, fungi and eukaryotic communities across habitats and treatments. Before running multivariate analyses, we calculated the proportion of reads of each MOTU in each sample. Relative abundance values were then transformed using the Box-Cox transformation, which simultaneously solves the double-zero problem and improves the multivariate normality of data (Legendre & Borcard, 2018).
First, we used Nonmetric MultiDimensional Scaling (NMDS) to describe differences in communities among the three habitats, and check whether different treatments yield different interpretations of ecological relationships among samples. NMDS uses an optimization process to find a configuration of points (samples) in a space with a small number of dimensions, and is suitable for metabarcoding analyses that aim to reconstruct variation in community composition as well as possible, without preserving any particular distance measure among objects (Borcard, Gillet, & Legendre, 2011; Chen & Ficetola, 2020; Paliy & Shankar, 2016). Given its robustness and flexibility, NMDS is often used as the first step to characterize the similarity of communities in metabarcoding studies (Chen & Ficetola, 2020; Paliy & Shankar, 2016). NMDS was run on the Euclidean distance computed on Box–Cox-chord-transformed data (Legendre & Borcard, 2018), by building 1,000 ordinations.
Second, we used ProcMod , a Procrustes-based analysis (Coissac & Gonindard-Melodelima, 2019), to measure the multivariate correlations between the communities obtained using the different treatments.ProcMod can be used to measure the shared variation between matrices, and is particularly appropriate to test relationships between datasets obtained through DNA metabarcoding and metagenomics (Coissac & Gonindard-Melodelima, 2019). Procrustes analyses tend to overfit the data, therefore we used a modified version of Procrustes correlation that is robust to highly-dimensional data and allows a correct estimation of the shared variation between data sets (Coissac & Gonindard-Melodelima, 2019). The Procrustes-based correlation tests were performed using the corls function in the R packageProcMod, using 1,000 randomizations to test the mean covariance between random matrices (Coissac & Gonindard-Melodelima, 2019).
Third, we used redundancy analysis (RDA) to measure the amount of variation among communities that is explained by differences in habitat and treatments (Legendre & Legendre, 2012; Ter Braak, 1986). With habitat typology and treatment as constraining matrices, we used treatment contrasts to test if each treatment led to communities significantly different from those unraveled by the control treatment. Thus, significant treatment contrasts indicate that results between control and experimental treatments differ in an important way, while non-significant results mean that deviation from ideal conditions is not specifically pronounced. Significance of RDA and treatment contrasts was tested through 10,000 permutations using the vegan package in R (Borcard et al., 2011; Oksanen et al., 2019).
For bacteria only, RDA detected significant differences between the control treatment and some of the treatments. We thus ran a similarity percentage analysis with the simper R function (Clarke, 1993) from vegan to identify the taxa contributing to the overall pairwise treatment difference (Geyer et al., 2014). Significance was tested using 50,000 permutations. Given the large number of tests performed, the significance of tests was corrected using the False Discovery Rate (FDR) method with the fdrtool package (Strimmer, 2008). FDR has greater power than traditional approaches (e.g. Bonferroni correction) when performing multiple comparisons (Benjamini & Hochberg, 1995). All statistical analyses were performed in the R environment.