2.6 Statistical analysis
All statistical analyses were performed in R v4.0.5 Venn diagrams were
constructed usinghttp://bioinformatics.psb.ugent.be/webtools/Venn/.
The effect of sampling day, sampling position, and temperature and
discharge on overall species/OTU richness was tested using generalized
additive models (GAM) with a poisson distribution and a “log” link
function implemented by the gam() function in the R package mgcv (Wood,
2015). Where overdispersion was detected, we corrected the standard
errors using a quasi-GAM model. Discharge and temperature were
respectively quantified as the average discharge and mean hourly
temperature of each sampling day (Table S1). To test for the effect of
our predictor variable sampling position on our response variable OTU
richness/read abundance a Kruskal-Wallis rank sum test was performed, as
our data was non-normal and non-linear and we had three group variables.
Further we divided our dataset into the two groups merolimnic and
hololimnic taxa and subsetted these groups into the most abundant taxa,
Diptera and EPT for merolimnic and Annelida and Coleoptera for
hololimnic. Then we used GAMs to determine the effect of sampling day,
temperature and discharge on species/OTU richness separately for each of
the eight groups. To assess the effect of sampling day on the relative
proportion for each FFG we used a GAM with a binomial distribution and a
”logit” link function. Only the term for sampling day was smoothed in
all GAMS via cubic regression splines with a basis dimension k =
9. Model diagnostics were checked via plots of residual versus fitted
values. The marginal effects, the slopes of the prediction equation, of
the significant explanatory variables were plotted at the mean of the
covariates to incorporate the possible minimal effects that the
insignificant explanatory variables could potentially have on the slope
of the significant explanatory variable.
An ANOSIM test was used to test for the effect of sampling position on
OTU and species temporal beta-diversity (Jaccard distance) defined as
the shift in the identities of named taxa in a specified assemblage over
two or more time points (Magurran et al., 2019). Additionally, we
partitioned temporal beta-diversity into its components turnover -
reflecting replacement of species between samples - and nestedness -
occuring when species loss or gain causes species-poor sites to resemble
a strict subset of species-rich sites (Baselga, 2010) using the betapart
package (Baselga et al., 2018) and visualized their change over the time
series as pairwise comparisons in relation to the first timepoint. For
comparisons between seasons, we grouped our dataset into spring (March,
April, May), summer (June, July, August), autumn (September, October,
November) and winter (December, January, February) samples. To visualize
differences between sampling positions and seasons for species and OTU
community composition, non-metric multidimensional scaling (NMDS) based
on Jaccard’s distance was conducted using the package vegan (Oksanen et
al., 2022).
To identify species that are most strongly associated with a season or
only found at a specific season, an indicator species analysis based on
presence - absence was performed using the R package indicspecies with
the function multipatt() (De Cáceres, Legendre, Moretti, 2010). The
association of species to seasons was calculated using the indicator
value index from Dufrêne and Legendre 1997. We used the indicator value
index instead of a correlation based index to get results which best
match the observed presence of a species rather than preferences of one
season over another. Using preferences could lead to the problem of
species being reported as indicator for a specific season, although the
species was also almost frequently present in other seasons. For
biomonitoring aspects it is therefore favourable to use the observed
presence of species. The indicator value index ranges between 0 and 1
(with 0 indicating no association) and is calculated as the product of
two quantities A, the probability that the surveyed site belongs to the
target group (i.e., season) when the species has been found, and B, how
frequently the species is found in the different samples from each
season. The function determines which season corresponds to the highest
association value for each species and then tests the statistical
significance of this relationship using a permutation test. We corrected
for size differences between groups, as we had different sample sizes
between seasons, using the implemented value within the multipatt()
function.