Ecological status separates whole biome and individual domain compositions
It was not possible to separate the five ecological status groups based on the biome composition of the analysed streams alone (data not shown). This was expected, as the high diversity and variability of the biome between samples obscure the relationship between ecological status and biological diversity. This observation is supported by previous work exploring the potential of metabarcoding as an alternative to conventional bioassessment in freshwater streams (Kuntke et al., 2020). Another study exploring the prediction of anthropogenic activity in rivers also found that the complete observed diversity could not explain ecosystem quality, and suggested using indicator organisms to refine a potential predictive model (Li et al., 2018).
To explore the relationship between ecological status and beta diversity further, a canonical correspondence analysis (CCA) model was generated for the whole biome, as well as for the individual domain data (Figure 2). Beta diversity analysis using CCA is a well-described and widely applied method in ecological studies (Braak & Verdonschot, 1995), which makes the approach chosen in the present study directly compatible with existing data analysis protocols. The CCA model, constrained by ecological status, showed that the whole biome data (Figure 2a) achieved the best separation, followed by the bacterial data (Figure 2b), in which near-complete separation of all five ecological status groups was achieved. Prokaryotic communities associated with sediments and surface waters have previously been shown to be sensitive to environmental change, and have been suggested as a tool for biomonitoring of pollution (Li et al., 2018; Mlejnková & Sovová, 2010). The bacterial communities of freshwater streams may therefore represent a relatively unexplored resource with high potential for the discovery of new indicators for bioassessment.
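To make the constrained ordination described above more concrete, the following is a minimal NumPy sketch of the standard weighted least-squares formulation of CCA: the chi-square standardised residuals of the site-by-taxon table are projected onto the site-weighted, centred constraining variables, and the singular value decomposition of that projection gives the constrained eigenvalues. It is an illustration only, not the pipeline used in this study; the function name `cca`, the one-hot coding of the five status classes, and the requirement that no site is empty are assumptions made for the example.

```python
import numpy as np


def cca(Y, X):
    """Minimal canonical correspondence analysis sketch (hypothetical helper).

    Y: (n_sites, n_taxa) non-negative abundance table.
    X: (n_sites, n_constraints) constraining variables, e.g. one-hot coded
       ecological status classes (assumed coding, for illustration only).
    Returns constrained eigenvalues, total inertia and site (LC) scores.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    if np.any(Y.sum(axis=1) == 0):
        raise ValueError("every site must contain at least one read")
    Y = Y[:, Y.sum(axis=0) > 0]  # drop taxa never observed

    P = Y / Y.sum()
    r = P.sum(axis=1)  # site weights (sum to 1)
    c = P.sum(axis=0)  # taxon weights (sum to 1)
    expected = np.outer(r, c)
    # Chi-square standardised residuals, as in correspondence analysis
    Qbar = (P - expected) / np.sqrt(expected)
    total_inertia = float((Qbar ** 2).sum())

    # Centre constraints on their site-weighted means, then weight rows
    Xc = X - r @ X
    Xw = np.sqrt(r)[:, None] * Xc

    # Least-squares projection of Qbar onto the column space of Xw;
    # lstsq copes with the rank deficiency of a full one-hot coding
    B, *_ = np.linalg.lstsq(Xw, Qbar, rcond=None)
    Qhat = Xw @ B

    U, s, _ = np.linalg.svd(Qhat, full_matrices=False)
    k = min(np.linalg.matrix_rank(Xw), s.size)  # number of constrained axes
    eigenvalues = s[:k] ** 2
    # Site scores as linear combinations of the constraints
    site_scores = U[:, :k] / np.sqrt(r)[:, None]
    return eigenvalues, total_inertia, site_scores
```

With five one-hot coded status classes, centring makes one column redundant, so at most four constrained axes are returned, and `eigenvalues.sum() / total_inertia` gives the share of community variation explained by ecological status.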
A gradient-like overlap between streams of bad and poor, and of moderate to high, ecological status was observed for the eukaryotic data, which is in line with previous studies focusing on metabarcoding of invertebrates (Elbrecht, Vamos, Meissner, Aroviita, & Leese, 2017; Kuntke et al., 2020), as well as with well-described ecological quality assessment protocols, which are based on the identification and abundance of selected indicator species (Birk et al., 2012). The archaeal community did not separate the samples by ecological quality in a meaningful way. This is likely related to the low abundance of archaea, their lack of differentiation from the surrounding environment and/or the coverage of the chosen primer set, which may capture only part of the archaeal taxa. Archaeal communities in sediments have previously been shown to be highly diverse as well as sensitive to environmental change (Hoshino & Inagaki, 2019), and may be worth investigating in more detail in relation to biomonitoring protocols for freshwater systems.
The domain-specific diversity analysis could be extended with network analysis to reveal ecologically meaningful relationships within and across domains, which could strengthen the detection of indicator species and organisms associated with individual ecological status classes. A similar approach has previously been applied in paddy soils (Wang et al., 2017). Alternatively, indicator organisms could be extracted from the dataset to reduce the dimensionality of the metabarcoding data and provide a basis for a model describing the relationship between the biome and ecological status. This approach has previously been applied to rivers in China (Li et al., 2018). The data gathered in the present study are promising for this type of approach. However, due to the low number of samples with bad to moderate ecological status, the statistical power of a predictive model generated from the current dataset would be relatively low. Additional sampling to increase the sample size, especially in lower-quality ecosystems, would strengthen the data and enable the development of a predictive model based on biome composition.
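As one possible way of extracting indicator organisms, the sketch below computes the widely used indicator value (IndVal) index for each taxon and ecological status class: specificity (the share of a taxon's mean abundance found in a class) multiplied by fidelity (the fraction of sites in that class where the taxon occurs). This is an assumed illustration, not necessarily the method used by Li et al. (2018); the function name `indval` is hypothetical, and the permutation test normally used to assess significance is omitted for brevity.

```python
import numpy as np


def indval(Y, groups):
    """Indicator value per group and taxon (hypothetical helper).

    Y: (n_sites, n_taxa) abundance table.
    groups: length-n_sites labels, e.g. ecological status classes.
    Returns (labels, values) where values has shape (n_groups, n_taxa)
    and lies in [0, 1]; 1 means a taxon occurs only in that group and
    at every site of it.
    """
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)

    # Mean abundance and occurrence frequency of each taxon per group
    mean_abund = np.array([Y[groups == g].mean(axis=0) for g in labels])
    fidelity = np.array([(Y[groups == g] > 0).mean(axis=0) for g in labels])

    # Specificity: share of a taxon's summed group means found in each group
    totals = mean_abund.sum(axis=0)
    specificity = np.divide(
        mean_abund, totals, out=np.zeros_like(mean_abund), where=totals > 0
    )
    return labels, specificity * fidelity
```

Taxa with high values for a single class would be candidate indicators; in practice, each value would be tested against a null distribution obtained by permuting the class labels before being used as a model input.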