Discussion
Diatoms are key organisms for understanding aquatic ecosystem functioning because they play an important role as producers (Rimet et al., 2018). Unfortunately, our knowledge of diatom biodiversity is still limited given the large number of estimated extant species (Mann and Vanormelingen, 2013). DNA metabarcoding, however, provides a powerful tool to examine unknown diatom diversity and to expand our knowledge of diatom distribution patterns.
Contrary to our expectations (Hypothesis 1), we observed a relatively poor correspondence between the morphology-based and molecular-based approaches. We found, however, that both methods provided similar information on the underlying processes determining geographical variation in diatom communities, thereby supporting our second hypothesis (Hypothesis 2). There are several potential explanations for the relatively low congruence found between the two approaches:
  1. Choice of DNA marker. The rbcL gene is a common marker used for metabarcoding and phylogenetic studies (Keck, Vasselon, Rimet, Bouchez, & Kahlert, 2018). Because this gene codes for a protein, alignment is simple, insertions and deletions are extremely rare and, compared with ribosomal or mitochondrial markers, the likelihood of amplifying non-specific products is reduced (Soltis and Soltis, 1998; Evans, Wortley, & Mann, 2007). Moreover, the rbcL gene seems to separate taxa better than the 18S rDNA gene at species level (Kermarrec et al., 2014). However, the rbcL marker does not work for species lacking a functional plastid (obligate heterotrophs) such as Nitzschia alba (Kowalska et al., 2019). In addition, the short 312-bp rbcL barcode is readily amplified by PCR, which makes the analysis easier. However, using a short sequence (<500 bp) for barcoding may constrain taxonomic and phylogenetic assignment, as the information content of the sequence is limited (Medlin, 2018; Tedersoo, Tooming-Klunderud & Anslan, 2018). In this regard, Keck et al. (2018) compared the placement accuracy of the 312-bp fragment with that of the full-length rbcL gene in their phylogeny, and observed that approximately 45% of the species were placed exactly as with the full-length gene. Similarly, our phylogeny constructed with 708 reference sequences and 3138 taxonomy-assigned OTUs showed that several reference sequences were placed far from their corresponding taxonomy-assigned OTUs. The correct placement of sequences on a phylogeny depends on several factors, such as the choice of marker gene, the length of the amplicons and the presence of closely related taxa in the reference phylogeny (Keck et al., 2018). Tedersoo et al. (2018) have highlighted the importance of sequence length in metabarcoding, emphasizing that longer amplicons increase the accuracy of identification at the species level.
We speculate that using a combination of two or more DNA barcode regions or other marker genes (e.g. the second internal transcribed spacer, ITS2) could be more suitable for unambiguous species identification, especially to distinguish closely related species (Moniz and Kaczmarska, 2009).
  2. The PCR used to amplify the barcode region can be inhibited by contaminants and can produce chimeric DNA molecules (Hugerth and Andersson, 2017). Moreover, some organisms may be underrepresented if the designed primers do not hybridize with their DNA template.
  3. The completeness of the reference database is a key factor that strongly limits the taxonomic assignment of OTUs. In this vein, a large number of morphologically identified diatom taxa (31 species and 8 genera) could not be detected by the metabarcoding approach owing to the lack of reference sequences in the R-Syst::diatom database. Thus, some species detected only by light microscopy (e.g. Cocconeis euglypta, C. pediculus, Stauroneis producta and S. gracilis) differed from those detected by the metabarcoding approach (C. cupulifera, C. mascarenica, S. anceps and S. gracilior). Following Jahn, Zetzsche, Reinhardt, & Gemeinholzer (2007), we further hypothesize that taxa whose sequences are absent from the reference database could be replaced in the molecular inventory by taxa of the same genus that do have reference sequences, or by a taxon not expected in the studied ponds. This hypothesis could explain the relatively minor discrepancies observed between both inventories at genus-level resolution.
  4. The bioinformatics processing might also have played an important role in the discrepancies observed between morphological and molecular inventories. Typically, DNA sequences obtained in a high-throughput sequencing run are filtered and clustered, based on a distance matrix at a specified threshold, into Operational Taxonomic Units (OTUs) to reduce the effect of PCR and sequencing errors and of the polymorphism present in the barcode region (Chen, Zhang, Cheng, Zhang, & Zhao, 2013). The outcome of clustering is mainly affected by the clustering method and the similarity threshold used (Chen et al., 2013). Sequences are often clustered at 97% similarity; however, the barcodes of different taxa may differ by less than this distance (Hugerth and Andersson, 2017). By contrast, using a high similarity threshold increases the number of unclassified OTUs and the number of PCR and sequencing errors retained (Tapolczai et al., 2018). Nevertheless, a common identity threshold for assigning taxonomy to all diatom taxa does not appear to exist yet, owing to the heterogeneous evolutionary rate of the rbcL gene and of the speciation process (Kermarrec et al., 2014). In addition, the relationship between OTUs and biological species is not straightforward (Ryberg, 2015; Bálint et al., 2016).
Interestingly, we observed some marine species in our molecular inventory, e.g. Thalassiosira profunda and Thalassiosira mediterranea (Percopo, Siano, Cerino, Sarno, & Zingone, 2011; Hasle 1990). The sequences assigned to these species were placed far from their respective reference sequences in our phylogeny, which could reflect an inaccurate taxonomic assignment. However, the genus-level assignment of these sequences could be correct, since thalassiosiroids feature prominently in freshwater ecosystems, with a freshwater diversity that rivals their marine diversity (Alverson, 2014). In addition, microscopy has a lower capacity to detect rare species than metabarcoding (Rimet et al., 2018), whereas the molecular approach can detect every taxon that is amenable to the method, covering a fuller range of species richness. We nevertheless hypothesize that using a higher similarity threshold for taxonomic assignment, or using other marker genes simultaneously, could be more suitable for assigning taxonomy unequivocally to such DNA sequences.
  5. Several (cryptic) species may be morphologically identical but genetically different (Zimmermann et al., 2015). Several molecular studies (Mann and Vanormelingen, 2013; An, Choi, Lee, Lee, & Noh, 2018) have suggested that diatom biodiversity has been underestimated. For example, in our study we identified morphologically only 12 infrageneric taxa belonging to the genus Nitzschia, whereas 24 taxa were detected by the metabarcoding approach. This could be related to the cryptic diversity observed within the morphologically identified Nitzschia palea species complex (Trobajo et al., 2010). Likewise, genetically distinct entities have been observed within morphologically identified species in Cyclotella, Eunotia, Gomphonema, Hantzschia, Navicula, Pinnularia and Sellaphora (Rovira et al., 2015). Conversely, the intraspecific and intragenomic polymorphism present in the barcode region can lead to an overestimate of species richness, since members of a single taxon may possess several genotypes at the barcode region and may be clustered into different OTUs (Mora et al., 2019). In addition, individuals of the same species from different geographic populations may possess different barcode sequences (Medlin, 2018).
  6. Other factors, such as the presence of extracellular DNA, can affect the composition of molecular inventories. Extracellular DNA from diatom species may be detected in a sample even if their cells are not physically present, adding extra taxa to the molecular inventory (Kermarrec et al., 2014; Rimet et al., 2018). Moreover, our morphological identifications were based on the observation of live material only. Consequently, some taxa found in our molecular inventories (e.g. Attheya septentrionalis) may be difficult to identify by microscopy since they are weakly silicified (Stachura-Suchoples, Enke, Schlie, Schaub, Karsten & Jahn, 2015). Finally, the high number of synonyms in diatom taxonomy may hinder the comparison of morphological and molecular inventories (Hillebrand, Watermann, Karez & Berninger, 2001).
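The sensitivity of OTU counts to the similarity threshold described in point 4 can be illustrated with a minimal sketch. It is not the pipeline used in this study: the greedy centroid procedure, the function names (`identity`, `greedy_cluster`, `mutate`) and the 100-bp toy reads are all hypothetical, and identity is computed position by position on sequences assumed to be already aligned and of equal length.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Greedy centroid clustering (illustrative only).

    Each sequence joins the first OTU whose centroid (its first member)
    it matches at >= threshold; otherwise it founds a new OTU.
    """
    otus = []
    for s in seqs:
        for otu in otus:
            if identity(s, otu[0]) >= threshold:
                otu.append(s)
                break
        else:
            otus.append([s])
    return otus


def mutate(seq, positions):
    """Return a copy of seq with a substitution at each given position."""
    out = list(seq)
    for p in positions:
        out[p] = "A" if out[p] != "A" else "C"
    return "".join(out)


# Hypothetical toy reads: one base read plus variants with 1, 2 and 5 mismatches
base = "ACGT" * 25
reads = [
    base,
    mutate(base, [3]),
    mutate(base, [3, 10]),
    mutate(base, [1, 5, 9, 13, 17]),
]

n_otus_97 = len(greedy_cluster(reads, 0.97))
n_otus_99 = len(greedy_cluster(reads, 0.99))
print(n_otus_97, n_otus_99)
```

With these toy reads the stricter threshold splits the same four sequences into more OTUs than the 97% threshold does, which is the trade-off discussed above: a low threshold can merge distinct taxa, whereas a high threshold can split a single taxon or keep error-bearing reads as separate units.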
In spite of the biases inherent to both morphological and metabarcoding methods, compositional variation of diatom communities was positively correlated with the environmental template, which indicates that diatom communities were mainly controlled by niche-based mechanisms (e.g. species sorting) and confirms our second hypothesis (Hypothesis 2). Similar results have been reported by other studies on diatom communities (Verleyen et al., 2009; Göthe et al., 2013; Jamoneau, Passy, Soininen, Leboucher, & Tison-Rosebery, 2017), in which environmental factors dominated over spatial and biological processes in structuring benthic algal communities. Moreover, in our study similar environmental variables (e.g. total suspended solids) were correlated with variation in diatom composition in both inventories. This could be related to the fact that the same substrate (S. lacustris) was sampled throughout, since diatom species may exhibit a narrow environmental tolerance and strong preferences for particular substrata (Soininen, 2007; Cantonati & Spitale, 2009). Host macrophytes are important in supplying nutrients to epiphytic diatoms, especially in oligotrophic and mesotrophic waters (Letáková, Fránková, & Poulíčková, 2018). In our study, morphological and molecular inventories were related to nutrients (e.g. total phosphorus and ammonium), which is expected since nutrients (particularly phosphorus) are important for diatom primary productivity and growth (Pan, Stevenson, Hill, Herlihy, & Collins, 1996). Ammonia strongly influences diatom composition and may be a limiting nutrient for primary productivity (Natarajan, 1970). Similarly, fluoride can enhance or inhibit the growth of diatoms depending on its concentration, the exposure time and the diatom species (Camargo, 2003).
Moreover, conductivity was related to the morphological inventory at species-level resolution, which is to be expected since diatoms are very sensitive to ionic content and composition and, consequently, are often used to monitor conductivity fluctuations (Potapova and Charles, 2003). Finally, both morphological and molecular inventories were related to total suspended solids, which may influence diatom assemblages through processes such as light attenuation, nutrient adsorption and algal aggregation (Hoshikawa et al., 2019).
Microorganisms, and particularly diatoms, have historically been considered to be ubiquitously distributed owing to their small size and huge population densities, with their communities mainly controlled by local environmental factors (Soininen, 2007; Hillebrand et al., 2001). Nevertheless, this distribution pattern has been challenged by several studies (Heino et al., 2010; Soininen, 2007; Blanco, Olenici, & Ortega, 2020), which suggest that variation in community structure cannot be explained by environmental factors alone and thereby question the strictly ubiquitous dispersal of diatom communities. We found no significant correlation between compositional variation of diatom assemblages and spatial distance, which may be explained by the relatively small extent of our study area. The effect of spatial distance may be more important at large spatial extents, while environmental factors may be more important at smaller extents (Alahuhta & Heino, 2013; Declerck et al., 2011). However, in stochastic and highly heterogeneous systems such as temporary ponds, environmental control may not necessarily be strong (Heino et al., 2015). Hence, other factors not assessed in our study, such as biotic interactions, may also be important in structuring diatom communities (Göthe et al., 2013). Nevertheless, we are confident that we included environmental variables frequently known to influence the composition of diatom communities (Pan et al., 1996; Potapova and Charles, 2003). Moreover, the environmental template we included varied extensively across ponds, thereby providing potential for species sorting.
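One common way to test whether compositional dissimilarity tracks environmental or spatial distance is a permutation-based Mantel test. The sketch below is an illustration under stated assumptions and not necessarily the analysis applied in this study: the function names (`pearson`, `upper`, `mantel`, `dist_matrix`) and the five-pond toy values are hypothetical, and the distance matrices are built from simple absolute differences.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper(mat, order):
    """Upper-triangle entries of a square matrix after reordering its rows and columns."""
    return [mat[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]


def mantel(d1, d2, permutations=999, seed=0):
    """One-sided Mantel test: correlation between two distance matrices and
    a permutation p-value obtained by jointly shuffling rows and columns of d2."""
    ident = list(range(len(d1)))
    x = upper(d1, ident)
    r_obs = pearson(x, upper(d2, ident))
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = ident[:]
        rng.shuffle(perm)
        if pearson(x, upper(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


def dist_matrix(values):
    """Pairwise absolute differences between site values."""
    return [[abs(a - b) for b in values] for a in values]


# Hypothetical values for five ponds (not data from this study)
tss = [1.0, 2.0, 4.0, 7.0, 11.0]              # an environmental variable per pond
community_axis = [1.1, 2.3, 3.8, 7.2, 10.5]   # a made-up compositional gradient

r, p = mantel(dist_matrix(community_axis), dist_matrix(tss))
print(round(r, 3), p)
```

Passing a matrix of geographic distances between ponds as the second argument would give the spatial counterpart of the same test. Rows and columns are permuted together because the pairwise distances within a matrix are not independent of one another.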
In summary, our study showed that both molecular and morphological methods were influenced by several biases inherent to their respective methodologies. The main biases of the molecular approach were probably the incompleteness of the reference database and the bioinformatics processing, which highlights the need to expand the reference database to include all genotypes of occurring taxa and the need to reach a consensus on bioinformatics processing in order to facilitate comparison between studies. In addition, establishing robust species identification thresholds and using a combination of two or more DNA barcode regions could be suitable for unambiguous species identification, especially in those cases where a single marker gene shows low variability. By contrast, the limited counting effort and the presence of cryptic species were presumably the main biases of the morphological approach. Our results showed that both approaches were related to the environmental template, suggesting that Mediterranean epiphytic diatom communities are mainly controlled by niche-based mechanisms at regional extents. However, we did not find a significant correlation between compositional variation of diatom assemblages and spatial distance, which is probably explained by the regional spatial extent studied. In conclusion, our work shows that molecular and morphological approaches provide complementary information and highlights the importance of the metabarcoding approach for inferring the composition of epiphytic diatom assemblages, especially as the completeness of the reference databases improves and bioinformatics biases are overcome.