MALDI-TOF MS measurements
From each specimen, a small tissue fragment (max. 1 mm³) was incubated for 5 minutes with 5 µl of alpha-cyano-4-hydroxycinnamic acid (HCCA) matrix. Of this incubated solution, 1 to 1.5 µl was transferred to one to nine spots on a target plate for co-crystallization of matrix and analytes. Each spot was measured one to three times using a Microflex LT/SH System (Bruker Daltonics). Using the flexControl 3.4 software (Bruker Daltonics), molecular masses were measured from 2 to 20 kilodaltons (kDa). Peaks were evaluated with a centroid peak detection algorithm over the mass range from 2 to 20 kDa, applying a signal-to-noise threshold of two, a minimum intensity threshold of 600 and a peak resolution higher than 400. To validate fuzzy control, the proteins/oligonucleotide method was employed with a maximal resolution of ten times above the threshold. To create a sum spectrum, a total of at least 120 laser shots were applied to each spot. Measurements were carried out using the same instrument on different occasions between 2013 and 2019.
MALDI-TOF data processing
MALDI-TOF raw data were imported to R, Version 4.1.0 (R-Core-Team, 2022) and processed using the R packages MALDIquantForeign, Version 0.12 (Gibb, 2015) and MALDIquant, Version 1.20 (Gibb and Strimmer, 2012). Spectra were square-root transformed, smoothed using the Savitzky-Golay method (Savitzky and Golay, 1964), baseline corrected using the SNIP method (Ryan et al., 1988) and normalized using the total ion current (TIC) method. Repeated measurements were averaged using mean intensities. Peak picking was carried out using a signal-to-noise ratio (SNR) of 12 and a half window size of 13. Peaks with an SNR below 12 were nevertheless retained if they occurred in other mass spectra, provided their SNR exceeded 1.75, which was assumed as the lower detection limit. Repeated peak binning was carried out to align homologous mass peaks. The resulting data were Hellinger transformed (Legendre and Gallagher, 2001) and used for further analyses.
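The two scaling steps named above can be illustrated with a minimal Python sketch. It is not the R pipeline used in the study: the function names and the small peak matrix are hypothetical, TIC normalization is read here simply as division by the summed intensity (the MALDIquant implementation may differ in detail), and the Hellinger transformation is taken as the square root of row-wise relative intensities.

```python
import math

def tic_normalize(intensities):
    """Scale one spectrum so that its intensities sum to 1 (assumed TIC reading)."""
    total = sum(intensities)
    if total == 0:
        raise ValueError("spectrum has zero total ion current")
    return [v / total for v in intensities]

def hellinger_transform(row):
    """Square root of relative intensities within one specimen (row)."""
    total = sum(row)
    if total == 0:
        return [0.0 for _ in row]
    return [math.sqrt(v / total) for v in row]

# Hypothetical peak matrix: rows are specimens, columns are aligned mass peaks.
peak_matrix = [
    [4.0, 0.0, 12.0],
    [1.0, 1.0, 2.0],
]
hellinger = [hellinger_transform(row) for row in peak_matrix]
```

After the transformation, the squared values in each row sum to one, so every specimen is represented by a vector of unit length regardless of its overall signal intensity.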
Hierarchical clustering was carried out in R using average linkage and Euclidean distances and visualized as a circular dendrogram using the R package dendextend, Version 1.15.1 (Galili, 2015). Random Forest (RF) (Breiman, 2001) was carried out using the R package randomForest, Version 4.6.14 (Liaw and Wiener, 2002). Settings followed Rossel and Martínez Arbizu (2018): ntree=2,000, mtry=35, sampsize=number of specimens in the smallest class. Classifications were tested for correct class assignment, based on the empirical assignment probabilities of the RF model, using the RF post-hoc test (Rossel and Martínez Arbizu, 2018), function rf.post.hoc in package RFtools, Version 0.0.3 (https://github.com/pmartinezarbizu/RFtools). Specimens with a correct RF classification and an assignment probability not deviating significantly from the empirical distribution (significance level p < 0.05) were considered true positive (tp) assignments. Specimens with a correct RF classification but a significantly different assignment probability were recorded as false positives (fp). RF classification was applied to identify the different scyphozoan developmental stages to species level, with all specimens of the respective stage excluded from the RF model. In addition, classification of both juvenile stages (polyp and ephyra) was tested with only adult specimens (medusae) retained in the RF model. All RF classification models contained all 23 species included in this study.
RF models using developmental stages as classes were used to find the most important variables for differentiating the groups based on the Gini index, which shows the degree of dissimilarity of the respective variables (Han et al., 2016). T-distributed Stochastic Neighbor Embedding (t-SNE) plots based on RF model votes were created using the R package Rtsne, Version 0.15 (Krijthe, 2015) with the following settings: perplexity=10, max_iter=4,000 and theta=0. Principal Coordinate Analysis (PCoA) was applied to the Hellinger-transformed data using the R package ape, Version 5.5 (Paradis and Schliep, 2019).
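As an illustration of the quantity behind Gini-based variable importance, the following Python sketch computes the Gini impurity of a set of class labels and the decrease in impurity achieved by splitting on a single peak. It is a simplified, single-split example with hypothetical function names and toy values, not the randomForest implementation, which accumulates such decreases over all splits and trees.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_decrease(values, labels, threshold):
    """Impurity decrease when splitting specimens at `threshold` on one peak."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted

# Toy example: one peak that is absent in polyps and present in medusae.
intensities = [0.0, 0.0, 0.3, 0.4]
stages = ["polyp", "polyp", "medusa", "medusa"]
decrease = gini_decrease(intensities, stages, threshold=0.1)
```

In this toy case the split separates the two stages perfectly, so the full parent impurity of 0.5 is removed. A peak whose intensities overlap between stages would yield a smaller decrease and therefore a lower importance.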
Results
The hierarchical clustering carried out on the complete dataset of 278 specimens resulted in distinct, species-specific clusters for all 23 analyzed species (Fig. 1), irrespective of sample storage time (<1 to 111 months) and measuring campaign (Supplementary Table 1). An RF model based on all analyzed species resulted in an out-of-bag (OOB) error of 0, thus supporting the species-specificity of mass spectra for all analyzed species. Of the four classes (Anthozoa, Hydrozoa, Scyphozoa and Staurozoa), a class-specific cluster was recovered only for Scyphozoa (Fig. 1, branch colors). Clustering of congeneric species was found in Haliclystus but not in Cyanea.
Interspecific Euclidean distances ranged from 0.99 to 1.38 (Fig. 2A), while intraspecific distances reached a maximum of 1.35 in the scyphozoan A. aurita when comparing a North Sea polyp and a North Sea medusa. The lowest interspecific distances were recorded between two scyphozoan species, a C. hysoscella medusa and an A. aurita ephyra.
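For orientation on the scale of these values, the Python sketch below assumes that the distances were computed directly on Hellinger-transformed intensities. Under that assumption every specimen vector has unit length and non-negative entries, so the Euclidean distance between two specimens lies between 0 (identical relative peak patterns) and √2 ≈ 1.414 (no shared peaks). The spectra and function names are hypothetical.

```python
import math

def hellinger_transform(row):
    """Square root of relative intensities within one specimen."""
    total = sum(row)
    return [math.sqrt(v / total) for v in row]

def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical aligned peak intensities for three specimens.
spectrum_a = [3.0, 1.0, 0.0, 0.0]
spectrum_b = [0.0, 0.0, 2.0, 2.0]  # shares no peaks with spectrum_a
spectrum_c = [6.0, 2.0, 0.0, 0.0]  # same relative pattern as spectrum_a

d_disjoint = euclidean(hellinger_transform(spectrum_a),
                       hellinger_transform(spectrum_b))
d_same_shape = euclidean(hellinger_transform(spectrum_a),
                         hellinger_transform(spectrum_c))
```

Read this way, the reported maximum interspecific distance of 1.38 is close to the theoretical upper bound, which corresponds to spectra sharing almost no peak intensity.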
Although the factor species explained the majority of the variance in the data, developmental stage also played an important role. Of the factors tested in a PERMANOVA (Anderson, 2001; stage, region, number of peaks), stage explained a major percentage of the variance (17.0% in A. aurita, 27.1% in C. capillata, 22.3% in C. lamarckii; Supplementary Tables 3 to 5). Thus, groups according to developmental stages can already be recognized in the Hellinger-transformed data, as presented in PCoAs for A. aurita (Fig. 3A), C. capillata (Fig. 3B) and C. lamarckii (Fig. 3C). However, in some cases stages seem to be highly similar, for example some ephyrae and polyps in C. lamarckii (Fig. 3C). This is also shown in a t-SNE plot based on class votes from an RF classification model trained using species-stages as classes (Fig. 3C, t-SNE plot). Still, the RF OOB error for all species on stage level was 0, and most stages in all species are clearly separated in t-SNE plots (Fig. 3, t-SNE plots). Accordingly, Euclidean distances within the different stages are in the range of intraspecific distances (Fig. 2B). Distances between stages within a species are distinctly higher but still lower than average interspecific distances (Fig. 2B), allowing differentiation of stages also in classification approaches.
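The share of variance attributed to a factor in a PERMANOVA can be sketched in Python for the simplest case of a single grouping factor, following the partitioning of squared distances described by Anderson (2001). This sketch covers only the R² value, not the permutation test, and does not reproduce the multi-factor models of the study. The function names and toy data are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def permanova_r2(points, groups):
    """R2 of one factor: 1 - SS_within / SS_total from squared distances."""
    n = len(points)
    # Total sum of squares: squared pairwise distances divided by n.
    ss_total = sum(euclidean(points[i], points[j]) ** 2
                   for i, j in combinations(range(n), 2)) / n
    if ss_total == 0:
        raise ValueError("all specimens are identical")
    # Within-group sum of squares, computed per group in the same way.
    ss_within = 0.0
    for g in set(groups):
        idx = [i for i, lab in enumerate(groups) if lab == g]
        ss_within += sum(euclidean(points[i], points[j]) ** 2
                         for i, j in combinations(idx, 2)) / len(idx)
    return 1.0 - ss_within / ss_total

# Toy data: a grouping that matches the structure, and one that does not.
points = [[0.0], [0.0], [1.0], [1.0]]
r2_matching = permanova_r2(points, ["polyp", "polyp", "medusa", "medusa"])
r2_unrelated = permanova_r2(points, ["polyp", "medusa", "polyp", "medusa"])
```

A factor that fully explains the distances yields an R² of 1, and a factor unrelated to them yields a value near 0. Percentages such as those reported for stage fall between these extremes.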
On closer inspection, distinct important peaks, identified by the Gini index within an RF model, differ between stages (Fig. 3, heat maps depicting peak intensities). Between some stages, several peaks differ not only in relative intensity but also in presence or absence. In A. aurita, the peak m/z 3447 is absent in all polyp specimens but present in almost all other specimens except for three medusae (Fig. 3A). Other peaks such as m/z 5352 differ between stages mainly in relative intensity, which in this case is higher in polyps than in the other stages. This is also the case for the peak m/z 4581 in C. capillata (Fig. 3B), which is present in all specimens but clearly differs in relative intensity between stages. Other peaks such as m/z 3314 are mainly present in medusae but largely absent in the other stages. Even though ephyrae and polyps are generally very similar in C. lamarckii, several peaks are completely absent in the polyps but widely present in the ephyrae and most intense in the medusae (m/z 2251, 2797 and 4733).
Tests on species identification of selected stages previously excluded from the reference library resulted in 100 % success (Fig. 4) with only one exception: one C. lamarckii medusa was misclassified as another species (H. digita). All classifications were tested using a post-hoc test. Whereas the majority of specimens were correctly classified, some classifications were recognized as false positives. In A. aurita, two of the six analyzed polyps were recognized as false positives, which may reflect that intraspecific distances within this species were on average highest between polyps and the other stages. In C. lamarckii, the lowest classification and post-hoc test success was recorded in the medusae. Again, these show the highest intraspecific distances to the remaining stages in this species.
The analysis of mass spectra variability in ephyrae of the two scyphozoan species A. aurita and C. capillata of North Sea (NS) and Baltic Sea (BS) origin revealed that intraspecific distances among specimens from one region are not distinctly different from intraspecific distances among specimens from different regions (Fig. 2A). The Euclidean distances within and between regions were on average lower than the average distances between the different developmental stages (Fig. 2B), demonstrating that the influence of the factor stage was higher than the influence of the factor region. The R² values of the PERMANOVA also indicate a lower percentage of variance explained by region in A. aurita (4.2%, Supplementary Table 3) than in C. capillata (10.7%, Supplementary Table 5).
Discussion
Our study validates proteome fingerprinting as a promising tool for identification of species across different classes of Cnidaria. The success of the current study is in line with other studies showing the high validity of this method for classification of metazoan taxa across a variety of animal groups such as Arthropoda (Laakmann et al., 2013; Rossel and Martínez Arbizu, 2019; Nabet et al., 2021; Kürzel et al., 2022; Paulus et al., 2022), Mollusca (Wilke et al., 2020), and vertebrates (Mazzeo et al., 2008; Mazzeo and Siciliano, 2016; Rossel et al., 2020) from marine, limnic and terrestrial realms. The general applicability of MALDI-TOF MS for differentiation of cnidarian species was shown before on three staurozoans from the North Sea (Holst et al., 2019) and on nine siphonophores of the family Diphyidae (Park et al., 2021). It was furthermore used in an integrative approach for the differentiation of notoriously difficult to identify hexa- and octocorallian species, showing tendencies to delimit a species complex that could not be resolved using the investigated molecular genetic markers (Korfhage et al., 2022). In the present study, we confirmed that MALDI-TOF MS is also applicable to the fragile gelatinous tissues of hydrozoan and scyphozoan medusae with water contents of > 95 % (Arai, 1997). In addition, the method was tested for the first time using species from a wider range of classes in one analysis. Furthermore, the effect of ontogenetic stage and environmental conditions on proteomic spectra, and in turn their impact on reliable species identification in this taxon by MALDI-TOF MS, had been largely unknown so far.
Average intra- and interspecific distances were clearly different, as was also found in studies on different groups of crustaceans (Renz et al., 2021; Paulus et al., 2022). Our results demonstrate that intraspecific variability in scyphozoans was strongly affected by the ontogenetic stage. It is possible to differentiate within species on the level of ontogenetic stage, as was previously shown for calanoid copepods (Rossel et al., 2022a). Although the benthic polyp and the pelagic ephyra / medusa stages in metagenetic cnidarians represent two generations with very different morphologies, the different stages formed clear species clusters. However, stage-specific investigations in our study were limited to the class Scyphozoa, and consequently, future studies should also include polyp stages of metagenetic hydrozoan species.
Stage identification by MALDI-TOF MS in cnidarians is mainly of interest if tissue samples of unknown stage (origin) are analyzed, since the identification of a certain stage in the life cycle of metagenetic cnidarians is less challenging than the identification of ontogenetic stages in other marine invertebrates, for example copepods. Still, knowledge of ontogenetic variation of spectra will be of high relevance for defining the minimum requirements on stage resolution in the applied reference library, i.e. whether the inclusion of adult stages will be sufficient to identify juvenile stages. Our results indicate that mass peak variability between stages usually does not affect species-level classification. Even though there are differences between mass spectra from the different stages, differences in the proteome among stages were smaller than interspecific differences. Most specimens from certain stages were still confidently classifiable to species level even if the respective stage had been removed from the reference library. This may allow identification of real samples using a reference library that is incomplete with respect to the different stages. For example, scyphozoan ephyra and polyp stages, which are difficult to identify to species level by morphological methods (Holst, 2012), could be identified by MALDI-TOF MS even if the reference library is based on adult medusae only. Problems may occur in cases where between-stage distances are high, as seen in C. lamarckii and C. hysoscella. Although RF classification success was high in these cases, the majority of classifications were rejected by the post-hoc test based on assignment probabilities. These classifications would therefore require re-investigation by morphology or genetic approaches (Rossel and Martínez Arbizu, 2018). With a growing reference library, the number of correct classifications recognized as false positives will most likely decrease.
Differences between stages can be seen in a variety of mass peaks. Some are frequently found to be present or absent in some stages. However, the majority of peaks, also those driving the stage differences in the RF model, differ only in relative mass-peak intensity. Thus, mass spectra rather seem to change continuously with development than abruptly with the onset of the next stage. Certain proteins or peptides may already be expressed before transition to the next developmental stage or still be expressed after transition and therefore be recorded in similar stages. This would be comparable to some kind of intermoult status assumed to cause misclassification in different stages of Calanus species classified using MALDI-TOF MS (Rossel et al., 2022a).
Other factors influencing mass spectra variability may be environmental differences impacting the physiology (Karger et al., 2019) and/or underlying differences between populations (Müller et al., 2013; Benkacimi et al., 2020). Mass spectra variability of the two scyphozoans from the North Sea and the Baltic Sea was not as strongly influenced by their sampling localities as was previously shown for copepods (Peters et al., 2022). Specimens tend to cluster according to regions (Supplementary Figure 1), but clusters also align with sampling and/or measurement occasions. There was no effect of specimen origin on species classification by the RF model. In other taxa, groupings based on MALDI-TOF MS data according to sample location were found, for example in calanoid copepods clustering according to lake of origin (Riccardi et al., 2012). Thus, it can be assumed that ecological factors influence proteomic fingerprints; however, the strength of the effect may depend on taxon physiology. Copepods osmoregulate by changing the osmolarity of their hemolymph (Roddie et al., 1984; Lee et al., 2012), and changes in protein expression were found under osmotic stress (DeBiasse et al., 2018). Differences in salinity may have lower effects on the measured proteome in osmoconformers (Rivera-Ingraham and Lignot, 2017). Although changes in salinity can impact cnidarian larval settlement and reproduction (Glon et al., 2019; Dańko et al., 2020; Schäfer et al., 2021), protein expression associated with osmoregulation may be less affected since cnidarians are osmoconformers with only slight differences between the osmolarities of sea water and the gelatinous tissues (Wright and Purcell, 1997; Graham, 2001).
The fact that preserved specimens of the same species clustered together, although they had been stored for different time periods or measured on different dates, demonstrates the reliability of the approach. Previous studies have shown that siphonophore tissues preserved for more than one year were still useful for MALDI-TOF MS (Park et al., 2021). Our results now confirm that even cnidarian samples preserved much longer, with storage times of up to 111 months, can still be successfully used.
Moreover, the fact that scyphozoan specimens reared in laboratory cultures clustered together with their conspecifics collected in the field corroborates the assumption that the effects of environmental factors on proteomic fingerprints are low in this taxon. This would facilitate the applicability of the method since the creation of a reference library for different environments would not be necessary. However, verifying this demands investigation of further species from other regions. To include considerations about variability based on sample origin for future reference libraries, thoroughly planned experiments should be carried out to investigate which ecological factors have a major impact on mass-spectra variability.
From our results we conclude that proteomic fingerprinting is a reliable method to differentiate and identify cnidarian species, including different scyphozoan life-history stages. Especially for identifying specimens deformed beyond recognition in samples fixed for monitoring purposes, this time- and cost-effective method represents a valid alternative to molecular genetic identification tools.