MALDI-TOF MS measurements
From each specimen, a small tissue fragment (max. 1 mm³) was incubated
for 5 minutes with 5 µl of alpha-cyano-4-hydroxycinnamic acid (HCCA)
matrix. From this solution, 1 to 1.5 µl were transferred to one to nine
spots on a target plate for co-crystallization of matrix and analytes.
Each spot was measured one to three times using a Microflex LT/SH System
(Bruker Daltonics). Using the flexControl 3.4 software (Bruker
Daltonics), molecular masses were measured from 2 to 20 kilodaltons
(kDa). A centroid peak detection algorithm was applied for peak
evaluation over the mass range from 2 to 20 kDa.
Furthermore, peak evaluation used a signal-to-noise threshold of two, a
minimum intensity threshold of 600 and a peak resolution higher than
400. To validate fuzzy control, the proteins/oligonucleotides method was
employed with a maximal resolution ten times above the threshold. To
create a sum spectrum, at least 120 laser shots were applied to each
spot. Measurements were carried
out using the same instrument at different occasions between 2013 and
2019.
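The acquisition filters above can be illustrated with a small sketch. It assumes that peak resolution is computed as R = m/Δm, with Δm the full width at half maximum (FWHM) of the peak, and that the signal-to-noise and intensity thresholds are inclusive. The `Peak` record and both helpers are hypothetical, and the intensity-weighted `centroid` is a generic centroid, not flexControl's internal algorithm.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    mz: float         # centroid m/z
    intensity: float  # apex intensity (arbitrary units)
    fwhm: float       # full width at half maximum, in m/z units
    noise: float      # local noise estimate


def centroid(mz, intensity):
    """Intensity-weighted mean m/z of the points at or above half the apex height."""
    half = max(intensity) / 2.0
    points = [(m, i) for m, i in zip(mz, intensity) if i >= half]
    total = sum(i for _, i in points)
    return sum(m * i for m, i in points) / total


def passes_acquisition_filters(p, min_snr=2.0, min_intensity=600.0, min_resolution=400.0):
    """Apply the SNR, intensity and resolution thresholds (assumed inclusive for SNR/intensity)."""
    snr = p.intensity / p.noise
    resolution = p.mz / p.fwhm  # R = m / delta_m at FWHM
    return snr >= min_snr and p.intensity >= min_intensity and resolution > min_resolution
```

For example, a peak at m/z 5000 with an FWHM of 5 has R = 1000 and passes, whereas the same peak with an FWHM of 20 (R = 250) is rejected.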
MALDI-TOF data processing
MALDI-TOF raw data were imported into R, Version 4.1.0 (R-Core-Team,
2022) and processed using the R packages MALDIquantForeign, Version 0.12
(Gibb, 2015) and MALDIquant, Version 1.20 (Gibb and Strimmer, 2012).
Spectra were square-root transformed, smoothed using the Savitzky-Golay
method (Savitzky and Golay, 1964), baseline corrected using the SNIP
method (Ryan et al., 1988) and normalized using the total ion current
(TIC) method.
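The preprocessing chain can be sketched outside R to show what each step does. The NumPy code below is a generic illustration, not MALDIquant's implementation: the smoothing half window (10), polynomial order (3) and number of SNIP iterations (100) are assumed values that are not reported here, spectrum edges are simply zero-padded during smoothing, and the SNIP function is only the basic clipping loop. All function names are hypothetical.

```python
import numpy as np


def savgol_weights(half_window: int, order: int = 3) -> np.ndarray:
    """Savitzky-Golay weights that return the fitted value at the window centre."""
    x = np.arange(-half_window, half_window + 1, dtype=float)
    design = np.vander(x, order + 1, increasing=True)  # columns x**0 ... x**order
    # Row 0 of the pseudo-inverse maps the window to the intercept, i.e. the fit at x = 0.
    return np.linalg.pinv(design)[0]


def snip_baseline(y: np.ndarray, iterations: int = 100) -> np.ndarray:
    """Basic SNIP: clip each point to the mean of its neighbours k points away."""
    base = np.asarray(y, dtype=float).copy()
    n = base.size
    for k in range(iterations, 0, -1):  # shrink the clipping distance each pass
        if 2 * k >= n:
            continue
        neighbour_mean = (base[: n - 2 * k] + base[2 * k:]) / 2.0
        base[k: n - k] = np.minimum(base[k: n - k], neighbour_mean)
    return base


def preprocess(intensity, half_window=10, order=3, iterations=100):
    """sqrt -> Savitzky-Golay smoothing -> SNIP baseline removal -> TIC normalisation."""
    y = np.sqrt(np.asarray(intensity, dtype=float))
    y = np.convolve(y, savgol_weights(half_window, order), mode="same")  # zero-padded edges
    y = y - snip_baseline(y, iterations)  # non-negative, since the baseline never exceeds y
    total = y.sum()
    return y / total if total > 0 else y  # total ion current (TIC) scaling
```

Because the Savitzky-Golay weights are symmetric, convolution and correlation coincide, and a cubic fit leaves straight lines unchanged in the interior of the spectrum.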
Repeated measurements were averaged using mean intensities. Peak picking
was carried out using a signal-to-noise ratio (SNR) of 12 and a half
window size of 13. Peaks with an SNR below 12 were nevertheless retained
if they occurred in other mass spectra, provided that their SNR exceeded
1.75, which is assumed to be the lower detection limit. Repeated peak
binning was carried out to align homologous mass peaks. The resulting
data were Hellinger transformed (Legendre and Gallagher, 2001) and used
for further analyses.
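The two-threshold peak picking can be illustrated as follows. In this sketch the noise is a single MAD-based level per spectrum, peaks are local maxima within ±13 points, and a weak peak (SNR ≥ 1.75) is kept only if a strong peak (SNR ≥ 12) was detected at nearly the same m/z in its own or another spectrum. The relative m/z tolerance of 2 × 10⁻⁴ and all function names are assumptions, and peak binning is not shown.

```python
import numpy as np


def mad_noise(y: np.ndarray) -> float:
    """Constant noise level: scaled median absolute deviation of the whole spectrum."""
    return 1.4826 * float(np.median(np.abs(y - np.median(y))))


def local_maxima(y: np.ndarray, half_window: int) -> list:
    """Indices whose intensity is the maximum of the surrounding +/- half_window points."""
    hits = []
    for i in range(y.size):
        lo, hi = max(0, i - half_window), min(y.size, i + half_window + 1)
        if y[i] > 0 and y[i] == y[lo:hi].max():
            hits.append(i)
    return hits


def pick_peaks(mz, y, snr, half_window=13):
    """m/z positions of local maxima whose intensity / noise reaches `snr`."""
    noise = mad_noise(y)
    if noise == 0:
        return []
    return [mz[i] for i in local_maxima(y, half_window) if y[i] / noise >= snr]


def two_threshold_peaks(mz, spectra, high=12.0, low=1.75, half_window=13, tol=2e-4):
    """Keep weak peaks (SNR >= low) only where a strong peak (SNR >= high) exists nearby."""
    strong = [pick_peaks(mz, y, high, half_window) for y in spectra]
    kept_per_spectrum = []
    for k, y in enumerate(spectra):
        others = [q for j, peaks in enumerate(strong) if j != k for q in peaks]
        kept = [p for p in pick_peaks(mz, y, low, half_window)
                if p in strong[k] or any(abs(p - q) / q <= tol for q in others)]
        kept_per_spectrum.append(kept)
    return kept_per_spectrum
```

A weak peak therefore survives only when the same mass is confirmed by a strong signal elsewhere, which keeps the low 1.75 threshold from admitting isolated noise.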
Hierarchical clustering was carried out in R using average linkage and
Euclidean distances and visualized as a circular dendrogram using the
R-package dendextend, Version 1.15.1 (Galili, 2015). Random Forest (RF)
(Breiman, 2001) was carried out using the R package randomForest,
Version 4.6.14 (Liaw and Wiener, 2002). Settings followed Rossel and
Martínez Arbizu (2018) (ntree=2,000, mtry=35, sampsize=number of
specimens in the smallest class). Classifications were tested using the
RF post-hoc test (Rossel and Martínez Arbizu, 2018), implemented as the
function rf.post.hoc in the package RFtools Version 0.0.3
(https://github.com/pmartinezarbizu/RFtools), which evaluates class
assignments based on the empirical assignment probabilities of the RF
model. Specimens with correct RF classification and assignment
probabilities not deviating significantly (significance threshold
p < 0.05) from the empirical distribution were considered true positive
(tp) assignments. Specimens with correct RF classification and a
significantly different assignment probability were recorded as false
positives (fp).
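The tp/fp decision can be sketched as a lower-tail empirical test. Purely for illustration, this sketch assumes that the reference distribution for a class consists of the assignment probabilities (share of RF votes) of correctly classified reference specimens of that class, and that a classification counts as a false positive when the specimen's probability falls in the lowest 5% of that distribution. This is a hypothetical simplification, not the code of rf.post.hoc, whose exact procedure may differ.

```python
from bisect import bisect_right


def empirical_lower_p(prob: float, reference: list) -> float:
    """Fraction of reference assignment probabilities that are <= prob."""
    ordered = sorted(reference)
    return bisect_right(ordered, prob) / len(ordered)


def post_hoc_label(predicted, true, prob, reference_by_class, alpha=0.05):
    """Label a specimen 'tp', 'fp' or 'misclassified' (hypothetical helper)."""
    if predicted != true:
        return "misclassified"
    p = empirical_lower_p(prob, reference_by_class[predicted])
    return "fp" if p < alpha else "tp"
```

Under this reading, a correct but unusually weak vote share is flagged for re-examination rather than accepted at face value.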
RF classification was applied to identify the different scyphozoan
developmental stages to species level, with all specimens of the
respective stage excluded from the RF model. In addition, classification
of both juvenile stages (polyp and ephyra) was tested with only adult
specimens (medusae) retained in the RF model. All RF classification
models contained all 23 species included in this study.
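The hold-out design can be expressed as a simple split of the specimen list. This sketch assumes that the held-out stage is removed only for the species being tested, so that the reference library still contains all species; `Specimen`, `hold_out` and `covers_all_species` are hypothetical names.

```python
from typing import NamedTuple


class Specimen(NamedTuple):
    sid: str      # specimen identifier (hypothetical)
    species: str
    stage: str    # "polyp", "ephyra" or "medusa"


def hold_out(specimens, species, stages):
    """Split into (reference, query): query = specimens of `species` in `stages`."""
    query = [s for s in specimens if s.species == species and s.stage in stages]
    query_ids = {s.sid for s in query}
    reference = [s for s in specimens if s.sid not in query_ids]
    return reference, query


def covers_all_species(reference, specimens):
    """Check that every species still has at least one reference specimen."""
    return {s.species for s in reference} == {s.species for s in specimens}
```

Passing `{"polyp"}` tests a single stage; passing `{"polyp", "ephyra"}` reproduces the adults-only reference scenario.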
RF models, using developmental stages as classes, were used to find the
most important variables for differentiating the groups using the mean
decrease in the Gini index, which reflects how strongly each variable
contributes to separating the groups (Han et al., 2016). Plots of
t-distributed Stochastic Neighbor Embedding (t-SNE) based on RF model
votes were created using the R package Rtsne Version 0.15 (Krijthe,
2015) with the following settings: perplexity=10, max_iter=4,000 and
theta=0. Principal Coordinates Analysis (PCoA) was applied to the
Hellinger-transformed data using the R package ape, Version 5.5 (Paradis
and Schliep, 2019).
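The Hellinger transformation and a classical PCoA on Euclidean distances can be sketched in NumPy. This is a generic illustration with hypothetical function names, not the ape code: the PCoA double-centres the squared distance matrix (Gower's method) and scales the leading eigenvectors by the square roots of their eigenvalues.

```python
import numpy as np


def hellinger(counts) -> np.ndarray:
    """Square root of row-relative intensities; every row gets unit Euclidean norm."""
    x = np.asarray(counts, dtype=float)
    return np.sqrt(x / x.sum(axis=1, keepdims=True))


def pcoa(data: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical PCoA of Euclidean distances between the rows of `data`."""
    diff = data[:, None, :] - data[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)                   # squared Euclidean distances
    n = d2.shape[0]
    centre = np.eye(n) - np.full((n, n), 1.0 / n)   # centring matrix J
    gower = -0.5 * centre @ d2 @ centre             # Gower's centred matrix B
    vals, vecs = np.linalg.eigh(gower)              # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]                # keep the k largest
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
```

The first two columns give the usual ordination plot; with `k` equal to the rank of the centred data, the coordinates reproduce the original pairwise distances exactly.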
Results
The hierarchical clustering of the complete dataset of 278 specimens
resulted in distinct, species-specific clusters for all 23 analyzed
species (Fig. 1), irrespective of sample storage time (<1 to 111 months)
and measuring campaign (Supplementary Table 1). An RF model based on all
analyzed species resulted in an out-of-bag (OOB) error of 0, supporting
the species-specificity of mass spectra for all analyzed species.
Class-specific clusters (Anthozoa, Hydrozoa, Scyphozoa and Staurozoa)
were recovered only for Scyphozoa (Fig. 1, branch colors). Congeneric
species clustered together in Haliclystus but not in Cyanea.
Interspecific Euclidean distances ranged from 0.99 to 1.38 (Fig. 2A),
while intraspecific distances reached a maximum of 1.35 in the
scyphozoan A. aurita when comparing a North Sea polyp and a North Sea
medusa. The lowest interspecific distance was recorded between two
scyphozoan species, a C. hysoscella medusa and an A. aurita ephyra.
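These values can be read against a fixed upper bound, assuming, as described in the Methods, that the distances were computed on Hellinger-transformed peak intensities. Writing p_aj for the relative intensity of peak j in spectrum a, each transformed spectrum has unit length, so the squared Euclidean distance between spectra a and b is:

$$
d_{ab}^{2}=\sum_{j}\left(\sqrt{p_{aj}}-\sqrt{p_{bj}}\right)^{2}=2-2\sum_{j}\sqrt{p_{aj}\,p_{bj}},\qquad p_{aj}=\frac{x_{aj}}{\sum_{k}x_{ak}}
$$

Since every term of the sum is non-negative, 0 ≤ d_ab ≤ √2 ≈ 1.414, and the upper bound is reached only when two spectra share no peaks. For example, the maximum interspecific distance of 1.38 corresponds to Σ_j √(p_aj p_bj) = (2 − 1.38²)/2 ≈ 0.048, i.e. very little overlap in relative peak intensities.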
Although species explained the majority of the variance in the data,
developmental stage also played an important role. Among the factors
tested in a PERMANOVA (Anderson, 2001) (stage, region and number of
peaks), stage explained a major percentage of the variance (17.0% in
A. aurita, 27.1% in C. capillata and 22.3% in C. lamarckii;
Supplementary Tables 3–5). Thus, groups corresponding to developmental
stages can already be recognized in the processed, Hellinger-transformed
data, as shown in the PCoAs for A. aurita (Fig. 3A), C. capillata
(Fig. 3B) and C. lamarckii (Fig. 3C). However, in some cases stages
appear highly similar, for example some ephyrae and polyps in
C. lamarckii (Fig. 3C). This is also shown in a t-SNE plot based on
class votes from an RF classification model trained with species-stages
as classes (Fig. 3C, t-SNE plot). Nevertheless, the RF OOB error at
stage level was 0 for all species, and most stages in all species are
clearly separated in the t-SNE plots (Fig. 3, t-SNE plots). Accordingly,
Euclidean distances within the different stages are in the range of
intraspecific distances (Fig. 2B). Distances between stages within a
species, however, are distinctly higher, but still lower than the
average interspecific distances (Fig. 2B), which allows stages to be
differentiated in classification approaches as well.
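The comparison in Fig. 2B rests on averaging pairwise distances within and between groups. A minimal sketch, assuming the points are Hellinger-transformed spectra and the labels are developmental stages; `within_between` is a hypothetical helper.

```python
from itertools import combinations
from math import dist
from statistics import mean


def within_between(points, labels):
    """Mean pairwise Euclidean distance within groups and between groups."""
    within, between = [], []
    for (a, label_a), (b, label_b) in combinations(zip(points, labels), 2):
        (within if label_a == label_b else between).append(dist(a, b))
    return mean(within), mean(between)
```

Stages are separable in this sense when the first value is clearly smaller than the second.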
A more detailed investigation revealed distinct important peaks,
identified by the Gini index within an RF model, that differ between
stages (Fig. 3, heat maps depicting peak intensities). Between some
stages, several peaks differ not only in relative intensity but also in
presence and absence. In A. aurita, the peak m/z 3447 is absent in all
polyp specimens but present in all other specimens except three medusae
(Fig. 3A). Other peaks, such as m/z 5352, differ between stages mainly
in relative intensity, which in this case is higher in polyps than in
the other stages. The same applies to the peak m/z 4581 in C. capillata
(Fig. 3B), which is present in all specimens but clearly differs in
relative intensity between stages. Other peaks, such as m/z 3314, are
mainly present in medusae but largely absent in the other stages. Even
though ephyrae and polyps of C. lamarckii are very similar in general,
several peaks are completely absent in the polyps but widely present in
the ephyrae and most intense in the medusae (m/z 2251, 2797 and 4733).
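The distinction between presence/absence and intensity differences can be made explicit for each peak. This sketch assumes a binned peak matrix in which 0 means the peak was not detected; `peak_profile` is a hypothetical helper, and any values passed to it in practice would come from that matrix.

```python
from collections import defaultdict


def peak_profile(intensities, stages):
    """Per stage: (share of specimens with the peak, mean intensity where present).

    A value of 0 is read as 'peak absent', as in a binned peak matrix.
    """
    by_stage = defaultdict(list)
    for value, stage in zip(intensities, stages):
        by_stage[stage].append(value)
    profile = {}
    for stage, values in by_stage.items():
        present = [v for v in values if v > 0]
        mean_present = sum(present) / len(present) if present else 0.0
        profile[stage] = (len(present) / len(values), mean_present)
    return profile
```

A peak like m/z 3447 in A. aurita would show a detection share of 0 in polyps, whereas a peak like m/z 4581 in C. capillata would show a share of 1 in every stage with differing mean intensities.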
Tests on species identification of selected stages previously excluded
from the reference library were successful in all cases but one
(Fig. 4): one C. lamarckii medusa was misclassified as another species
(H. digita). All classifications were tested using the post-hoc test.
Whereas the majority of specimens were correctly classified, some
classifications were recognized as false positives. In A. aurita, two of
the six analyzed polyps were recognized as false positives, which may
reflect that intraspecific distances in this species were, on average,
highest between polyps and the other stages. In C. lamarckii, the lowest
classification and post-hoc test success was recorded for the medusae.
Again, these show the highest intraspecific distances to the remaining
stages in this species.
The analysis of mass spectra variability in ephyrae of the two
scyphozoan species A. aurita and C. capillata from the North Sea and the
Baltic Sea revealed that intraspecific distances among specimens from
the same region are not distinctly different from those among specimens
from different regions (Fig. 2A). The Euclidean distances within and
between regions were on average lower than the average distances between
the different developmental stages (Fig. 2B), demonstrating that the
influence of stage was greater than that of region. The R² values of the
PERMANOVA also indicate a lower percentage of variance explained by
region in A. aurita (4.2%, Supplementary Table 3) than in C. capillata
(10.7%, Supplementary Table 5).
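For a single factor, the PERMANOVA R² is the share of the total sum of squared distances attributable to differences among groups (Anderson, 2001), with SS_T = (1/N) Σ_{i<j} d_ij² and SS_W = Σ_g (1/n_g) Σ_{i<j in g} d_ij². The sketch below assumes Euclidean distances; the helper is hypothetical, handles one factor at a time (with several factors, as in the analysis above, the partition depends on the model terms) and omits the permutation test that yields p-values.

```python
from itertools import combinations
from math import dist


def permanova_r2(points, groups):
    """One-factor PERMANOVA R^2 = SS_among / SS_total from pairwise distances."""
    n = len(points)
    d2 = {(i, j): dist(points[i], points[j]) ** 2
          for i, j in combinations(range(n), 2)}
    ss_total = sum(d2.values()) / n
    ss_within = 0.0
    for g in set(groups):
        members = [i for i in range(n) if groups[i] == g]
        ss_within += sum(d2[i, j] for i, j in combinations(members, 2)) / len(members)
    return (ss_total - ss_within) / ss_total
```

For Euclidean distances, SS_T equals the summed squared deviations of the points from their centroid, so R² here matches the familiar between-group share of variance.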
Discussion
Our study validates proteome fingerprinting as a promising tool for the
identification of species across different classes of Cnidaria. The
success of the current study is in line with other studies demonstrating
the high validity of this method for the classification of metazoan taxa
across a variety of animal groups, such as Arthropoda (Laakmann et al.,
2013; Rossel and Martínez Arbizu, 2019; Nabet et al., 2021; Kürzel et
al., 2022; Paulus et al., 2022), Mollusca (Wilke et al., 2020) and
vertebrates (Mazzeo et al., 2008; Mazzeo and Siciliano, 2016; Rossel et
al., 2020) from marine, limnic and terrestrial realms. The general
applicability of MALDI-TOF MS for differentiating cnidarian species has
previously been shown for three staurozoans from the North Sea (Holst et
al., 2019) and for nine siphonophores of the family Diphyidae (Park et
al., 2021). It was furthermore used in an integrative approach to
differentiate notoriously difficult-to-identify hexa- and octocorallian
species, where it showed tendencies to delimit a species complex that
could not be resolved using the investigated molecular genetic markers
(Korfhage et al., 2022). In the present study, we confirmed that
MALDI-TOF MS is also applicable to the fragile gelatinous tissues of
hydrozoan and scyphozoan medusae with water contents of >95% (Arai,
1997). In addition, the method was tested for the first time using
species from a wider range of classes in one analysis. Moreover, the
effects of ontogenetic stage and environmental conditions on proteomic
spectra, and in turn their impact on reliable species identification in
this taxon by MALDI-TOF MS, had so far been largely unknown.
Average intra- and interspecific distances were clearly different, as
was also found in studies on different groups of crustaceans (Renz et
al., 2021; Paulus et al., 2022). Our results demonstrate that
intraspecific variability in scyphozoans was strongly affected by
ontogenetic stage. Within species, specimens can be differentiated at
the level of ontogenetic stage, as previously shown for calanoid
copepods (Rossel et al., 2022a). Although the benthic polyp and the
pelagic ephyra/medusa stages of metagenetic cnidarians represent two
generations with very different morphologies, the different stages
formed clear species clusters. However, stage-specific investigations in
our study were limited to the class Scyphozoa; consequently, future
studies should also include polyp stages of metagenetic hydrozoan
species.
Stage identification by MALDI-TOF MS in cnidarians is mainly of interest
if tissue samples of unknown stage (origin) are analyzed, since
identifying a certain stage in the life cycle of metagenetic cnidarians
is less challenging than identifying ontogenetic stages in other marine
invertebrates, for example copepods. Still, knowledge of ontogenetic
variation in spectra will be highly relevant for defining the minimum
requirements on stage resolution in the applied reference library, i.e.
whether the inclusion of adult stages will be sufficient to identify
juvenile stages. Our results indicate that mass peak variability between
stages does not usually affect species-level classification. Even though
mass spectra differ between stages, the proteomic differences among
stages were smaller than interspecific differences. Most specimens of a
given stage were still confidently classifiable to species level even
if the respective stage had been removed from the reference library.
This may allow real samples to be identified using a reference library
that is incomplete with respect to developmental stages. For example,
scyphozoan ephyra and polyp stages, which are difficult to identify to
species level by morphological methods (Holst, 2012), could be
identified by MALDI-TOF MS even if the reference library is based on
adult medusae only. Problems may occur where between-stage distances are
high, as seen in C. lamarckii and C. hysoscella. Although RF
classification success was high in these cases, the majority of
classifications were rejected by the post-hoc test based on assignment
probabilities. These classifications would therefore require
re-examination using morphological or genetic approaches (Rossel and
Martínez Arbizu, 2018). With a growing reference library, the number of
correct classifications recognized as false positives will most likely
decrease.
Differences between stages can be seen in a variety of mass peaks. Some
peaks are present in certain stages and absent in others. However, the
majority of peaks, including those driving the stage differences in the
RF model, differ only in relative mass-peak intensity. Thus, mass
spectra seem to change continuously with development rather than
abruptly with the onset of the next stage. Certain proteins or peptides
may already be expressed before the transition to the next developmental
stage, or still be expressed after it, and would therefore be recorded
in more than one stage. This would be comparable to the intermoult
status assumed to cause misclassification among different stages of
Calanus species classified using MALDI-TOF MS (Rossel et al., 2022a).
Other factors influencing mass spectra variability may be environmental
differences affecting physiology (Karger et al., 2019) and/or underlying
differences between populations (Müller et al., 2013; Benkacimi et al.,
2020). Mass spectra variability of the two scyphozoans from the North
Sea and the Baltic Sea was not as strongly influenced by sampling
locality as previously shown for copepods (Peters et al., 2022).
Specimens tend to cluster according to region (Supplementary Figure 1),
but clusters also align with sampling and/or measurement occasions.
Specimen origin had no effect on species classification by the RF model.
In other taxa, groupings of MALDI-TOF MS data according to sampling
location were found, for example in calanoid copepods clustering
according to lake of origin (Riccardi et al., 2012). Thus, it can be
assumed that ecological factors influence proteomic fingerprints;
however, the strength of the effect may depend on taxon physiology.
Copepods osmoregulate by adjusting the osmolarity of their hemolymph
(Roddie et al., 1984; Lee et al., 2012), and changes in protein
expression were found under osmotic stress (DeBiasse et al., 2018).
Differences in salinity may have smaller effects on the measured
proteome in osmoconformers (Rivera-Ingraham and Lignot, 2017). Although
changes in salinity can affect cnidarian larval settlement and
reproduction (Glon et al., 2019; Dańko et al., 2020; Schäfer et al.,
2021), protein expression associated with osmoregulation may be less
affected, since cnidarians are osmoconformers with only slight
differences between the osmolarities of seawater and their gelatinous
tissues (Wright and Purcell, 1997; Graham, 2001).
The fact that preserved specimens of the same species that were stored
for different periods or measured on different dates clustered together
demonstrates the reliability of the approach. Previous studies have
shown that siphonophore tissues preserved for more than one year were
still usable for MALDI-TOF MS (Park et al., 2021). Our results now
confirm that cnidarian samples preserved much longer, with storage times
of up to 111 months, can still be used successfully.
Moreover, the fact that scyphozoan specimens reared in laboratory
cultures clustered together with their conspecifics collected in the
field corroborates the assumption that the effects of environmental
factors on proteomic fingerprints are low in this taxon. This would
facilitate application of the method, since separate reference libraries
for different environments would not be necessary. However, verifying
this requires the investigation of further species from other regions.
To account for variability related to sample origin in future reference
libraries, carefully planned experiments should be carried out to
investigate which ecological factors have a major impact on mass spectra
variability.
From our results we conclude that proteomic fingerprinting is a reliable
method to differentiate and identify cnidarian species, including
different scyphozoan life-history stages. Especially for identifying
specimens deformed beyond recognition in samples fixed for monitoring
purposes, this time- and cost-effective method represents a valid
alternative to molecular genetic identification tools.