Multispecies models for population dynamics: Progress,
challenges and future directions
Running title: Multispecies models for population dynamics
Jonatan F. Marquez1*, Stefan J.G. Vriend1, Emily G. Simmonds1,2,
Marie V. Henriksen3, Lisa Sandal1, Marlène Gamelon1,4,
Christophe F.D. Coste1, Knut Anders Hovstad1,5, Aline M. Lee1
1 Centre for Biodiversity Dynamics,
Department of Biology, Norwegian University of Science and Technology,
7491 Trondheim, Norway.
2 Department of Mathematical Sciences, Norwegian
University of Science and Technology, 7491 Trondheim, Norway.
3 Department of Landscape and Biodiversity, Norwegian
Institute of Bioeconomy Research, 7031 Trondheim, Norway.
4 Université Lyon 1, CNRS, UMR 5558,
Laboratoire de Biométrie et Biologie Evolutive, 69622 Villeurbanne,
France
5 The Norwegian Biodiversity Information Centre, 7446
Trondheim, Norway
Jonatan F. Marquez: jonatan.fredricson@gmail.com
Stefan J.G. Vriend: svriend@gmail.com
Emily G. Simmonds: emilygsimmonds@gmail.com
Marie V. Henriksen: marie.henriksen@nibio.no
Lisa Sandal: lisa.sandal@ntnu.no
Marlène Gamelon: marlene.gamelon@univ-lyon1.fr
Christophe F.D. Coste: christophe.f.d.coste@ntnu.no
Knut Anders Hovstad: knut.hovstad@artsdatabanken.no
Aline M. Lee: lee@alumni.ntnu.no
Keywords: Community, data integration, interspecific
interaction, functional group, hybrid model, latent variable, model
complexity, model uncertainty, population ecology, species associations.
Article type: Synthesis
Statement of authorship: AML conceived the research idea. All
authors (JFM, SJGV, EGS, MVH, LS, MG, CFDC, KAH, AML) contributed to the
literature search. AML and JFM reviewed and structured the findings. JFM
led the writing of the manuscript with contributions from all other
authors.
Data accessibility statement: No data was used.
Number of words in the main text: 7311
Number of words in the abstract: 164
Number of words in boxes: 264 and 261
Number of figures: 3
Number of boxes: 2
Number of references: 176
Corresponding author (*): Jonatan F. Marquez, Centre for
Biodiversity Dynamics, Department of Biology, Norwegian University of
Science and Technology, 7491 Trondheim, Norway; Tel.: +47 93040892;
e-mail address: jonatan.fredricson@gmail.com
Abstract
Understanding how population dynamics are influenced by species
interactions and the surrounding community is crucial for addressing
many ecological questions, but requires modelling of complex systems
involving direct, indirect and often asymmetric species interactions.
Progress in developing multispecies models that can tackle this task is
being made in multiple subfields of ecology, often with varying
approaches and end goals but also facing shared challenges. We review
some of the main challenges and the ways in which they are being
addressed, highlighting a wide variety of methods that can support the
development of multispecies models for understanding population
dynamics. The main challenges that we examine are estimation of species
interactions from limited data, the necessity of simplifications, and
handling uncertainty in complex, multispecies models. In addition to
reviewing a wide variety of approaches and methods for dealing with
these challenges, we discuss future directions and make suggestions for
how we believe the development of multispecies models for understanding
population dynamics can move forward more efficiently.
Introduction
Understanding how population dynamics are influenced by species
interactions and surrounding communities is crucial for our
understanding of the workings of whole communities and for our ability
to better predict the dynamics of individual species (Pimm 1982;
Marzloff et al. 2016). Over recent years, it has become apparent
that single-species population models are often not sufficient to
predict population dynamics in multispecies systems, and that management
decisions based on such models can have detrimental consequences (Kinzey
& Punt 2009; Legović & Geček 2010; Engelhardt et al. 2020).
Single-species models fail to adequately capture dynamics in many real
systems because communities are composed of complex networks of
interactions with continuous feedback effects. These feedbacks are
ignored in single-species models, even when abundances of interacting
species are included as covariates, limiting the realism and predictive
potential of the models (Kissling et al. 2012). These limitations
of single-species models are prompting an increased interest among
ecologists and managers in developing multispecies models that improve
understanding of the functioning of multispecies dynamics (Fulton et
al. 2019), and that provide comprehensive information for
effective ecosystem management (Daan & Sissenwine 1990; Plagányi et
al. 2014).
There are two main approaches to developing multispecies models. The
first approach builds up from the field of population ecology by joining
single-species models into multispecies frameworks, incorporating
species interactions. Conceptually, this approach can be traced back to
the classical deterministic Lotka-Volterra models used
to describe the population dynamics of pairs of predator-prey or
competing species (Lotka 1925; Volterra 1928). (Note: Bold terms in the
text are explained in the Glossary and the Multispecies model types
glossary.) These early models have
since provided a basis for more complex models of food webs and
competitors, incorporating several species and more realistic
characteristics such as spatial dynamics, environmental variability and
population structure (e.g. Roughgarden 1975; Holt & Lawton 1994;
Amarasekare 2008; Gamelon et al. 2019; Rüger et al. 2019;
Lee et al. 2020). In addition, several types of single-species
population models aimed at quantifying population abundance and
understanding drivers of population dynamics have recently been extended
to multispecies versions to improve the understanding of the roles of
interacting species (e.g. integrated population models (Péron
& Koons 2012; Barraquand & Gimenez 2019), integral projection
models (Adler et al. 2010; Kayal et al. 2018)), while the
use of simulation models such as individual-based models
(Breckling et al. 2005; DeAngelis & Grimm 2014; Grimm et
al. 2017) has also shifted towards multispecies modelling.
Because models built up from single-species population models handle
individual populations and their interactions explicitly, they tend to
require large amounts of data and ecological knowledge, generally
limiting their use to small subsets of species within a community.
The second major class of multispecies models focuses on understanding
systems at the community level, including attributes such as community
structure, biomass, energy flow, species richness and stability
(Ulanowicz 1972; Pimm 1982; Tarnecki et al. 2016). Because these
models focus on the dynamics of the community as a whole, they have
historically tended to treat species as interchangeable and have not
been concerned with the fate of specific species. Adding more species-
or population-level detail to these models is a second approach to
producing new multispecies models to understand population dynamics.
Currently, most community-level models are unsuitable as predictors of
population dynamics because they oversimplify population processes and
make broad assumptions about the systems under study (Hollowed et
al. 2000). However, they play an important role in identifying
knowledge gaps and interactions (Plagányi 2007; Travers et al. 2007;
Collie et al. 2016), and are often essential tools for
addressing macroecological issues, such as consequences of harvesting,
climate or habitat change (Pacifici et al. 2017) on community
dynamics. By adding more population-level detail and mechanisms to such
models, they can also be used to understand the dynamics of individual
populations within communities (García-Callejas et al. 2018).
Thus, we are seeing a shift towards multispecies modelling of population
dynamics, both by expanding single-species models to incorporate more
species and by adding more species-specific dynamics to community-level
models. One can imagine an ultimate goal of these two approaches meeting
in the middle, producing multispecies models that can both describe the
dynamics of individual species and capture the complexity of whole
communities. Recently, we have started to see hybrid models
that could provide a first step in such a development by embedding a
detailed population dynamic model within a community-level model,
allowing information to flow back and forth between the two (e.g.
Breckling et al. 2005; Makler-Pick et al. 2011; Schmolke et
al. 2019). Such hybrid models have shown potential for
exploring ecological questions such as how ecosystem regime shifts
affect the dynamics of a particular species (Gray & Wotherspoon 2015;
Fulton et al. 2019), but they currently remain limited to simpler
systems due to their complexity and insufficient species-specific data
(Makler-Pick et al. 2011). Better collaboration between different
modelling disciplines could speed up progress on this front and unlock
more of the innate potential of hybrid modelling (Mokany et al.
2016).
In parallel, some complex multispecies frameworks and software have been
developed for exploring specific questions, such as the effects of
alternative harvesting strategies and protected areas, mainly in a
fisheries context (Tjelmeland & Bogstad 1998; Pauly et al. 2000;
Begley & Howell 2004). These frameworks typically use a combination of
data, previously established parameter estimates and broadscale
assumptions about processes, and are presented as pre-packaged software
for exploring specific systems and questions. While such pre-packaged
fisheries model frameworks have great value for their intended uses, their context
sets them somewhat apart from the more general modelling challenges
discussed in this paper. We will therefore not focus on these types of
model frameworks here, but refer our readers to the many reviews of
these that already exist in the literature (Hollowed et al. 2000;
Plagányi 2007; Collie et al. 2016).
In general, development of multispecies models incorporating population
dynamics and species interactions faces numerous challenges. Natural
systems are complex, containing many direct and indirect, often
asymmetric and environmentally sensitive, interactions (Montoya et
al. 2006; Morin 2011; Ovaskainen et al. 2017). This complexity,
and the resultant high data demands, represents a major obstacle to
achieving more realistic multispecies models capable of providing
accurate mechanistic understanding of ecological processes. However,
estimating the dynamics of a subset of species within a community
carries the risk of omitting important processes, introducing biases,
and limiting the usefulness of models for understanding population
dynamics and projecting them into the future (Fath et al. 2007).
In addition, uncertainty at different levels is propagated in complex
models, causing a trade-off between biological realism and parameter
certainty (Collie et al. 2016). It is therefore essential to find
ways to simplify natural systems in models while capturing the relevant
biological processes and minimizing the introduction of biases,
increased uncertainty and inaccurate conclusions (Essington 2004; Berlow
et al. 2009).
The current shift towards multispecies modelling is taking place across
subfields of ecology simultaneously but often independently (Mooij
et al. 2010). Consequently, many of the challenges inherent in
multispecies modelling, such as estimation of species interactions,
dealing with high model complexity or identifying and measuring
model uncertainty, are being addressed in different ways, with little
coordination (Mooij et al. 2010). In this paper, we review the
ways in which the major challenges of developing multispecies models and
estimating their parameters have been addressed in the literature. We
hope that by bringing together different methods and approaches across
fields, we can learn from each other and make faster progress towards
more realistic multispecies models for population dynamics. We discuss
constructive strategies to promote multispecies model development in
light of promising techniques, as well as challenges and limitations.
Estimating interactions
Accurate understanding and representation of interspecific
interactions is key for the development of robust multispecies models
but challenging because of the sheer number of potential
interactions in most natural systems and the many factors that can
influence them. The number of potentially interacting species pairs in a
system with S species is S(S-1)/2. However, many potential interactions
never occur due to differences in biological traits, such as morphology,
size and phenology (i.e., forbidden links; Olesen et al.
2011; González-Varo & Traveset 2016). For example, nocturnal and
diurnal species do not coincide in time and some pollinators’ morphology
prevents them from reaching the nectar of certain flowers (Olesen
et al. 2011), reducing the number of interactions to be estimated
for a given system. In contrast, other interactions are age-, sex- or
phenotype-dependent, and special consideration of population structures
and complex life cycles is needed in the models, resulting in
additional data requirements and more interaction terms (Lafferty
et al. 2008; Strauss et al. 2017; Gamelon et al.
2019; Torres-Campos et al. 2020). Developing models that consider
variation in phenotypes, behaviours, and demography is important for
understanding changing interactions, and improving model projections and
predictions (González-Varo & Traveset 2016).
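As a rough illustration of how forbidden links shrink the estimation problem, the following sketch enumerates the S(S-1)/2 candidate pairs and drops those whose activity periods never overlap. The species names, the activity periods and the overlap rule are hypothetical and chosen only for illustration; real forbidden links would be derived from measured traits such as morphology, size or phenology.

```python
from itertools import combinations


def candidate_pairs(species):
    """Return all unordered species pairs: S(S-1)/2 of them."""
    return list(combinations(sorted(species), 2))


def drop_forbidden(pairs, activity):
    """Keep only pairs whose activity periods overlap.

    This temporal-overlap rule is a hypothetical stand-in for a
    trait-based forbidden-link criterion.
    """
    return [(a, b) for a, b in pairs if activity[a] & activity[b]]


# Hypothetical community; activity periods are illustrative, not data.
activity = {
    "moth": {"night"},
    "bee": {"day"},
    "bat": {"night"},
    "hoverfly": {"day"},
}

pairs = candidate_pairs(activity)
feasible = drop_forbidden(pairs, activity)
print(len(pairs), "candidate pairs,", len(feasible), "after removing forbidden links")
```

In this toy community, temporal mismatch alone removes most candidate pairs before any interaction strength has to be estimated.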
The number of realized interactions is further
influenced by local conditions affecting aspects of foraging biology and
resource utilization, such as relative abundances, species assemblages
and environmental conditions (Beckerman et al. 2006; Vázquez
et al. 2009; Spiesman & Gratton 2016; Delmas et al.
2019). For example, studies of simple arthropod food webs have shown
that more prey species do not necessarily increase the number of trophic
interactions because predators tend to focus on preferred prey while
ignoring the others (Torres-Campos et al. 2020). Similarly,
increasing the number of potentially competing pollinator species has
been shown to cause resource partitioning among pollinators, which can
lead to divergence of floral traits that benefit the main pollinating
species while hindering access to others (Temeles et al. 2016).
The dynamic nature of species interactions presents a challenge for the
development of realistic models aiming to capture large portions of
communities or extrapolate from one time and place to another
(Chamberlain et al. 2014; Gray & Wotherspoon 2015).
Several methods have been developed to identify and quantify species
interactions, with variable suitability depending on the species and
study system. Some methods assess species interactions directly, while
others infer interactions indirectly based on species
associations (i.e., inferred from species co-occurrences or correlated
density dynamics). Direct observations in the field can provide valuable
information about how species behave and interact under natural
environmental conditions that correlational and comparative studies
cannot. This can, however, be infeasible in species-rich systems and
when species and their interactions are rare, weak and/or elusive (e.g.
nocturnal, deep-sea, small or cryptic species) because the chance that
both species and their interaction occur when the observer is present is
low (Jordano 1987; Trijoulet et al. 2019). Manipulation
experiments (e.g. enclosure, exclosure, population augmentation
treatments) represent powerful tools to test hypotheses and better
understand the mechanistic relationships between species but they can be
costly and logistically challenging, especially when studying large or
highly mobile species (Schmitz 2004; Wood et al. 2019). Here, we
present additional alternative methods, discussing their potential and
challenges.
Diet analysis
Feeding plays a central role in many species’ interactions. Therefore,
methods to understand what and how organisms eat are important tools for
assessing interactions. Instead of observing an animal eat, information
about the diet of a species can be obtained by analysing ingested or
excreted material through stomach content and faecal analyses (Nielsen
et al. 2018). Historically, these types of analyses have mostly
been done visually, providing valuable information about the prey, such
as size or life stage. However, visual examinations can also produce
biases, for example between hard- and soft-bodied prey, since soft tissues
dissolve faster (Nielsen et al. 2018). DNA analyses of
stomach contents and faeces can greatly improve the identification of
prey to species level, and are particularly useful when studying smaller
organisms, such as insects (Titulaer et al. 2017; Horswill
et al. 2018; Curtsdotter et al. 2019), but are less useful
for identifying subgroups or traits within the prey species. The main
limitations of stomach content and faecal analyses are that they provide
diet information from relatively short periods (e.g. last foraging day),
and might provide little information if the animal has not fed recently
or if the prey’s DNA degrades during digestion (Russell et al.
1992). Analyses of stable isotope ratios, mostly of nitrogen and carbon,
and other biomarkers provide information about assimilated materials,
thereby providing long-term dietary information from an individual
(Nielsen et al. 2018). Stable isotope ratio analyses can identify
a species’ trophic position and preferred foraging areas, but are often
unable to quantify prey to the species level (e.g. Vander Zanden
et al. 1999; Blois et al. 2013). Because no diet tracing
method is bias free, there is a growing interest in developing methods
that combine different techniques to benefit from the advantages and
minimize the shortcomings of each (Nielsen et al. 2018). For example,
using information from visual stomach content analyses as priors in
Bayesian isotope mixing models allows accurate quantitative diet
estimates (Chiaradia et al. 2014). Similarly, combining stable
isotope, DNA and morphological analyses provides good estimates of prey
diversity and subtle changes in trophic levels, while minimizing
invasiveness and frequency of sampling (Horswill et al. 2018).
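The idea of using stomach content information as a prior in an isotope mixing model can be sketched with a deliberately minimal two-source, single-isotope example. This is not the model used in the cited studies: it is a grid approximation that assumes a linear mixing equation, a Gaussian error with known standard deviation, a fixed trophic discrimination factor and a Beta prior on the diet proportion. All function names and numerical values are hypothetical.

```python
import numpy as np


def diet_posterior(consumer, src_a, src_b, tdf, sigma,
                   prior_a=1.0, prior_b=1.0, n_grid=1001):
    """Grid posterior for the proportion p of source A in the diet.

    Assumed mixing equation: consumer = p*src_a + (1-p)*src_b + tdf + error,
    with error ~ Normal(0, sigma) and p ~ Beta(prior_a, prior_b).
    """
    p = np.linspace(0.0, 1.0, n_grid)
    expected = p * src_a + (1.0 - p) * src_b + tdf
    resid = np.asarray(consumer, dtype=float)[:, None] - expected[None, :]
    log_lik = -0.5 * np.sum((resid / sigma) ** 2, axis=0)
    # Beta prior kernel; clip p to avoid log(0) at the grid end points.
    pc = np.clip(p, 1e-9, 1.0 - 1e-9)
    log_prior = (prior_a - 1.0) * np.log(pc) + (prior_b - 1.0) * np.log(1.0 - pc)
    log_post = log_lik + log_prior
    weights = np.exp(log_post - log_post.max())
    return p, weights / weights.sum()


# Illustrative isotope values (per mil); none of these numbers are data.
consumer = [14.1, 14.2, 14.3]
args = dict(consumer=consumer, src_a=12.0, src_b=8.0, tdf=3.4, sigma=0.5)

# Flat prior: isotopes only.
p_grid, post_flat = diet_posterior(**args)
mean_flat = float(np.sum(p_grid * post_flat))

# Hypothetical stomach contents: 2 of 10 items were source A -> Beta(3, 9).
_, post_informed = diet_posterior(**args, prior_a=3.0, prior_b=9.0)
mean_informed = float(np.sum(p_grid * post_informed))

print(f"posterior mean of p, flat prior: {mean_flat:.2f}")
print(f"posterior mean of p, stomach-content prior: {mean_informed:.2f}")
```

The stomach-content prior pulls the isotope-only estimate towards the proportion seen in the stomachs, with the size of the shift depending on how many isotope samples and stomach items are available.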
Time series correlations
A variety of methods have been developed to infer interactions
indirectly from other types of commonly available data, such as counts,
presence-absence data or species traits. When several species are
monitored at the same site, the changes in one species’ abundance or
demographic rates (e.g. survival, reproduction) can be related to the
population dynamics of the other species (Certain et al. 2018).
In some cases, the predation or competition pressure of a species on
others can be quantified by removing them in laboratory or field
experiments and measuring the changes in the dynamics of others (Pacala
& Silander 1990; Wilson & Tilman 1991; Wootton 2001). However,
removing a species from an ecosystem is seldom feasible. Therefore,
other methods classically used in demography can be used to infer
interactions indirectly through long-term demographic data collected at
the individual or population level, such as counts and
capture-mark-recapture (CMR) data. For example, by relating age-specific
breeding success and survival of chamois Rupicapra rupicapra to
annual population counts of cohabiting red deer Cervus elaphus,
competition for food resources was found to negatively influence chamois
breeding success of primiparous and senescent females (Gamelon et
al. 2020). Thus, individual-based long-term monitoring programs running
on several interacting species at the same location can provide data to
estimate species interactions, even in the absence of detailed direct
observations of the interaction.
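One simple way to relate the dynamics of co-occurring species is a first-order multivariate autoregressive model on log abundances, in which element (i, j) of an interaction matrix B describes how the abundance of species j in one year affects species i in the next. The sketch below simulates two species and recovers B by least squares. It is a swapped-in illustration rather than the demographic analysis described above: it assumes no observation error and no environmental covariates, and all parameter values are invented.

```python
import numpy as np


def simulate_mar1(a, B, n_steps, noise_sd, rng):
    """Simulate log abundances x[t] = a + B @ x[t-1] + process noise."""
    a = np.asarray(a, dtype=float)
    B = np.asarray(B, dtype=float)
    x = np.empty((n_steps, a.size))
    x[0] = np.linalg.solve(np.eye(a.size) - B, a)  # deterministic equilibrium
    for t in range(1, n_steps):
        x[t] = a + B @ x[t - 1] + rng.normal(0.0, noise_sd, size=a.size)
    return x


def fit_mar1(x):
    """Least-squares estimates of the intercepts a and interaction matrix B."""
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    coef, *_ = np.linalg.lstsq(design, x[1:], rcond=None)
    # coef has one column per response species; transpose so that
    # B_hat[i, j] is the effect of species j on species i.
    return coef[0], coef[1:].T


rng = np.random.default_rng(1)
# Hypothetical system: species 2 depresses species 1, but not vice versa.
true_B = np.array([[0.6, -0.3],
                   [0.0, 0.5]])
x = simulate_mar1(a=[1.0, 0.8], B=true_B, n_steps=2000, noise_sd=0.1, rng=rng)
a_hat, B_hat = fit_mar1(x)
print(np.round(B_hat, 2))
```

Real monitoring series are far shorter than the 2000 time steps simulated here, which is one reason the data requirements of such approaches are a recurring concern.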
Joint species distribution models
Analogously, spatial patterns in species abundances can reveal
associations between species. Species distribution models
(SDMs) were originally developed to infer species habitat preferences
from spatial abundance/occurrence data and environmental data (Kearney
& Porter 2009). More recently, SDMs have been developed that model
multiple species jointly (joint species distribution models;
JSDMs). JSDMs assume that biotic interactions create non-random spatial
patterns in occurrence or abundance. Therefore, by first accounting for
the (dis)similarities in species’ responses to spatial environmental
patterns through an environmental covariance matrix, these models can
reveal species’ spatial associations from the residual covariance matrix
(Ovaskainen & Abrego 2020). How well these species associations
represent the true underlying species interactions in a JSDM depends on
whether the important environmental covariates have been included in the
model (Pollock et al. 2014; Dorazio et al. 2015;
Ovaskainen et al. 2017; Zurell et al. 2018). In addition,
because species associations can be influenced by common habitat
preferences or migration patterns, additional information about the
species, such as traits and phylogeny, is often used in combination
with spatial data to estimate the probability of an interaction
occurring (Morales-Castilla et al. 2015). Dormann et al.
(2018) present a useful checklist to facilitate the interpretation of
such estimates and avoid major pitfalls.
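The logic of separating shared environmental responses from residual associations can be sketched in a few lines. The example below is a strong simplification of a JSDM: it assumes Gaussian responses on a transformed abundance scale, a single environmental covariate and independent sites, and it fits each species by ordinary least squares instead of a hierarchical model. The simulated values are illustrative only.

```python
import numpy as np


def residual_correlation(abundance, environment):
    """Correlation between species after regressing out the environment.

    abundance: (n_sites, n_species) array; environment: (n_sites,) array.
    """
    design = np.column_stack([np.ones(len(environment)), environment])
    beta, *_ = np.linalg.lstsq(design, abundance, rcond=None)
    residuals = abundance - design @ beta
    return np.corrcoef(residuals, rowvar=False)


rng = np.random.default_rng(42)
n_sites = 2000
env = rng.normal(size=n_sites)

# Hypothetical pair: both species favour the same environment, but are
# negatively associated once the environment is accounted for.
noise = rng.multivariate_normal([0.0, 0.0],
                                [[1.0, -0.5], [-0.5, 1.0]],
                                size=n_sites)
abundance = 2.0 * env[:, None] + noise

raw_corr = np.corrcoef(abundance, rowvar=False)[0, 1]
resid_corr = residual_correlation(abundance, env)[0, 1]
print(f"raw correlation: {raw_corr:.2f}, residual correlation: {resid_corr:.2f}")
```

Here the raw correlation is positive because of the shared habitat preference, while the residual correlation recovers the negative association. Omitting the covariate would leave only the misleading raw pattern, which is why the choice of environmental covariates matters for interpretation.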
As long-term spatial datasets become increasingly available, approaches
are also being developed that jointly account for spatial and temporal
dynamics to estimate species interactions (Schliep et al. 2018).
For example, using a multispecies competitive community dynamics
framework, time-series JSDMs are being used to infer species
interactions from species associations as functions of local species
abundances in previous years and local environmental conditions
(Mutshinda et al. 2011; Ovaskainen et al. 2017). By
combining temporal and spatial information, time-series JSDMs have a
high potential to provide more accurate estimates of underlying species
interactions, since observed covariations are based on multiple points
in time and space, rather than representing only a snapshot or a summary
of all the dynamics in a large region (Ives et al. 2003;
Ovaskainen et al. 2016, 2017; Thorson et al. 2016).
However, spatiotemporal models are also inherently more complex, which
makes them computationally challenging and less user-friendly (Norberg
et al. 2019). Moreover, as
with any inference method, model outputs are influenced by the amount,
quality and spatial structure of the data. It is therefore important to
always evaluate the results in light of ecological knowledge.
Trait-based approaches
Trait-based approaches have emerged in recent decades as useful
methods to study community dynamics by characterizing individuals or
species by key traits rather than as species with prescribed
interactions (McGill et al. 2006; Degen et al. 2018;
Kiørboe et al. 2018). Trait-based models have the potential to
describe more sophisticated communities since they can reduce the number
of parameters to be estimated (Kiørboe et al. 2018; Curtsdotter
et al. 2019). Well-defined biological traits comparable across
species, such as body size, mobility, or defense strategy, may provide
useful information about an individual’s mortality, growth, metabolism,
and trophic role in a community. For example, larger competitors often
exert a dominant competitive pressure over smaller ones (Kohyama 1992),
predators preferentially feed on prey of a specific body mass relative
to their own (Brose 2010; Kalinkat et al. 2011) and
active-searching predators are more likely to encounter prey but also more likely
to attract predators, compared to passive ambush predators (Kiørboe
et al. 2018). These generalizations allow initial
parameterization of interaction networks, even without direct data on
species interactions. Currently, the majority of trait-based approaches
are size-based (Kiørboe et al. 2018). Although the relationship
between body mass and ecological interaction is well supported across
many taxa (Pope et al. 2006; Hartvig et al. 2011; Boit
et al. 2012; Schneider et al. 2012; Curtsdotter et
al. 2019), body size is not sufficient to describe the complex
interactions of many systems (Jonsson et al. 2018; Curtsdotter
et al. 2019; Keppeler et al. 2020). Further examination of
key biological traits and how they interact with each other will
therefore help improve the development of trait-based approaches. These
general relationships seem unlikely to be accurate or common enough to
replace more targeted estimation of interaction strength. Nonetheless,
they can produce biologically plausible models (Brose 2010), capture
major patterns of population dynamics in some systems, and serve as a
useful starting point to help us fill in gaps in otherwise
well-parameterized models.
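A size-based rule of this kind can be sketched as follows: a predator is assumed to feed on a prey species when the log10 ratio of their body masses falls inside a fixed window. The window limits and the body masses are hypothetical, and real allometric models typically use continuous feeding kernels rather than a hard cut-off.

```python
import numpy as np


def size_based_links(mass, log_ratio_min=1.0, log_ratio_max=3.0):
    """Boolean matrix of feeding links; rows are predators, columns are prey.

    A link is assumed when log10(predator mass / prey mass) lies within
    [log_ratio_min, log_ratio_max]. Both limits are illustrative assumptions.
    """
    m = np.asarray(mass, dtype=float)
    log_ratio = np.log10(m[:, None] / m[None, :])
    return (log_ratio >= log_ratio_min) & (log_ratio <= log_ratio_max)


# Hypothetical body masses in grams, ordered from smallest to largest.
masses = [0.001, 0.1, 10.0, 1000.0]
links = size_based_links(masses)
print(links.astype(int))
print("links:", int(links.sum()), "of", len(masses) * (len(masses) - 1), "possible")
```

Two parameters (the window limits) stand in for one parameter per potential directed link, which is the sense in which trait-based approaches reduce the number of parameters to be estimated.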
Functional responses
Mathematically formulating the effects of realized interactions in a
model of multispecies population dynamics entails estimating the
influence of one species’ population density on the population growth of
the other (i.e., functional response). This function of
population density is for simplicity often assumed to be linear. While
linearity may adequately capture interspecific competition,
it may be less suited for modelling interactions among trophic levels,
e.g. predator-prey dynamics (Certain et al. 2018). For example,
functional responses can take different shapes depending on how a
species searches, handles and processes prey, with potentially large
effects on population dynamics (Spalinger & Hobbs 1992; Koen-Alonso
2007; Castillo-Alvino & Marvá 2020). Additionally, functional responses
can vary between habitats and life stages, where a prey species might
itself be an important competitor or even predator on young individuals
of the predator species (Essington 2004), highlighting the importance of
considering population structure. While assuming simple functional
responses is a useful first step for multispecies models, it has been
argued that relying strongly on them could hinder advances towards
better mechanistic understanding of multispecies dynamics and limit
their projectability into the future (Hunsicker et al. 2011;
Kalinkat et al. 2011; Rosenbaum & Rall 2018). Assessing the
effects of several types of functional responses on model outputs is one
suggested solution, particularly because several functional response
types can sometimes result in similarly well-fitting models of empirical
data (Butterworth & Plagányi 2004; Koen-Alonso 2007; Kinzey & Punt
2009).
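To make the contrast between functional response shapes concrete, the sketch below implements the three classical Holling forms as per-predator consumption rates. The parameter names (attack rate a, handling time h) follow common usage, and the numerical values are illustrative rather than estimates for any system.

```python
import numpy as np


def holling_type1(n, a):
    """Type I: consumption increases linearly with prey density n."""
    return a * n


def holling_type2(n, a, h):
    """Type II: consumption saturates at 1/h because of handling time h."""
    return a * n / (1.0 + a * h * n)


def holling_type3(n, a, h):
    """Type III: sigmoid response, with low consumption at low prey density."""
    return a * n**2 / (1.0 + a * h * n**2)


prey_density = np.array([0.1, 1.0, 10.0, 100.0])
a, h = 0.5, 2.0  # illustrative attack rate and handling time

for name, rate in [
    ("type I", holling_type1(prey_density, a)),
    ("type II", holling_type2(prey_density, a, h)),
    ("type III", holling_type3(prey_density, a, h)),
]:
    print(name, np.round(rate, 3))
```

Running a multispecies model with each of these forms in turn is one way to carry out the sensitivity assessment suggested above, since the three shapes diverge most at low and high prey densities where data are often sparse.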
Data collection and utilization
Sampling design and technology
The development of multispecies models relies heavily on access to
high-quality data, but how can data collection be improved to help
estimation of species interactions and validation of multispecies
models? Ideally, long-term multispecies data sets for a wide number of
cohabiting species and environmental variables should be collected to
ensure that the population dynamics and interactions are well covered
under a wide range of conditions. In practice, the high costs and
logistic challenges of acquiring such data sets have limited their
availability to economically important species (e.g. harvested
communities; ICES 2019) or simpler and mostly self-contained communities
(e.g. islands, lakes; Christensen et al. 2013). However, studies
on the effectiveness of different sampling designs have highlighted
strategies through which data collection can be improved to the benefit
of multispecies modelling (Lahoz-Monfort et al. 2014; Trijoulet
et al. 2019; Zhang et al. 2020). In particular, there is
evidence that sampling a greater number of sites at low intensity gives
more representative system-wide estimates of interactions compared to
sampling fewer sites at high intensity (Bogstad et al. 1995;
Latour et al. 2003). Sampling a single site more rigorously
easily results in biases towards the interactions occurring in that
location, while sampling a greater range of sites gives a more
representative picture of species interactions across the system.
In addition to finding ways to improve the efficiency of sampling
strategies, recent technological developments have increased the
quantity and quality of data for studying species interactions.
Technologies that automate data collection, such as drones, GPS
trackers, movement sensors, video/audio recorders and image recognition
software (Weinstein 2015; Marvin et al. 2016), can reduce time
and costs, thereby allowing greater sampling coverages (e.g. number of
species, geographical area, higher spatial and temporal resolutions) and
increasing the probability of recording an interaction. Increased
environmental interest by the public has also led to the development of
citizen science and crowdsourcing initiatives that can provide
unprecedented amounts of data (Chandler et al. 2017; Devarajan
et al. 2020). An example of this is the Global Biodiversity
Information Facility (GBIF), which as of April 2022 had more than 1.9
billion species occurrence observations publicly available. However,
citizen science data face limitations related to inconsistencies in
sampling effort, sampling biases, and errors (Zipkin & Saunders 2018).
Various statistical techniques are being developed and used to account
for these sampling issues by, for example, modelling random effects and
hierarchical structures (Kelling et al. 2015), but the majority
of such data remains unused or limited to broader macroecological
studies (Theobald et al. 2015; Heberling et al. 2021).
Data integration
As multispecies models tend to be data demanding, modelling methods that
can simultaneously include and take full advantage of a variety of data
sources are valuable. Such approaches have the potential to shorten the
time series required to provide good estimates, improve the
cost-effectiveness of monitoring programs, and improve the modelling of
data-poor species. For instance, Barraquand & Gimenez (2019) found that
combining data on capture-recapture, counts and reproduction to estimate
dynamics of interacting multi-stage populations using integrated
community models could provide accurate estimates of interactions,
while also requiring shorter time series than studies using only count
data. They evaluated the benefits of the different types of data to the
model results and the costs of collecting these, and concluded that
collecting reproduction data instead of capture-recapture data was a
more cost-effective strategy, especially for abundant species
(Barraquand & Gimenez 2019).
Integrating similar data types collected in different ways, such as
abundance data through camera traps and transects, or citizen science
and scientific surveys is a useful strategy in single-species population
modelling (Besbeas et al. 2002; Lee et al. 2015; Zipkin &
Saunders 2018; Isaac et al. 2020). Recent studies indicate that
the advantages inherent in data integration methods in single-species
models are also present in multispecies frameworks (Péron & Koons 2012;
Fithian et al. 2015; Lahoz-Monfort et al. 2017; Barraquand
& Gimenez 2019; Miller et al. 2019). Data integration methods
allow models to maximize the information extracted from each dataset,
while considering the weaknesses and strengths of each one (Miller
et al. 2019). Similarly, combining data on similar species can
improve the estimates of each species individually (Lahoz-Monfort
et al. 2017), and of data-poor species in particular (Fithian
et al. 2015). For instance, using SDMs, Fithian et al.
(2015) found that when faced with presence-only data for a species,
using presence-only and presence-absence data from other species
facilitated information sharing across species, which improved
parameterization for the data-poor species by leveraging information
from closely connected species. Such data sharing is an additional
advantage of multispecies models over single species ones in many
systems (Kindsvater et al. 2018).
Model structure and simplifications
We have discussed methods that are helpful for estimating or inferring
large numbers of species interactions for multispecies models, as well
as ways through which ecological data are becoming more detailed and
increasingly available. However, translating the high complexity of most
natural systems to models often leads to increased uncertainty, and
difficulty parameterizing and interpreting the results. Therefore, even
the most comprehensive multispecies models require some simplifications.
Dynamic multispecies population models have historically started as
simplified versions of the dynamics of a small subset of species (e.g.,
Lotka-Volterra model), onto which complexity was added in the form of,
for example, life stages, spatial dynamics, or environmental influence.
In contrast, community or network models aim to describe or understand
entire or large parts of an ecosystem and therefore need to simplify the
description of these communities. They tend to do this by finding ways
to reduce the number of interactions that need to be estimated
separately in the model without reducing model performance (Morin 2011;
Collie et al. 2016).
One way to reduce the number of interactions to be estimated is by
reducing the number of nodes in the model, i.e., the number of
community components (Fig. 1). Aggregating species into groups based on
taxonomic, trophic and/or ecological similarity (i.e., trophospecies or functional groups), can help the
development of simplified community models that cover a large proportion
of the community and yet maintain key properties of more
complex models (Hood et al. 2006; Ulanowicz et al. 2014;
Olivier & Planque 2017). This strategy has the added benefit of helping
to understand and make better predictions of rare and data-poor species
because it allows one to “borrow” information from common, closely
related species, or species with similar traits that are likely to
respond similarly to the environment, thereby increasing the sample size
used to estimate the parameters of the node. Similarly, sampling error
and stochasticity can have a smaller negative effect on the model
predictability when species are grouped (Agarwal et al. 2021).
However, since this approach regards various species as equal,
information about individual species and their dynamics is lost (Simmons et al. 2019). Thus, the model outputs become sensitive to the
criteria used to classify species, the choice of which depends largely on the
research question (Fath et al. 2007; Pacifici et al. 2014). For example, a species classification based on taxonomic or
ecological similarities might be better suited for addressing impacts of
habitat change, while trophic similarities might be better suited for
modelling harvesting impacts and energy flows. It is also important to
assess the sensitivity of a model to different species classification
criteria because differences in classification methods (e.g. cluster
analyses, expert knowledge, model-based) can yield contrasting species
groupings and model results (Picard et al. 2012; Olivier &
Planque 2017).
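The reduction in dimensionality achieved by aggregation can be illustrated with a minimal sketch. The species names, group labels and interaction coefficients below are hypothetical, and simple averaging of coefficients is only one possible aggregation rule (assumed here for illustration; abundance- or biomass-weighted rules are equally plausible). Note that the within-group averages in this sketch include the intraspecific (diagonal) terms.

```python
from collections import defaultdict


def aggregate_interactions(matrix, species, groups):
    """Average species-level interaction coefficients within each pair of groups.

    matrix[i][j] is the (hypothetical) effect of species j on species i;
    groups maps each species name to a group label.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, sp_i in enumerate(species):
        for j, sp_j in enumerate(species):
            key = (groups[sp_i], groups[sp_j])
            sums[key] += matrix[i][j]
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical four-species community assigned to two functional groups.
species = ["cod", "haddock", "herring", "capelin"]
groups = {
    "cod": "piscivore",
    "haddock": "piscivore",
    "herring": "planktivore",
    "capelin": "planktivore",
}
matrix = [
    [-1.0, -0.4, 0.3, 0.5],
    [-0.2, -1.0, 0.1, 0.3],
    [-0.6, -0.2, -1.0, -0.4],
    [-0.8, -0.4, -0.2, -1.0],
]

grouped = aggregate_interactions(matrix, species, groups)
n_species_params = len(species) ** 2  # coefficients in the species-level model
n_group_params = len(grouped)         # coefficients in the aggregated model
```

With S species and G groups, the number of pairwise coefficients falls from S × S to G × G, which is the source of both the gain in tractability and the loss of species-level detail discussed above. Rerunning such a sketch under alternative group assignments is one simple way to probe sensitivity to the classification criteria.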
Large communities can also be divided into subgroups or modules based on substructures within interaction networks (Olesen et al. 2007; Dormann & Strauss 2014; Fig. 1). Modules represent recurring
non-random groupings of species within the community that interact more
with each other than with species from other modules (Olesen et
al. 2007). Identifying modules within ecosystems is therefore a good
strategy to find subsets of species that can be modelled independently
from the rest of the community (Allesina et al. 2005). Different
modules within a system can then be used as nodes of a coarser community
model. As a result of fewer interactions, the task of modelling large
communities becomes more manageable. Similarly, the number of
interactions can in some cases be reduced by identifying species with
weak interactions with the rest of the community and omitting them from
the model. For example, rare species are sometimes assumed to exert such
weak competition or predation pressure in relation to common or dominant
species that their influence is ignored (Canard et al. 2012).
However, extensive research of the system may be required before making
such assumptions (Terry & Lewis 2020). Weak interactions can increase
in importance over time (Terry et al. 2017) and, even if they remain
weak, can still be important for maintaining the structure and stability
of complex systems (McCann et al. 1998). Removal of weak interactions
may therefore not always result in realistic model predictions.
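How well a candidate partition separates a network into modules can be scored with a modularity index. The sketch below computes Newman's modularity Q for an undirected, unweighted, unipartite network, which is a simplifying assumption: the studies cited above may use bipartite or weighted variants together with an optimization algorithm to search over partitions, neither of which is reproduced here. The toy network (two triangles joined by a single link) is hypothetical.

```python
def modularity(adjacency, membership):
    """Newman's modularity Q for an undirected, unweighted network.

    adjacency is a symmetric 0/1 matrix (list of lists) without self-links;
    membership[i] is the module label assigned to node i.
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    two_m = sum(degree)  # twice the number of links
    q = 0.0
    for i in range(n):
        for j in range(n):
            if membership[i] == membership[j]:
                # Observed link minus the expectation under random wiring.
                q += adjacency[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Hypothetical network: nodes 0-2 and 3-5 form triangles, linked via 2-3.
links = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for a, b in links:
    adjacency[a][b] = 1
    adjacency[b][a] = 1

q_two_modules = modularity(adjacency, [0, 0, 0, 1, 1, 1])
q_one_module = modularity(adjacency, [0, 0, 0, 0, 0, 0])
```

A partition that places each triangle in its own module scores higher than one that lumps all nodes together, reflecting the definition of modules as sets of species that interact more with each other than with the rest of the community.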
Instead of grouping or omitting species to reduce the number of nodes,
the same functions can sometimes be used to describe different
interactions and processes (e.g. growth, foraging, dispersal,
reproduction, competitiveness) while only varying their parameterization
to best represent each species (McDermot & Rose 2000; Reuter 2005;
Buchmann et al. 2011; Grimm & Berger 2016). For example, some
forest models use the same function to describe competitive interactions
(e.g. based on vertical leaf area distribution) and the same growth
function, but adjust the growth parameters to each species (Kohyama
1992). Similarly, some models of fish communities assume trophic
interactions between fish to be size-dependent and species-independent,
so that the same predation function can be used across species (Giacomini et al. 2013; González-Varo & Traveset 2016). This type of
simplification is often used in agent-based modelling, also
known as individual-based modelling (IBM) among ecologists when the
agents represent individuals. IBMs can assign the same biological
“behaviour” (i.e., growth model, interaction model, dispersal model,
etc.) to individuals from different species or groups of species and
efficiently simulate the complex dynamics of some interacting species
within a community (DeAngelis & Grimm 2014). It has also been argued
that, analogous to how modellers tend to use just a few well-established
functional responses, there could be a small subset of well-established
functions to describe other types of species behaviours that influence
interspecific dynamics, like foraging or home range, with
well-understood properties and requirements, thereby facilitating model
development and communication (Grimm & Berger 2016).
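The idea of sharing one functional form across species while varying only its parameterization can be sketched as follows. A Holling type II functional response is used purely as a familiar example; the species labels and parameter values are hypothetical and are not taken from any of the studies cited above.

```python
def holling_type_ii(prey_density, attack_rate, handling_time):
    """Per-predator intake rate under a Holling type II functional response."""
    return attack_rate * prey_density / (
        1.0 + attack_rate * handling_time * prey_density
    )


# Hypothetical species-specific parameters: the function is shared,
# only these values differ between species.
parameters = {
    "species_a": {"attack_rate": 0.5, "handling_time": 0.2},
    "species_b": {"attack_rate": 1.0, "handling_time": 0.5},
}

prey_density = 10.0
intake = {
    name: holling_type_ii(prey_density, **params)
    for name, params in parameters.items()
}
```

Adding a species to such a model requires only a new parameter set rather than a new function, which keeps the model structure compact and makes differences between species directly comparable.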
Latent variable approaches can also be a useful way to reduce
the dimensionality of multispecies models. Latent variables are
unobserved variables that can be used to represent the main axes of
(co)variation among species (Warton et al. 2015). For instance,
in JSDMs with latent variable structures, all pairs of species
associations or co-occurrences are modelled jointly by searching for the
leading axes of variation unaccounted for by the environmental effects.
This creates linear combinations of several variables, limiting the
dimensionality of the multispecies data (Thorson et al. 2015;
Ovaskainen & Abrego 2020; van der Veen et al. 2021). In structural equation modelling (SEM), latent variables typically
represent a theorized environmental effect measured by one or more
indicator variables (Grace et al. 2010). In Bayesian network
analysis, latent variables are used to group nodes with similar roles in
the network and can thereby reduce the complexity of the modelled system
(Kim et al. 2018). Latent variables can also be used to estimate
interaction probabilities where nodes with similar latent positions in
the network structure are assumed to be more likely to interact (Rohr et al. 2016; Kim et al. 2018).
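The dimension reduction offered by latent variables can be made concrete by counting parameters. The sketch below assumes a generic factor-analytic structure in which the species-by-species covariance is approximated by loadings on a few latent variables plus one residual variance per species; this is a simplification of the latent variable models cited above, and the loading values are hypothetical. The latent-variable count is an upper bound because identifiability constraints on the loadings are ignored.

```python
def n_full_covariance_params(n_species):
    """Free parameters in an unstructured species-by-species covariance matrix."""
    return n_species * (n_species + 1) // 2


def n_latent_params(n_species, n_latent):
    """Loadings plus one residual variance per species.

    Upper bound: identifiability constraints on the loadings are ignored.
    """
    return n_species * n_latent + n_species


def implied_covariance(loadings, residual_var):
    """Covariance implied by the loadings: L L^T + diag(residual_var)."""
    n = len(loadings)
    cov = [
        [sum(a * b for a, b in zip(loadings[i], loadings[j])) for j in range(n)]
        for i in range(n)
    ]
    for i in range(n):
        cov[i][i] += residual_var[i]
    return cov


full_count = n_full_covariance_params(100)
latent_count = n_latent_params(100, 2)

# Hypothetical loadings of three species on a single latent variable.
cov = implied_covariance([[1.0], [0.5], [-1.0]], [1.0, 1.0, 1.0])
```

The number of parameters grows quadratically with the number of species in the unstructured case but only linearly when a fixed number of latent variables is used. In the small example, species with loadings of the same sign covary positively and species with opposite signs covary negatively, which is how pairwise associations are recovered from a few shared axes.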
While simplifications are a useful and necessary part of modelling,
oversimplifications can lead to poorer model performance and loss of
predictive power, especially under changing conditions (Raick et
al. 2006; Berlow et al. 2009). Ideally, one would always compare
the simplified models to more complete and complex models to assess
their effectiveness and accuracy, as well as to identify the trade-offs
of the simplifications (Raick et al. 2006). However, that would
entail having abundant data to develop the complex models first, which
is usually not an option. In practice, decisions on model structure and
simplifications are often based on data availability instead of robust
knowledge about ecological functionality (see e.g. Lafferty et
al. 2008; Dunn et al. 2017). Because such decisions will
continue to be necessary, especially in data-poor studies, it is
important that the simplification methods (e.g. criteria used to
aggregate species or standardize links) are systematically documented to
facilitate comparative studies that help highlight the strengths and
weaknesses of each approach (Olivier & Planque 2017). We must not
overlook the importance of having a robust understanding of the
individual building blocks of natural ecosystems even if we aim to model
whole systems (Koen-Alonso 2007). In the long run, this will promote the
development of more encompassing and realistic models, while enabling us
to limit their complexity and data requirements through ecologically
grounded simplifications.
Dealing with uncertainty
Uncertainty is a feature of all statistical and mathematical models,
because they simplify natural processes and use imperfect data to
estimate unknown processes (Berlow et al. 2009). Epistemic or systematic
uncertainty (Regan et al. 2002) enters the modelling process because of
(1) errors in data or insufficient data, (2) random and non-random
variation in nature, and (3) assumptions and simplifications about the
parameters and model structure (Regan et al. 2002; Walker et al. 2003; Koo et al. 2017; Fig. 2). While many of these
sources of uncertainty are shared with single-species models, the
greater complexity in multispecies models complicates the task of
quantifying their effects, increases the number of pathways through
which uncertainty can propagate and increases their potential influence
on the overall model output (Zhang et al. 2015). It is therefore
crucial that sources of uncertainty are identified, quantified and
reported in any multispecies model.
Uncertainty in population-specific data can be accounted for through
techniques derived from single-species frameworks, such as observation
models that estimate sampling error. However, as multispecies models
require more diverse data, accounting for measurement errors and/or
systematic biases associated with all the data sources becomes more
challenging (Regan et al. 2002). For instance, estimation of
species interactions often requires sampling multiple sources (i.e.,
several species) and types of data (e.g., interaction types,
frequencies) simultaneously, each with some degree of sampling error.
Also, species interactions are expected to be influenced by
environmental variables, which are themselves estimated with some degree
of uncertainty (Koo et al. 2017). Similarly, because species
interactions are sometimes estimated indirectly within the models (e.g.,
Ovaskainen et al. 2016), interaction estimates become model
outputs and subject to additional uncertainty. Expanding datasets and
improvements in measuring and identification techniques have great
potential to reduce the degree of uncertainty in the data and make
models less sensitive to prior assumptions (Cressie et al. 2009).
However, obtaining more and better data is still limited by logistical
challenges (Zhang et al. 2015). This is another reason why
diversifying the types of independent data collected can be beneficial,
as different model parameters can be informed by multiple data sources
(e.g., fecundity, census, mortality-at-age) simultaneously in a single
framework (Kindsvater et al. 2018). Identification and
propagation of data uncertainty through the modelling process, and
critical assessments of the conditions under which the models are useful
are important to minimize errors, biases and misleading projections
(Wells & O’Hara 2013; Certain et al. 2018; Engelhardt et
al. 2020).
As mentioned in previous sections, multispecies models often have to
rely on inferred interactions or researchers’ assumptions about the
processes giving rise to the observed data (Milner-Gulland & Shea
2017). However, different assumptions lead to different results. This
source of uncertainty can be particularly difficult to quantify because
the uncertainty measures derived from the likelihood
function of the model say nothing about whether the model itself is correct;
they describe only the certainty in the parameters, conditional on the model
structure being true (Kinzey & Punt 2009). Instead, this structural
uncertainty can be accounted for, quantified or reduced through model
comparison, model averaging, or validation of predictions
(Regan et al. 2002; Koo et al. 2017).
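As a minimal sketch of model averaging, the example below converts information-criterion scores from alternative model structures into weights and uses them to average the models' predictions. The use of AIC and Akaike weights is an assumption made for illustration, since the text above does not prescribe a particular criterion, and the AIC scores and predictions are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC scores into relative model weights that sum to one."""
    best = min(aic_values)
    rel_likelihood = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(rel_likelihood)
    return [r / total for r in rel_likelihood]


def model_average(predictions, weights):
    """Weighted mean of the predictions from the candidate models."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical AIC scores and abundance predictions from three candidate
# model structures (e.g. differing in which interactions are included).
aic = [100.0, 102.0, 110.0]
predictions = [50.0, 60.0, 90.0]

weights = akaike_weights(aic)
averaged = model_average(predictions, weights)
```

The spread of predictions across candidate structures, relative to their weights, gives a rough indication of how much of the total uncertainty is structural rather than parametric. Validation against independent data remains necessary, because the weights only rank the candidate models relative to one another.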
Uncertainty cannot be reduced to zero (Milner-Gulland & Shea 2017), but
we can explore ways to minimize uncertainty and report it transparently.
Recognizing and quantifying all sources of uncertainty is essential for
evaluating model usefulness and identifying model weaknesses for future
research (Zhang et al. 2015). Acknowledging uncertainty can also
improve models directly. For example, including prior knowledge about
ecological preferences in multispecies SDMs as uncertain, instead of
fixed, has been shown to improve both predictability and accuracy
(Vermeiren et al. 2020). However, recognition and analysis of
uncertainty in multispecies models have received relatively little
attention, a gap that has been argued to be a major hindrance to the
use of multispecies models in management contexts (Thorpe et al. 2015). Development of methods to consistently quantify and reduce
uncertainty in multispecies models is therefore important going forward
and should happen simultaneously with the development of the models
themselves. This will ensure that we have the necessary toolkit to
achieve usable model outputs and give biologically meaningful insights
that can be used in management and conservation contexts. This will also
help us to identify parts of the community or sampling designs
associated with higher uncertainty, providing a simple way to improve
data collection.