Multispecies models for population dynamics: Progress, challenges and future directions

Running title: Multispecies models for population dynamics
Jonatan F. Marquez1*, Stefan J.G. Vriend1, Emily G. Simmonds1,2, Marie V. Henriksen3, Lisa Sandal1, Marlène Gamelon1,4, Christophe F.D. Coste1, Knut Anders Hovstad1,5, Aline M. Lee1
1 Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, 7491 Trondheim, Norway.
2 Department of Mathematical Sciences, Norwegian University of Science and Technology, 7491 Trondheim, Norway.
3 Department of Landscape and Biodiversity, Norwegian Institute of Bioeconomy Research, 7031 Trondheim, Norway.
4 Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, 69622 Villeurbanne, France.
5 The Norwegian Biodiversity Information Centre, 7446 Trondheim, Norway.
Jonatan F. Marquez: jonatan.fredricson@gmail.com
Stefan J.G. Vriend: svriend@gmail.com
Emily G. Simmonds: emilygsimmonds@gmail.com
Marie V. Henriksen: marie.henriksen@nibio.no
Lisa Sandal: lisa.sandal@ntnu.no
Marlène Gamelon: marlene.gamelon@univ-lyon1.fr
Christophe F.D. Coste: christophe.f.d.coste@ntnu.no
Knut Anders Hovstad: knut.hovstad@artsdatabanken.no
Aline M. Lee: lee@alumni.ntnu.no
Keywords: Community, data integration, interspecific interaction, functional group, hybrid model, latent variable, model complexity, model uncertainty, population ecology, species associations.
Article type: Synthesis
Statement of authorship : AML conceived the research idea. All authors (JFM, SJGV, EGS, MVH, LS, MG, CFDC, KAH, AML) contributed to the literature search. AML and JFM reviewed and structured the findings. JFM led the writing of the manuscript with contributions from all other authors.
Data accessibility statement: No data was used.
Number of words in the main text: 7311
Number of words in the abstract: 164
Number of words in boxes: 264 and 261
Number of figures: 3
Number of boxes: 2
Number of references: 176
Corresponding author (*) : Jonatan F. Marquez, Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, 7491 Trondheim, Norway; Tel.: +47 93040892; e-mail address: jonatan.fredricson@gmail.com

Abstract

Understanding how population dynamics are influenced by species interactions and the surrounding community is crucial for addressing many ecological questions, but requires modelling of complex systems involving direct, indirect and often asymmetric species interactions. Progress in developing multispecies models that can tackle this task is being made in multiple subfields of ecology, often with varying approaches and end goals but also facing shared challenges. We review some of the main challenges and the ways in which they are being addressed, highlighting a wide variety of methods that can support the development of multispecies models for understanding population dynamics. The main challenges that we examine are estimation of species interactions from limited data, the necessity of simplifications, and handling uncertainty in complex, multispecies models. In addition to reviewing a wide variety of approaches and methods for dealing with these challenges, we discuss future directions and make suggestions for how we believe the development of multispecies models for understanding population dynamics can move forward more efficiently.

Introduction

Understanding how population dynamics are influenced by species interactions and surrounding communities is crucial for our understanding of the workings of whole communities and for our ability to better predict the dynamics of individual species (Pimm 1982; Marzloff et al. 2016). Over recent years, it has become apparent that single-species population models are often not sufficient to predict population dynamics in multispecies systems, and that management decisions based on such models can have detrimental consequences (Kinzey & Punt 2009; Legović & Geček 2010; Engelhardt et al. 2020). Single-species models fail to adequately capture dynamics in many real systems because communities are composed of complex networks of interactions with continuous feedback effects. These feedbacks are ignored in single-species models, even when abundances of interacting species are included as covariates, limiting the realism and predictive potential of the models (Kissling et al. 2012). These limitations of single-species models are prompting an increased interest among ecologists and managers in developing multispecies models that improve understanding of the functioning of multispecies dynamics (Fulton et al. 2019), and that provide comprehensive information for effective ecosystem management (Daan & Sissenwine 1990; Plagányi et al. 2014).
There are two main approaches to developing multispecies models. The first approach builds up from the field of population ecology by joining single-species models into multispecies frameworks, incorporating species interactions. Conceptually, this approach can be traced back to the classical deterministic Lotka-Volterra models used to describe the population dynamics of pairs of predator-prey or competing species (Lotka 1925; Volterra 1928). These early models have since provided a basis for more complex models of food webs and competitors, incorporating several species and more realistic characteristics such as spatial dynamics, environmental variability and population structure (e.g. Roughgarden 1975; Holt & Lawton 1994; Amarasekare 2008; Gamelon et al. 2019; Rüger et al. 2019; Lee et al. 2020). In addition, several types of single-species population models aimed at quantifying population abundance and understanding drivers of population dynamics have recently been extended to multispecies versions to improve the understanding of the roles of interacting species (e.g. integrated population models (Péron & Koons 2012; Barraquand & Gimenez 2019), integral projection models (Adler et al. 2010; Kayal et al. 2018)), while the use of simulation models such as individual-based models (Breckling et al. 2005; DeAngelis & Grimm 2014; Grimm et al. 2017) has also shifted towards multispecies modelling. Because models built up from single-species population models handle individual populations and their interactions explicitly, they tend to require large amounts of data and ecological knowledge, generally limiting their use to small subsets of species within a community.
Note: Bold terms in the text are explained in the Glossary and Multispecies model types glossary.
The second major class of multispecies models focuses on understanding systems at the community level, including attributes such as community structure, biomass, energy flow, species richness and stability (Ulanowicz 1972; Pimm 1982; Tarnecki et al. 2016). Because these models focus on the dynamics of the community as a whole, they have historically tended to treat species as interchangeable and have not been concerned with the fate of specific species. Adding more species- or population-level detail to these models is a second approach to producing new multispecies models to understand population dynamics. Currently, most community-level models are unsuitable as predictors of population dynamics because they oversimplify population processes and make broad assumptions about the systems under study (Hollowed et al. 2000). However, they play an important role in identifying knowledge gaps and interactions (Plagányi 2007; Travers et al. 2007; Collie et al. 2016), and are often essential tools for addressing macroecological issues, such as consequences of harvesting, climate or habitat change (Pacifici et al. 2017) on community dynamics. By adding more population-level detail and mechanisms to such models, they can also be used to understand the dynamics of individual populations within communities (García-Callejas et al. 2018).
Thus, we are seeing a shift towards multispecies modelling of population dynamics, both by expanding single-species models to incorporate more species and by adding more species-specific dynamics to community-level models. One can imagine an ultimate goal of these two approaches meeting in the middle, producing multispecies models that can both describe the dynamics of individual species and capture the complexity of whole communities. Recently, we have started to see hybrid models that could provide a first step in such a development by embedding a detailed population dynamic model within a community-level model, allowing information to flow back and forth between the two (e.g. Breckling et al. 2005; Makler-Pick et al. 2011; Schmolke et al. 2019). Such hybrid models have shown potential for exploring ecological questions such as how ecosystem regime shifts affect the dynamics of a particular species (Gray & Wotherspoon 2015; Fulton et al. 2019), but they currently remain limited to simpler systems due to their complexity and insufficient species-specific data (Makler-Pick et al. 2011). Better collaboration between different modelling disciplines could speed up progress on this front and unlock more of the innate potential of hybrid modelling (Mokany et al. 2016).
In parallel, some complex multispecies frameworks and software have been developed for exploring specific questions, such as the effects of alternative harvesting strategies and protected areas, mainly in a fisheries context (Tjelmeland & Bogstad 1998; Pauly et al. 2000; Begley & Howell 2004). These frameworks typically use a combination of data, previously established parameter estimates and broadscale assumptions about processes, and are presented as pre-packaged software for exploring specific systems and questions. While such pre-packaged fisheries model frameworks have great value for their use, their context sets them somewhat apart from the more general modelling challenges discussed in this paper. We will therefore not focus on these types of model frameworks here, but refer our readers to the many reviews of these that already exist in the literature (Hollowed et al. 2000; Plagányi 2007; Collie et al. 2016).
In general, development of multispecies models incorporating population dynamics and species interactions faces numerous challenges. Natural systems are complex, containing many direct and indirect, often asymmetric and environmentally sensitive, interactions (Montoya et al. 2006; Morin 2011; Ovaskainen et al. 2017). This complexity, and the resultant high data demands, represents a major obstacle to achieving more realistic multispecies models capable of providing accurate mechanistic understanding of ecological processes. However, estimating the dynamics of a subset of species within a community carries the risk of omitting important processes, introducing biases, and limiting the usefulness of models for understanding population dynamics and projecting them into the future (Fath et al. 2007). In addition, uncertainty at different levels is propagated in complex models, causing a trade-off between biological realism and parameter certainty (Collie et al. 2016). It is therefore essential to find ways to simplify natural systems in models while capturing the relevant biological processes and minimizing the introduction of biases, increased uncertainty and inaccurate conclusions (Essington 2004; Berlow et al. 2009).
The current shift towards multispecies modelling is taking place across subfields of ecology simultaneously but often independently (Mooij et al. 2010). Consequently, many of the challenges inherent in multispecies modelling, such as estimation of species interactions, dealing with high model complexity or identification and measuring of model uncertainty, are being addressed in different ways, with little coordination (Mooij et al. 2010). In this paper, we review the ways in which the major challenges of developing multispecies models and estimating their parameters have been addressed in the literature. We hope that by bringing together different methods and approaches across fields, we can learn from each other and make faster progress towards more realistic multispecies models for population dynamics. We discuss constructive strategies to promote multispecies model development in light of promising techniques, as well as challenges and limitations.

Estimating interactions

Accurate understanding and representation of interspecific interactions is key for the development of robust multispecies models but challenging because of the sheer number of potential interactions in most natural systems and the many factors that can influence them. The number of potentially interacting species pairs in a system with S species is S(S-1)/2. However, many potential interactions never occur due to differences in biological traits, such as morphology, size and phenology (i.e., forbidden links; Olesen et al. 2011; González-Varo & Traveset 2016). For example, nocturnal and diurnal species do not coincide in time and the morphology of some pollinators prevents them from reaching the nectar of certain flowers (Olesen et al. 2011), reducing the number of interactions to be estimated for a given system. In contrast, other interactions are age-, sex- or phenotype-dependent, and special consideration of population structures and complex life cycles is needed in the models, resulting in additional data requirements and more interaction terms (Lafferty et al. 2008; Strauss et al. 2017; Gamelon et al. 2019; Torres-Campos et al. 2020). Developing models that consider variation in phenotypes, behaviours, and demography is important for understanding changing interactions, and improving model projections and predictions (González-Varo & Traveset 2016).
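As a minimal illustration of how quickly the number of candidate interactions grows, and how forbidden links reduce it, the following Python sketch counts unordered species pairs and then discards pairs that never coincide in time. The species names and activity periods are hypothetical and serve only to mirror the nocturnal versus diurnal example; real forbidden links would be derived from measured traits.

```python
from itertools import combinations


def n_potential_pairs(s: int) -> int:
    """Number of unordered species pairs in a system with s species: S(S-1)/2."""
    return s * (s - 1) // 2


# Hypothetical activity periods, used only to illustrate forbidden links.
activity = {
    "moth": "nocturnal",
    "bat": "nocturnal",
    "bee": "diurnal",
    "hoverfly": "diurnal",
}


def allowed_pairs(activity_by_species):
    """Keep only pairs of species that are active at the same time of day."""
    return [
        (a, b)
        for a, b in combinations(sorted(activity_by_species), 2)
        if activity_by_species[a] == activity_by_species[b]
    ]


total = n_potential_pairs(len(activity))  # all candidate pairs
remaining = allowed_pairs(activity)       # pairs left after temporal mismatch
```

In this toy system, a single trait already removes most candidate pairs, which is the sense in which forbidden links reduce the estimation burden.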
The number of realized interactions is further influenced by local conditions affecting aspects of foraging biology and resource utilization, such as relative abundances, species assemblages and environmental conditions (Beckerman et al. 2006; Vázquez et al. 2009; Spiesman & Gratton 2016; Delmas et al. 2019). For example, studies of simple arthropod food webs have shown that more prey species do not necessarily increase the number of trophic interactions because predators tend to focus on preferred prey while ignoring the others (Torres-Campos et al. 2020). Similarly, increasing the number of potentially competing pollinator species has been shown to cause resource partitioning among pollinators, which can lead to divergence of floral traits that benefit the main pollinating species while hindering access to others (Temeles et al. 2016). The dynamic nature of species interactions presents a challenge for the development of realistic models aiming to capture large portions of communities or extrapolate from one time and place to another (Chamberlain et al. 2014; Gray & Wotherspoon 2015).
Several methods have been developed to identify and quantify species interactions, with variable suitability depending on the species and study system. Some methods assess species interactions directly, while others infer interactions indirectly based on species associations (i.e., inferred from species co-occurrences or correlated density dynamics). Direct observations in the field can provide valuable information about how species behave and interact under natural environmental conditions that correlational and comparative studies cannot. This can, however, be infeasible in species-rich systems and when species and their interactions are rare, weak and/or elusive (e.g. nocturnal, deep-sea, small or cryptic species) because the chance that both species and their interaction occur when the observer is present is low (Jordano 1987; Trijoulet et al. 2019). Manipulation experiments (e.g. enclosure, exclosure, population augmentation treatments) represent powerful tools to test hypotheses and better understand the mechanistic relationships between species but they can be costly and logistically challenging, especially when studying large or highly mobile species (Schmitz 2004; Wood et al. 2019). Here, we present additional alternative methods, discussing their potential and challenges.

Diet analysis

Feeding plays a central role in many species’ interactions. Therefore, methods to understand what and how organisms eat are important tools for assessing interactions. Instead of observing an animal eat, information about the diet of a species can be obtained by analysing ingested or excreted material through stomach content and faecal analyses (Nielsen et al. 2018). Historically, these types of analyses have mostly been done visually, providing valuable information about the prey, such as size or life stage. However, visual examinations can also produce biases, for example between hard- and soft-bodied prey, since soft tissues dissolve faster (Nielsen et al. 2018). DNA analyses of the stomach content and faeces can greatly improve the identification of prey to species level, and are particularly useful when studying smaller organisms, such as insects (Titulaer et al. 2017; Horswill et al. 2018; Curtsdotter et al. 2019), but are less useful for identifying subgroups or traits within the prey species. The main limitations of stomach content and faecal analyses are that they provide diet information from relatively short periods (e.g. last foraging day), and might provide little information if the animal has not fed recently or if the prey’s DNA degrades during digestion (Russell et al. 1992). Analyses of stable isotope ratios, mostly of nitrogen and carbon, and other biomarkers provide information about assimilated materials, thereby providing long-term dietary information from an individual (Nielsen et al. 2018). Stable isotope ratio analyses can identify a species’ trophic position and preferred foraging areas, but are often unable to quantify prey to the species level (e.g. Vander Zanden et al. 1999; Blois et al. 2013). Because no diet tracing method is bias free, there is a growing interest in developing methods that combine different techniques to benefit from their advantages and minimize their shortcomings (Nielsen et al. 2018). For example, using information from visual stomach content analyses as priors in Bayesian isotope mixing models allows accurate quantitative diet estimates (Chiaradia et al. 2014). Similarly, combining stable isotope, DNA and morphological analyses provides good estimates of prey diversity and subtle changes in trophic levels, while minimizing invasiveness and frequency of sampling (Horswill et al. 2018).
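The idea of using stomach content information as a prior in an isotope mixing model can be sketched for the simplest case of two prey sources and a single isotope. The sketch below is a grid approximation and not the method of Chiaradia et al. (2014); the isotope values, the measurement standard deviation, the Beta prior and the function name are all illustrative assumptions, and trophic discrimination is ignored.

```python
import numpy as np


def diet_posterior(delta_consumer, delta_a, delta_b, sigma,
                   prior_a=1.0, prior_b=1.0, n_grid=1001):
    """Grid posterior for p, the diet proportion of prey A (two sources, one isotope)."""
    # Interior grid on (0, 1) so that the log prior stays finite.
    p = np.linspace(0.0, 1.0, n_grid + 2)[1:-1]
    # Unnormalised Beta(prior_a, prior_b) prior; Beta(1, 1) is flat.
    log_prior = (prior_a - 1.0) * np.log(p) + (prior_b - 1.0) * np.log1p(-p)
    # Linear mixing: expected consumer signature for each candidate p.
    expected = p * delta_a + (1.0 - p) * delta_b
    # Gaussian measurement error around the expected signature.
    log_lik = -0.5 * ((delta_consumer - expected) / sigma) ** 2
    log_post = log_prior + log_lik
    weights = np.exp(log_post - log_post.max())
    return p, weights / weights.sum()


# Hypothetical signatures (per mil): prey A = 15, prey B = 10, consumer = 12.
p_grid, flat = diet_posterior(12.0, 15.0, 10.0, sigma=1.0)
# Hypothetical stomach contents: 8 of 10 identified items were prey A,
# encoded here (as an assumption) as a Beta(8, 2) prior.
_, informed = diet_posterior(12.0, 15.0, 10.0, sigma=1.0, prior_a=8.0, prior_b=2.0)

flat_mode = p_grid[np.argmax(flat)]
flat_mean = float(np.sum(p_grid * flat))
informed_mean = float(np.sum(p_grid * informed))
```

With a flat prior the posterior peaks near the analytical two-source solution, (12 - 10)/(15 - 10) = 0.4, whereas the stomach content prior pulls the estimate towards a larger share of prey A.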

Time series correlations

A variety of methods have been developed to infer interactions indirectly from other types of commonly available data, such as counts, presence-absence data or species traits. When several species are monitored at the same site, the changes in one species’ abundance or demographic rates (e.g. survival, reproduction) can be related to the population dynamics of the other species (Certain et al. 2018). In some cases, the predation or competition pressure of a species on others can be quantified by removing it in laboratory or field experiments and measuring the changes in the dynamics of the others (Pacala & Silander 1990; Wilson & Tilman 1991; Wootton 2001). However, removing a species from an ecosystem is seldom feasible. Therefore, other methods classically used in demography can be used to infer interactions indirectly through long-term demographic data collected at the individual or population level, such as counts and capture-mark-recapture (CMR) data. For example, by relating age-specific breeding success and survival of chamois Rupicapra rupicapra to annual population counts of cohabiting red deer Cervus elaphus, competition for food resources was found to negatively influence the breeding success of primiparous and senescent chamois females (Gamelon et al. 2020). Thus, individual-based long-term monitoring programs running on several interacting species at the same location can provide data to estimate species interactions, even in the absence of detailed direct observations of the interaction.

Joint species distribution models

Analogously, spatial patterns in species abundances can reveal associations between species. Species distribution models (SDMs) were originally developed to infer species habitat preferences from spatial abundance/occurrence data and environmental data (Kearney & Porter 2009). More recently, SDMs have been developed that model multiple species jointly (joint species distribution models; JSDMs). JSDMs assume that biotic interactions create non-random spatial patterns in occurrence or abundance. Therefore, by first accounting for the (dis)similarities in species’ responses to spatial environmental patterns through an environmental covariance matrix, these models can reveal species’ spatial associations from the residual covariance matrix (Ovaskainen & Abrego 2020). How well these species associations represent the true underlying species interactions in a JSDM depends on whether the important environmental covariates have been included in the model (Pollock et al. 2014; Dorazio et al. 2015; Ovaskainen et al. 2017; Zurell et al. 2018). In addition, because species associations can be influenced by common habitat preferences or migration patterns, additional information about the species, such as traits and phylogeny, is often used in combination with spatial data to estimate the probability of an interaction occurring (Morales-Castilla et al. 2015). Dormann et al. (2018) present a useful checklist to facilitate the interpretation of such estimates and avoid major pitfalls.
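The logic of separating shared environmental responses from residual associations can be sketched with a crude two-step stand-in: regress each species on the environment, then correlate the residuals. Real JSDMs estimate both components jointly, typically in a hierarchical framework, so the Python sketch below only illustrates the principle. The simulated data, effect sizes and function name are assumptions.

```python
import numpy as np


def residual_correlation(y, x):
    """Correlation among species after removing linear responses to covariates.

    y: (sites, species) array of abundances; x: (sites,) or (sites, covariates).
    """
    design = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)  # one regression per species
    residuals = y - design @ beta
    return np.corrcoef(residuals, rowvar=False)


# Simulated example: species 0 and 1 share an environmental response only;
# species 2 and 3 share an unmeasured factor standing in for a biotic association.
rng = np.random.default_rng(1)
n_sites = 500
env = rng.normal(size=n_sites)
latent = rng.normal(size=n_sites)
noise = rng.normal(size=(n_sites, 4))
y = np.column_stack([2 * env, 2 * env, latent, latent]) + noise

raw = np.corrcoef(y, rowvar=False)   # confounds environment and association
res = residual_correlation(y, env)   # association left after the environment
```

Species 0 and 1 are strongly correlated in the raw data but not in the residuals, while the association between species 2 and 3 persists. If the shared driver of species 2 and 3 were an unmeasured environmental variable, the same residual correlation would appear, which is why the choice of covariates matters for interpretation.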
As long-term spatial datasets become increasingly available, approaches are also being developed that jointly account for spatial and temporal dynamics to estimate species interactions (Schliep et al. 2018). For example, using a multispecies competitive community dynamics framework, time-series JSDMs are being used to infer species interactions from species associations as functions of local species abundances in previous years and local environmental conditions (Mutshinda et al. 2011; Ovaskainen et al. 2017). By combining temporal and spatial information, time-series JSDMs have a high potential to provide more accurate estimates of underlying species interactions, since observed covariations are based on multiple points in time and space, rather than representing only a snapshot or a summary of all the dynamics in a large region (Ives et al. 2003; Ovaskainen et al. 2016, 2017; Thorson et al. 2016). However, spatiotemporal models are also inherently more complex, which makes them computationally challenging and less user-friendly (Norberg et al. 2019). Moreover, as with any inference method, model outputs are influenced by the amount, quality and spatial structure of the data. It is therefore important to always evaluate the results in light of ecological knowledge.
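The temporal part of this idea, relating each species' abundance to the abundances of all species in the previous year, can be sketched as a first-order multivariate autoregressive model fitted by least squares. This is a simplified stand-in without spatial structure, observation error or environmental covariates, not a time-series JSDM. The interaction matrix, noise level and function name in the Python sketch below are assumptions.

```python
import numpy as np


def fit_mar1(log_abundance):
    """Least-squares fit of x[t] = a + B @ x[t-1] + noise.

    log_abundance: (time, species) array. Returns (a, B), where B[i, j] is the
    estimated effect of species j at time t-1 on species i at time t.
    """
    x_prev = log_abundance[:-1]
    x_next = log_abundance[1:]
    design = np.column_stack([np.ones(len(x_prev)), x_prev])
    coef, *_ = np.linalg.lstsq(design, x_next, rcond=None)
    return coef[0], coef[1:].T


# Simulate two interacting species with a known (hypothetical) matrix B.
rng = np.random.default_rng(42)
true_a = np.array([0.5, 0.3])
true_b = np.array([[0.6, -0.2],
                   [0.1, 0.5]])
n_steps = 3000
x = np.zeros((n_steps, 2))
for t in range(1, n_steps):
    x[t] = true_a + true_b @ x[t - 1] + rng.normal(scale=0.1, size=2)

est_a, est_b = fit_mar1(x)
```

The off-diagonal elements of the estimated matrix play the role of interaction coefficients. With short or noisy series they are estimated much less precisely than in this long simulated example.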

Trait-based approaches

Trait-based approaches have emerged in recent decades as useful methods to study community dynamics by characterizing individuals or species by key traits rather than as species with prescribed interactions (McGill et al. 2006; Degen et al. 2018; Kiørboe et al. 2018). Trait-based models have the potential to describe more sophisticated communities since they can reduce the number of parameters to be estimated (Kiørboe et al. 2018; Curtsdotter et al. 2019). Well-defined biological traits comparable across species, such as body size, mobility, or defense strategy, may provide useful information about an individual’s mortality, growth, metabolism, and trophic role in a community. For example, larger competitors often exert a dominant competitive pressure over smaller ones (Kohyama 1992), predators preferentially feed on prey of a specific body mass relative to their own (Brose 2010; Kalinkat et al. 2011) and active-searching predators are more likely to encounter prey, but also more likely to attract predators, than passive ambush predators (Kiørboe et al. 2018). These generalizations allow initial parameterization of interaction networks, even without direct data on species interactions. Currently, the majority of trait-based approaches are size-based (Kiørboe et al. 2018). Although the relationship between body mass and ecological interaction is well supported across many taxa (Pope et al. 2006; Hartvig et al. 2011; Boit et al. 2012; Schneider et al. 2012; Curtsdotter et al. 2019), body size is not sufficient to describe the complex interactions of many systems (Jonsson et al. 2018; Curtsdotter et al. 2019; Keppeler et al. 2020). Further examination of key biological traits and how they interact with each other will therefore help improve the development of trait-based approaches. These general relationships seem unlikely to be accurate or common enough to replace more targeted estimation of interaction strength. Nonetheless, they can produce biologically plausible models (Brose 2010), capture major patterns of population dynamics in some systems, and be a useful starting point to help us fill in gaps in otherwise well-parameterized models.
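One way such size-based generalizations can provide an initial parameterization is to assign each predator-prey pair a feeding preference that depends only on their body mass ratio. The Python sketch below uses a log-normal preference curve, a form common in size-based models; the optimal ratio, the width and the body masses are hypothetical values chosen for illustration, not estimates from any of the cited studies.

```python
import numpy as np


def size_based_preference(body_mass, optimal_ratio=100.0, width=1.0):
    """Feeding preference of predator i for prey j from body masses alone.

    Preference peaks at 1 when predator mass / prey mass equals optimal_ratio
    and declines log-normally as the ratio departs from it.
    """
    mass = np.asarray(body_mass, dtype=float)
    log_ratio = np.log(mass[:, None] / mass[None, :])  # rows: predators, columns: prey
    return np.exp(-0.5 * ((log_ratio - np.log(optimal_ratio)) / width) ** 2)


# Three hypothetical species with body masses of 0.01, 1 and 100 g.
masses = [0.01, 1.0, 100.0]
preference = size_based_preference(masses)
```

Here two parameters generate the full matrix of candidate feeding links, instead of one parameter per species pair. Entries for which the "predator" is smaller than the "prey" are effectively zero.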

Functional responses

Mathematically formulating the effects of realized interactions in a model of multispecies population dynamics entails estimating the influence of one species’ population density on the population growth of the other (i.e., functional response). This function of population density is for simplicity often assumed to be linear. While linearity may adequately capture interspecific competition interactions, it may be less suited for modelling interactions among trophic levels, e.g. predator-prey dynamics (Certain et al. 2018). For example, functional responses can take different shapes depending on how a species searches, handles and processes prey, with potentially large effects on population dynamics (Spalinger & Hobbs 1992; Koen-Alonso 2007; Castillo-Alvino & Marvá 2020). Additionally, functional responses can vary between habitats and life stages, where a prey species might itself be an important competitor or even predator on young individuals of the predator species (Essington 2004), highlighting the importance of considering population structure. While assuming simple functional responses is a useful first step for multispecies models, it has been argued that relying strongly on them could hinder advances towards better mechanistic understanding of multispecies dynamics and limit their projectability into the future (Hunsicker et al. 2011; Kalinkat et al. 2011; Rosenbaum & Rall 2018). Assessing the effects of several types of functional responses on model outputs is one suggested solution, particularly because several functional response types can sometimes result in similarly well-fitting models of empirical data (Butterworth & Plagányi 2004; Koen-Alonso 2007; Kinzey & Punt 2009).
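To make the contrast between linear and non-linear functional responses concrete, the Python sketch below writes out the classical Holling type I, II and III forms as per-predator consumption rates. These are standard textbook parameterizations rather than forms taken from a specific study cited here, and the attack rate and handling time values are arbitrary.

```python
import numpy as np


def holling_type1(prey_density, attack_rate):
    """Linear response: consumption increases in proportion to prey density."""
    return attack_rate * prey_density


def holling_type2(prey_density, attack_rate, handling_time):
    """Saturating response: handling time caps consumption at 1 / handling_time."""
    return attack_rate * prey_density / (1.0 + attack_rate * handling_time * prey_density)


def holling_type3(prey_density, attack_rate, handling_time):
    """Sigmoidal response: low consumption at low prey density, then saturation."""
    squared = prey_density ** 2
    return attack_rate * squared / (1.0 + attack_rate * handling_time * squared)


# Compare the three shapes over a range of prey densities (arbitrary units).
densities = np.linspace(0.0, 50.0, 6)
linear = holling_type1(densities, attack_rate=0.5)
saturating = holling_type2(densities, attack_rate=0.5, handling_time=0.1)
sigmoidal = holling_type3(densities, attack_rate=0.5, handling_time=0.1)
```

Running a model with each of these forms in turn is one simple way to assess how sensitive the outputs are to the assumed functional response.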

Data collection and utilization

Sampling design and technology

The development of multispecies models relies heavily on access to high-quality data, but how can data collection be improved to help estimation of species interactions and validation of multispecies models? Ideally, long-term multispecies data sets for a wide number of cohabiting species and environmental variables should be collected to ensure that the population dynamics and interactions are well covered under a wide range of conditions. In practice, the high costs and logistic challenges of acquiring such data sets have limited their availability to economically important species (e.g. harvested communities; ICES 2019) or simpler and mostly self-contained communities (e.g. islands, lakes; Christensen et al. 2013). However, studies on the effectiveness of different sampling designs have highlighted strategies through which data collection can be improved to the benefit of multispecies modelling (Lahoz-Monfort et al. 2014; Trijoulet et al. 2019; Zhang et al. 2020). In particular, there is evidence that sampling a greater number of sites at low intensity gives more representative system-wide estimates of interactions compared to sampling fewer sites at high intensity (Bogstad et al. 1995; Latour et al. 2003). Sampling a single site more rigorously easily results in biases towards the interactions occurring in that location, while sampling a greater range of sites gives a more representative picture of species interactions across the species’ ranges.
In addition to finding ways to improve the efficiency of sampling strategies, recent technological developments have increased the quantity and quality of data for studying species interactions. Technologies that automate data collection, such as drones, GPS trackers, movement sensors, video/audio recorders and image recognition software (Weinstein 2015; Marvin et al. 2016), can reduce time and costs, thereby allowing greater sampling coverage (e.g. number of species, geographical area, higher spatial and temporal resolutions) and increasing the probability of recording an interaction. Increased environmental interest by the public has also led to the development of citizen science and crowdsourcing initiatives that can provide unprecedented amounts of data (Chandler et al. 2017; Devarajan et al. 2020). An example of this is the Global Biodiversity Information Facility (GBIF), which as of April 2022 has more than 1.9 billion species occurrence observations publicly available. However, citizen science data face limitations related to inconsistencies in sampling effort, sampling biases, and errors (Zipkin & Saunders 2018). Various statistical techniques are being developed and used to account for these sampling issues by, for example, modelling random effects and hierarchical structures (Kelling et al. 2015), but the majority of such data remains unused or limited to broader macroecological studies (Theobald et al. 2015; Heberling et al. 2021).

Data integration

As multispecies models tend to be data demanding, modelling methods that can simultaneously include and take full advantage of a variety of data sources are valuable. Such approaches have the potential to shorten the time series required to provide good estimates, improve the cost-effectiveness of monitoring programs, and improve the modelling of data-poor species. For instance, Barraquand & Gimenez (2019) found that combining data on capture-recapture, counts and reproduction to estimate dynamics of interacting multi-stage populations using integrated community models could provide accurate estimates of interactions, while also requiring shorter time series than studies using only count data. They evaluated the benefits of the different types of data to the model results and the costs of collecting these, and concluded that collecting reproduction data instead of capture-recapture data was a more cost-effective strategy, especially for abundant species (Barraquand & Gimenez 2019).
Integrating similar data types collected in different ways, such as abundance data through camera traps and transects, or citizen science and scientific surveys, is a useful strategy in single-species population modelling (Besbeas et al. 2002; Lee et al. 2015; Zipkin & Saunders 2018; Isaac et al. 2020). Recent studies indicate that the advantages inherent in data integration methods in single-species models are also present in multispecies frameworks (Péron & Koons 2012; Fithian et al. 2015; Lahoz-Monfort et al. 2017; Barraquand & Gimenez 2019; Miller et al. 2019). Data integration methods allow models to maximize the information extracted from each dataset, while considering the weaknesses and strengths of each one (Miller et al. 2019). Similarly, combining data on similar species can improve the estimates of each species individually (Lahoz-Monfort et al. 2017), and of data-poor species in particular (Fithian et al. 2015). For instance, using SDMs, Fithian et al. (2015) found that when faced with presence-only data for a species, using presence-only and presence-absence data from other species facilitated information sharing across species, which improved parameterization for the data-poor species by leveraging information from closely connected species. Such data sharing is an additional advantage of multispecies models over single-species ones in many systems (Kindsvater et al. 2018).

Model structure and simplifications

We have discussed methods that are helpful for estimating or inferring large numbers of species interactions for multispecies models, as well as ways through which ecological data are becoming more detailed and increasingly available. However, translating the high complexity of most natural systems to models often leads to increased uncertainty, and difficulty parameterizing and interpreting the results. Therefore, even the most comprehensive multispecies models require some simplifications. Dynamic multispecies population models have historically started as simplified versions of the dynamics of a small subset of species (e.g., Lotka-Volterra model), onto which complexity was added in the form of, for example, life stages, spatial dynamics, or environmental influence. In contrast, community or network models aim to describe or understand entire or large parts of an ecosystem and therefore need to simplify the description of these communities. They tend to do this by finding ways to reduce the number of interactions that need to be estimated separately in the model without reducing model performance (Morin 2011; Collie et al. 2016).
One way to reduce the number of interactions to be estimated is by reducing the number of nodes in the model, i.e., the number of community components (Fig. 1). Aggregating species into groups based on taxonomic, trophic and/or ecological similarity (i.e., trophospecies or functional groups) can help the development of simplified community models that cover a large proportion of the community and yet maintain key properties of more complex models (Hood et al. 2006; Ulanowicz et al. 2014; Olivier & Planque 2017). This strategy has the added benefit of helping to understand and make better predictions of rare and data-poor species because it allows one to “borrow” information from common, closely related species, or species with similar traits that are likely to respond similarly to the environment, thereby increasing the sample size used to estimate the parameters of the node. Similarly, sampling error and stochasticity can have a smaller negative effect on the model predictability when species are grouped (Agarwal et al. 2021). However, since this approach regards various species as equal, information about individual species and their dynamics is lost (Simmons et al. 2019). Thus, the model outputs become sensitive to the criteria used to classify species, which is largely dependent on the research question (Fath et al. 2007; Pacifici et al. 2014). For example, a species classification based on taxonomical or ecological similarities might be better suited for addressing impacts of habitat change, while trophic similarities might be better suited for modelling harvesting impacts and energy flows. It is also important to assess the sensitivity of a model to different species classification criteria because differences in classification methods (e.g., cluster analyses, expert knowledge, model-based) can yield contrasting species groupings and model results (Picard et al. 2012; Olivier & Planque 2017).
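As a minimal sketch of how aggregation reduces the number of interactions, the following averages a species-level interaction matrix within blocks defined by group membership. The block-averaging rule (which here also averages intraspecific terms into the within-group entry), the function name and the four-species matrix are hypothetical choices for illustration. Other aggregation rules, such as biomass-weighted means, are equally plausible.

```python
import numpy as np

def aggregate_interactions(A, groups):
    """Average a species-level interaction matrix A (S x S) into group-level blocks.

    groups[i] is the label of the group that species i belongs to.
    Entry (g, h) of the result is the mean of A[i, j] over i in g and j in h.
    """
    A = np.asarray(A, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    # Membership matrix M (S x G): M[i, g] = 1 if species i is in group g
    M = (groups[:, None] == labels[None, :]).astype(float)
    sizes = M.sum(axis=0)
    # Sum the entries of each block, then divide by the block size
    return (M.T @ A @ M) / np.outer(sizes, sizes)

# Hypothetical community: species 0-1 are prey, species 2-3 are predators
A = np.array([
    [-1.0, -0.2, -0.5, -0.3],
    [-0.2, -1.0, -0.4, -0.6],
    [ 0.3,  0.2, -1.0,  0.0],
    [ 0.1,  0.4,  0.0, -1.0],
])
groups = [0, 0, 1, 1]

A_grouped = aggregate_interactions(A, groups)
print(A.size, "coefficients reduced to", A_grouped.size)
print(A_grouped)
```

Rerunning the function with a different `groups` vector is a simple way to probe how sensitive the coarse model is to the classification criteria.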
Large communities can also be divided into subgroups or modules based on substructures within interaction networks (Olesen et al. 2007; Dormann & Strauss 2014; Fig. 1). Modules represent recurring non-random groupings of species within the community that interact more with each other than with species from other modules (Olesen et al. 2007). Identifying modules within ecosystems is therefore a good strategy to find subsets of species that can be modelled independently from the rest of the community (Allesina et al. 2005). Different modules within a system can then be used as nodes of a coarser community model. As a result of fewer interactions, the task of modelling large communities becomes more manageable. Similarly, the number of interactions can in some cases be reduced by identifying species with weak interactions with the rest of the community and omitting them from the model. For example, rare species are sometimes assumed to exert such weak competition or predation pressure in relation to common or dominant species that their influence is ignored (Canard et al. 2012). However, extensive research of the system may be required before making such assumptions (Terry & Lewis 2020). Weak interactions can increase in importance over time (Terry et al. 2017) and, even if they remain weak, can still be important for maintaining the structure and stability of complex systems (McCann et al. 1998). Removal of weak interactions may therefore not always result in realistic model predictions.
Instead of grouping or omitting species to reduce the number of nodes, the same functions can sometimes be used to describe different interactions and processes (e.g., growth, foraging, dispersal, reproduction, competitiveness) while only varying their parameterization to best represent each species (McDermot & Rose 2000; Reuter 2005; Buchmann et al. 2011; Grimm & Berger 2016). For example, some forest models use the same function to describe competitive interactions (e.g., based on vertical leaf area distribution) and the same growth function, but adjust the growth parameters to each species (Kohyama 1992). Similarly, some models of fish communities assume trophic interactions between fish to be size-dependent and species-independent, so that the same predation function can be used across species (Giacomini et al. 2013; González-Varo & Traveset 2016). This type of simplification is often used in agent-based modelling, also known as individual-based modelling (IBM) among ecologists when the agents represent individuals. IBMs can assign the same biological “behaviour” (i.e., growth model, interaction model, dispersal model, etc.) to individuals from different species or groups of species and efficiently simulate the complex dynamics of some interacting species within a community (DeAngelis & Grimm 2014). It has also been argued that, analogous to how modellers tend to use just a few well-established functional responses, there could be a small subset of well-established functions to describe other types of species behaviours that influence interspecific dynamics, like foraging or home range, with well-understood properties and requirements, thereby facilitating model development and communication (Grimm & Berger 2016).
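The idea of sharing one function across species while varying only its parameters can be sketched as follows. The discrete-time logistic form, the species names and the parameter values are hypothetical and chosen only for illustration. The cited forest and fish models use their own, more detailed functions.

```python
def logistic_growth(n, r, k):
    """Growth increment shared by all species; only r and k differ among them."""
    return r * n * (1.0 - n / k)

# Hypothetical species-specific parameterization of the shared function
species_params = {
    "spruce": {"r": 0.10, "k": 500.0},
    "birch": {"r": 0.25, "k": 300.0},
}

def step(abundances):
    """Advance every species one time step using the same growth function."""
    return {
        sp: n + logistic_growth(n, **species_params[sp])
        for sp, n in abundances.items()
    }

state = {"spruce": 100.0, "birch": 100.0}
next_state = step(state)
print(next_state)
```

Adding a species then only requires a new entry in `species_params`, not a new model component.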
Latent variable approaches can also be a useful way to reduce the dimensionality of multispecies models. Latent variables are unobserved variables that can be used to represent the main axes of (co)variation among species (Warton et al. 2015). For instance, in JSDMs with latent variable structures, all pairs of species associations or co-occurrences are modelled jointly by searching for the leading axes of variation unaccounted for by the environmental effects. This creates linear combinations of several variables, limiting the dimensionality of the multispecies data (Thorson et al. 2015; Ovaskainen & Abrego 2020; van der Veen et al. 2021). In structural equation modelling (SEM), latent variables typically represent a theorized environmental effect measured by one or more indicator variables (Grace et al. 2010). In Bayesian network analysis, latent variables are used to group nodes with similar roles in the network and can thereby reduce the complexity of the modelled system (Kim et al. 2018). Latent variables can also be used to estimate interaction probabilities where nodes with similar latent positions in the network structure are assumed to be more likely to interact (Rohr et al. 2016; Kim et al. 2018).
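The dimensionality reduction can be made concrete with a small sketch comparing the number of free parameters in an unstructured species-by-species association matrix with the number of factor loadings in a latent variable model. The parameter count for the loadings (with the usual identifiability constraint that fixes K(K-1)/2 entries), the unit residual variances and the simulated loadings are assumptions for illustration. Individual JSDM implementations may parameterize this differently.

```python
import numpy as np

def n_pairwise(S):
    """Free pairwise associations among S species (unstructured case)."""
    return S * (S - 1) // 2

def n_loadings(S, K):
    """Free factor loadings for S species on K latent variables.

    Assumes K*(K-1)/2 loadings are fixed for identifiability.
    """
    return S * K - K * (K - 1) // 2

S, K = 50, 2
print(n_pairwise(S), "pairwise terms vs", n_loadings(S, K), "loadings")

# Simulated (hypothetical) loadings: one row per species, one column per latent axis
rng = np.random.default_rng(1)
Lambda = rng.normal(size=(S, K))

# Residual association matrix implied by the loadings, assuming unit
# species-specific residual variances
residual_cov = Lambda @ Lambda.T + np.eye(S)
print(residual_cov.shape)
```

All S x S residual associations are thus generated from a matrix of rank K, which is what keeps the number of estimated quantities growing linearly rather than quadratically with the number of species.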
While simplifications are a useful and necessary part of modelling, oversimplifications can lead to poorer model performance and loss of predictive power, especially under changing conditions (Raick et al. 2006; Berlow et al. 2009). Ideally, one would always compare the simplified models to more complete and complex models to assess their effectiveness and accuracy, as well as to identify the trade-offs of the simplifications (Raick et al. 2006). However, that would entail having abundant data to develop the complex models first, which is usually not an option. In practice, decisions on model structure and simplifications are often based on data availability instead of robust knowledge about ecological functionality (see e.g., Lafferty et al. 2008; Dunn et al. 2017). Because such decisions will continue to be necessary, especially in data-poor studies, it is important that the simplification methods (e.g., criteria used to aggregate species or standardize links) are systematically documented to facilitate comparative studies that help highlight the strengths and weaknesses of each approach (Olivier & Planque 2017). We must not overlook the importance of having a robust understanding of the individual building blocks of natural ecosystems even if we aim to model whole systems (Koen-Alonso 2007). In the long run, this will promote the development of more encompassing and realistic models, while enabling us to limit their complexity and data requirements through ecologically grounded simplifications.

Dealing with uncertainty

Uncertainty is a feature of all statistical and mathematical models, which simplify natural processes and use imperfect data to estimate unknown processes (Berlow et al. 2009). Epistemic or systematic uncertainty (Regan et al. 2002) enters the modelling process because of (1) errors in data or insufficient data, (2) random and non-random variation in nature, and (3) assumptions and simplifications about the parameters and model structure (Regan et al. 2002; Walker et al. 2003; Koo et al. 2017; Fig. 2). While many of these sources of uncertainty are shared with single-species models, the greater complexity in multispecies models complicates the task of quantifying their effects, increases the number of pathways through which uncertainty can propagate and increases their potential influence on the overall model output (Zhang et al. 2015). It is therefore crucial that sources of uncertainty are identified, quantified and reported in any multispecies model.
Uncertainty in population-specific data can be accounted for through techniques derived from single-species frameworks, such as observation models that estimate sampling error. However, as multispecies models require more diverse data, accounting for measurement errors and/or systematic biases associated with all the data sources becomes more challenging (Regan et al. 2002). For instance, estimation of species interactions often requires sampling multiple sources (i.e., several species) and types of data (e.g., interaction types, frequencies) simultaneously, each with some degree of sampling error. Also, species interactions are expected to be influenced by environmental variables, which are themselves estimated with some degree of uncertainty (Koo et al. 2017). Similarly, because species interactions are sometimes estimated indirectly within the models (e.g., Ovaskainen et al. 2016), interaction estimates become model outputs and subject to additional uncertainty. Expanding datasets and improvements in measuring and identification techniques have great potential to reduce the degree of uncertainty in the data and make models less sensitive to prior assumptions (Cressie et al. 2009). However, obtaining more and better data is still limited by logistical challenges (Zhang et al. 2015). This is another reason why diversifying the types of independent data collected can be beneficial, as different model parameters can be informed by multiple data sources (e.g., fecundity, census, mortality-at-age) simultaneously in a single framework (Kindsvater et al. 2018). Identification and propagation of data uncertainty through the modelling process, and critical assessments of the conditions under which the models are useful are important to minimize errors, biases and misleading projections (Wells & O’Hara 2013; Certain et al. 2018; Engelhardt et al. 2020).
As mentioned in previous sections, multispecies models often have to rely on inferred interactions or researchers’ assumptions about the processes giving rise to the observed data (Milner-Gulland & Shea 2017). However, different assumptions lead to different results. This source of uncertainty can be particularly difficult to quantify because the resulting measurements of uncertainty associated with the likelihood function of the model do not inform about the correctness of the model, but about the certainty in the parameters, already assuming the model structure is true (Kinzey & Punt 2009). Instead, this structural uncertainty can be accounted for, quantified or reduced through model comparison, model averaging, or validation of predictions (Regan et al. 2002; Koo et al. 2017).
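One common way to implement model averaging is with Akaike weights, sketched below. The choice of AIC as the criterion, the three candidate model structures, and the AIC values and predictions are all hypothetical. The source does not prescribe a particular averaging scheme, and Bayesian or cross-validation-based weights are alternatives.

```python
import numpy as np

def akaike_weights(aic):
    """Convert AIC values into model weights that sum to one."""
    aic = np.asarray(aic, dtype=float)
    delta = aic - aic.min()          # difference from the best-supported model
    w = np.exp(-0.5 * delta)         # relative likelihood of each model
    return w / w.sum()

# Hypothetical candidate structures for the same multispecies system
aic = [210.0, 212.0, 220.0]
predictions = np.array([105.0, 98.0, 120.0])   # e.g., predicted abundance

weights = akaike_weights(aic)
averaged = float(weights @ predictions)

# Spread among model predictions, weighted by model support, gives a rough
# indication of structural uncertainty
between_model_sd = float(np.sqrt(weights @ (predictions - averaged) ** 2))
print(weights, averaged, between_model_sd)
```

The between-model spread complements, rather than replaces, the parameter uncertainty estimated within each candidate model.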
Uncertainty cannot be reduced to zero (Milner-Gulland & Shea 2017), but we can explore ways to minimize uncertainty and report it transparently. Recognizing and quantifying all sources of uncertainty is essential for evaluating model usefulness and identifying model weaknesses for future research (Zhang et al. 2015). Acknowledging uncertainty can also improve models directly. For example, including prior knowledge about ecological preferences in multispecies SDMs as uncertain, instead of fixed, has been shown to improve both predictability and accuracy (Vermeiren et al. 2020). However, recognition and analysis of uncertainty in multispecies models has received relatively little attention, a lack that has been argued to be a major hindrance for the use of multispecies models in management contexts (Thorpe et al. 2015). Development of methods to consistently quantify and reduce uncertainty in multispecies models is therefore important going forward and should happen simultaneously with the development of the models themselves. This will ensure that we have the necessary toolkit to achieve usable model outputs and give biologically meaningful insights that can be used in management and conservation contexts. This will also help us to identify parts of the community or sampling designs associated with higher uncertainty, providing a simple way to improve data collection.