2. Forest modelling challenges and solutions
Forest model development and predictive ability have been constrained by
different factors, mainly resulting from (i) data availability, (ii)
technical challenges, especially the availability of computing
resources, and (iii) an incomplete process understanding. While
advances have been made on all these points, some challenges are
simultaneously amplified as models seek to adopt a finer-scale
representation of processes, vegetation structure and diversity, while
sustaining or expanding the spatio-temporal scale of simulations. Here
we briefly present these main types of obstacles encountered in forest
modelling and approaches that are being developed to overcome them.
Data availability
Forest models are data-demanding across the different steps of model
development and application. A robust parameterization of the multiple
processes related to plant life cycle and physiology for diverse plant
types, species or individuals requires various data across scales, from
plant organ to population, including environmental factors. For many
processes, such data are often not available in the required quality and
resolution, e.g. for tolerance of trees to resource limitations (McMahon
et al. 2011; Craine et al. 2012; De Kauwe et al. 2015) or soil
characteristics (Marthews et al. 2014). Additionally, a thorough
initialization and validation of forest simulations over large spatial
and temporal scales requires observation data encompassing both fine
resolution and large coverage over long time spans, which can still be a
challenge (Estes et al. 2018, Table 1).
Fortunately, data availability is increasing at a high pace. Global
plant trait databases (e.g. TRY, Kattge et al. 2011; Table 1) gather
data of commonly measured traits (e.g. leaf mass per area or wood
density) for a wide range of species, and this effort is being expanded
to other traits (e.g. stem and leaf drought tolerance, Bartlett et al.
2012, Choat et al. 2012; fine root traits, Iversen et al. 2017; litter
decomposition rates, Brovkin et al. 2012). This fosters a systematic
trait-based parameterization of models for a range of plant species and
individuals. For example, Scheiter et al. (2013) and Sakschewski et al.
(2015) used reported trait coordination to constrain individual trait
combinations in simulations of forest dynamics with DGVMs. In doing so,
they improved model representation of functional diversity from a few
discrete plant functional types to a continuum of traits, while
excluding unrealistic trait combinations (Van Bodegom et al. 2012).
Similarly, by taking advantage of comprehensive trait databases, but
also of long-term inventories and of the detailed information they
provide on tree life histories, forest IBMs have been able to
simulate hundreds of species within diverse forest communities
(Maréchaux & Chave 2017; Rüger et al. 2019).
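The principle of using trait coordination to constrain individual trait combinations can be sketched as follows. This is a minimal illustration and not the procedure implemented in the cited DGVMs: the two traits, their means, spreads and correlation are hypothetical placeholders, and coordination is reduced here to a bivariate normal distribution on log-transformed traits.

```python
import numpy as np

# Hypothetical values for illustration only; not taken from TRY or any cited study.
MEAN_LOG = np.array([np.log(80.0), np.log(0.6)])  # log LMA (g m-2), log wood density (g cm-3)
SD_LOG = np.array([0.4, 0.2])                     # spread of each log-transformed trait
CORRELATION = 0.5                                 # assumed coordination between the two traits


def sample_trait_combinations(n_individuals, seed=0):
    """Draw correlated (LMA, wood density) pairs, one pair per simulated individual."""
    rng = np.random.default_rng(seed)
    covariance = CORRELATION * SD_LOG[0] * SD_LOG[1]
    cov_matrix = np.array([
        [SD_LOG[0] ** 2, covariance],
        [covariance, SD_LOG[1] ** 2],
    ])
    log_traits = rng.multivariate_normal(MEAN_LOG, cov_matrix, size=n_individuals)
    return np.exp(log_traits)  # back-transform, so all trait values are positive


traits = sample_trait_combinations(5000)
observed_r = np.corrcoef(np.log(traits[:, 0]), np.log(traits[:, 1]))[0, 1]
print(f"correlation of log-transformed traits: {observed_r:.2f}")
```

Sampling from a joint distribution, rather than drawing each trait independently, makes combinations that contradict the assumed coordination rare, which is the sense in which unrealistic trait combinations are excluded.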
Simultaneously, networks of forest plot inventories are being
complemented by remote-sensing data (Table 1), offering novel
opportunities to initialize and/or validate model simulations over large
spatial scales (Shugart et al. 2015) or complement predictors of SDMs
(Fedrigo et al. 2019). Recent advances in remote sensing tools, such as
the possibility to derive tree-level information within dense canopies
(Ferraz et al. 2016) or fuse spectrometer data with co-registered LiDAR
data (Jucker et al. 2018), provide new ways to parameterize models (e.g.
allometries, Jucker et al. 2017; Fischer et al. 2019). Citizen science
programs have also been developed, creating new opportunities for forest
data sampling over large areas (Delbart et al. 2015; Giraud et al. 2016;
Affouard et al. 2017; Wäldchen et al. 2018).
The development of machine learning techniques offers new possibilities
to use the resulting huge datasets for model development and evaluation
(Botella et al. 2018; Forkel et al. 2019; Reichstein et al. 2019).
Rammer & Seidl (2019), for instance, have used deep neural networks to
estimate vegetation transitions across large spatial scales.
Additionally, Bayesian and/or inverse modelling approaches can be used
to take advantage of diverse sources of data to estimate process
parameters and calibrate entire models (van Oijen et al. 2005; Hartig et
al. 2011, 2014; LeBauer et al. 2013; Dietze et al. 2014; Lehmann & Huth
2015; Fischer et al. 2019). For example, van Oijen et al. (2013) found a
strong reduction of uncertainty in most forest models after a Bayesian
calibration.
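As a hedged illustration of how a Bayesian calibration narrows parameter uncertainty, the sketch below calibrates a single parameter with a Metropolis sampler. Everything in it is an assumption made for the example: the saturating height-growth curve merely stands in for a forest model, the observations are synthetic, and the prior bounds, noise level and step size are arbitrary choices rather than values from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(42)


def simulate_height(age, growth_rate, max_height=40.0):
    """Toy saturating height-growth curve (m); a stand-in for a full forest model."""
    return max_height * (1.0 - np.exp(-growth_rate * age))


# Synthetic "observations" generated from a known parameter value plus Gaussian noise.
TRUE_RATE, OBS_SD = 0.03, 1.5
ages = np.arange(5, 105, 10)
observed = simulate_height(ages, TRUE_RATE) + rng.normal(0.0, OBS_SD, ages.size)

PRIOR_LOW, PRIOR_HIGH = 0.005, 0.1  # bounds of a uniform prior on the growth rate


def log_posterior(rate):
    """Log of (uniform prior x Gaussian likelihood), up to an additive constant."""
    if not PRIOR_LOW <= rate <= PRIOR_HIGH:
        return -np.inf  # outside the prior support
    residuals = observed - simulate_height(ages, rate)
    return -0.5 * np.sum((residuals / OBS_SD) ** 2)


def metropolis(n_steps=20000, step_sd=0.003, start=0.05):
    """Random-walk Metropolis sampler for the single growth-rate parameter."""
    chain = np.empty(n_steps)
    current, current_lp = start, log_posterior(start)
    for i in range(n_steps):
        proposal = current + rng.normal(0.0, step_sd)
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain[n_steps // 4:]  # discard the first quarter as burn-in


posterior = metropolis()
prior_sd = (PRIOR_HIGH - PRIOR_LOW) / np.sqrt(12.0)  # sd of a uniform distribution
print(f"prior sd: {prior_sd:.4f}")
print(f"posterior mean: {posterior.mean():.4f}, posterior sd: {posterior.std():.4f}")
```

Comparing the posterior standard deviation with that of the prior gives a simple measure of how much the data have reduced parameter uncertainty. Real calibrations involve many parameters and several data streams and usually rely on more efficient samplers.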
Table 1
Technical challenges
Several technical obstacles constrain model development and runtime.
First, computing power – in terms of speed and memory – imposes a
trade-off between simulation resolution and coverage, still today
limiting large-scale applications or the fitting of fine-grained models.
For example, the finer-grained representation of forest biodiversity and
structure recently implemented in a DGVM (LPJmL-FIT, Sakschewski
et al. 2015) was restricted to one biome (the tropics of South America) as
opposed to the global scale typically reached by classic DGVM
simulations. However, computing power will probably continue to increase
in the coming years (Kurzweil 2005), which, together with parallel
processing, model upscaling and improved algorithms, allows a continuous
reduction of computing time (von Bloh et al. 2010; Snell 2014). As an
illustration, using Fast Fourier Transforms for seed dispersal
instead of modelling dispersal from each cell to every other cell
increased the computing speed by a factor of 100 (Lehsten et al. 2019).
Additionally, remote-sensing observations allow the up-scaling of
individual-based forest models at lower costs (Shugart et al. 2015). For
example, by using remote-sensing-derived measurements of forest height
across a gridded map over the Amazonian basin and a locally optimized
gap model, it was possible to estimate the forest successional stages of
every cell in this area and derive maps of aboveground biomass and
productivity of the whole basin (Rödig et al. 2017, 2018). However, a
fundamental change to an algorithm in a complex model can introduce
unintended side effects, sometimes forcing modellers to invest
substantial time and effort to stabilize the new model version.
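The gain obtained by computing seed dispersal with Fast Fourier Transforms can be illustrated with a small sketch. It is not the implementation of Lehsten et al. (2019): the exponential kernel, its mean distance, the grid size and the periodic (wrapped) landscape boundaries are assumptions chosen to keep the example short. The sketch only shows that the cell-to-cell calculation and the frequency-domain calculation yield the same seed rain.

```python
import numpy as np


def dispersal_kernel(n_rows, n_cols, mean_distance=2.0):
    """Exponential kernel centred on cell (0, 0) of a wrapped (periodic) grid."""
    rows, cols = np.arange(n_rows), np.arange(n_cols)
    dy = np.minimum(rows, n_rows - rows)  # shortest row distance on the wrapped grid
    dx = np.minimum(cols, n_cols - cols)  # shortest column distance on the wrapped grid
    distance = np.hypot(dy[:, None], dx[None, :])
    kernel = np.exp(-distance / mean_distance)
    return kernel / kernel.sum()  # normalise so that the number of seeds is conserved


def disperse_direct(seeds, kernel):
    """Cell-to-cell dispersal: every source cell spreads its seeds over all cells."""
    seed_rain = np.zeros_like(seeds, dtype=float)
    for (row, col), n_seeds in np.ndenumerate(seeds):
        if n_seeds > 0:
            # Shift the kernel so that it is centred on the source cell.
            seed_rain += n_seeds * np.roll(kernel, shift=(row, col), axis=(0, 1))
    return seed_rain


def disperse_fft(seeds, kernel):
    """Same circular convolution, computed as a product in the frequency domain."""
    spectrum = np.fft.rfft2(seeds) * np.fft.rfft2(kernel)
    return np.fft.irfft2(spectrum, s=seeds.shape)


rng = np.random.default_rng(1)
seeds = rng.poisson(5.0, size=(32, 32)).astype(float)  # hypothetical seed production
kernel = dispersal_kernel(*seeds.shape)
rain_direct = disperse_direct(seeds, kernel)
rain_fft = disperse_fft(seeds, kernel)
print("methods agree:", np.allclose(rain_direct, rain_fft))
```

The direct approach scales with the square of the number of cells, whereas the transform-based approach scales roughly as N log N, which is why the difference grows with landscape size. Non-periodic landscapes would require padding the grid before the transform.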
Second, expanding model development and applications relies on code and
data sharing within and among larger communities of model developers and
users, which is also accompanied by technical challenges. Several
modelling teams make their model code (partly) freely available.
Additionally, version control systems make it possible to track changes
and collaborate on model code efficiently (e.g. Git, Ram 2013;
Collalti et al. 2016). Besides code sharing, simulation data are
increasingly available following data open access requirements, allowing
subsequent analyses or model comparisons (Box 1). In many modelling
studies, the preparation of data (e.g. for input/initialization,
calibration or validation) and the analyses of model outputs are very
work- and time-intensive. Sharing scripts for analysing forest
simulations, e.g. through dedicated platforms (e.g. LeBauer et al. 2013)
or R packages (R Core Team 2018; e.g. Duursma et al. 2012), is also of
great help. Furthermore, the development of visualisation tools to
illustrate simulation results in virtual forest scenes (e.g.
Dufour-Kowalski et al. 2012; Fig. 1) represents a valuable lever to
communicate model structure, functioning and outputs, to inspire
new model developments and applications, and also to detect model
errors.
Figure 1
Box 1
Process understanding
Another challenge in forest modelling results from the imperfect
knowledge of processes that shape forest dynamics, e.g. regeneration
(Vacchiano et al. 2018), mortality (Hartmann et al. 2018b), carbon
allocation (De Kauwe et al. 2014; Hartmann et al. 2018a),
photosynthesis, autotrophic respiration as well as leaf conductance
(Rogers et al. 2017; Collalti & Prentice 2019). Due to the lack of
consensus on the mechanisms underlying these basic processes, their
representations differ substantially across models (e.g. response to
increased temperature, Galbraith et al. 2010; response to water stress,
Powell et al. 2013, Restrepo-Coupe et al. 2017; tree mortality, Johnson
et al. 2016, Hülsmann et al. 2018). As an illustration, one of the first
fully coupled simulations between a Global Circulation Model and a DGVM
(Box 2) predicted a critical transition of the Amazonian rainforest
towards a much drier savannah-type ecosystem under continuing
deforestation and increased atmospheric CO2 concentration (Cox et al.
2004). An updated model version projected much
smaller changes of the Amazonian forest extent for the 21st century
(Good et al. 2013). These differences partly resulted from our improved
understanding of respiration acclimation to high temperatures (Smith &
Dukes 2013; Huntingford et al. 2017). Similarly, a better inclusion of
nitrogen limitation in a DGVM reduced the simulated CO2 fertilization
effect in agreement with observations (Smith et al. 2014).
Therefore, the absence of one or more critical processes in a model can
lead to diverging projections.
Knowledge gaps often result from a limited availability of suitable data
that are costly and/or time-consuming to collect. As trees are typically
long-lived, experiments and field monitoring need to extend over
multiple decades to capture long-term trends, a temporal coverage that
is still out of reach for most empirical studies and that hampers their
repeatability (Schnitzer & Carson 2016). While fundamentally relying on
the basic
knowledge developed through empirical studies, models themselves
represent key tools to investigate unresolved questions through the
generation of virtual data. For example, using a gap model, Bohn & Huth
(2017) created a database of 500,000 virtual forest plots varying in
forest composition and structure, which allowed them to explore the drivers of
temperature sensitivity of productivity in temperate forests.
Additionally, models can be used to test hypotheses about processes
(Maris et al. 2018) by applying a range of scenarios or comparing
different ways to model processes, e.g. between model versions or
different models (e.g. Fisher et al. 2006; Sakschewski et al. 2016;
Langan et al. 2017; Lovenduski & Bonan 2017; Collalti et al. 2019a; Box
1). For example, using 15 models, including DGVMs and forest gap models,
Bugmann et al. (2019) explored the influence of different simulated
mortality processes on forest dynamics, providing insights into the
effects of process uncertainties. Similarly, but within the same model,
Collalti et al. (2019b) tested two ecological theories about plant
respiration. Models can thus prove useful to pinpoint data and knowledge
gaps and hence further guide the empirical development of knowledge
(Rykiel Jr. 1996; Van Nes & Scheffer 2005; Medlyn et al. 2016; Norby et
al. 2016).
Converging trajectories of model developments
As illustrated above, the different forest modelling approaches were
initially motivated by different specific objectives, leading to
different choices and compromises in the representation of actual
vegetation. DGVMs originally focused on biogeochemical processes such as the
exchange of carbon and water between vegetation and atmosphere at the
global scale, but this was at the cost of a realistic representation of
forest diversity, competition, and structure. Conversely, SDMs adopted a
species-level representation of vegetation diversity, but have long
relied on a correlative-only approach, bypassing the mechanistic
processes underlying species distribution. Similarly, IBMs typically
used a finer-grained representation of vegetation structure than DGVMs,
as they simulate many individuals, focusing on the competition among
species, but often at the cost of an aggregated representation of some
processes such as leaf gas exchanges or water flow.
The multiple scientific and technical advances described above have
helped modellers overcome the constraints that they initially faced. As
a result, each of these model types has been gaining in efficiency and
capabilities. Next-generation DGVMs strive to explicitly represent
forest structure as well as tree demography and diversity within PFTs,
IBMs refine their representation of biogeochemical cycles, and SDMs
endeavour to include process-based information. In doing so, their
trajectories of development have been progressively converging.
Consequently, each model type has broadened its field of applications
beyond its initial scope, encouraging synergies among models, including
their coupling (Box 2), to address key ecological research questions in
a mutually informative way.
Box 2