2. Forest modelling challenges and solutions

Forest model development and predictive ability have been constrained by several factors, mainly (i) data availability, (ii) technical challenges, especially the availability of computing resources, and (iii) incomplete process understanding. While advances have been made on all these fronts, some challenges are simultaneously amplified as models seek finer-scale representations of processes, vegetation structure and diversity while maintaining or expanding the spatio-temporal scale of simulations. Here we briefly present the main types of obstacles encountered in forest modelling and the approaches being developed to overcome them.

Data availability

Forest models are data-demanding at every step of model development and application. A robust parameterization of the multiple processes related to plant life cycle and physiology, for diverse plant types, species or individuals, requires data across scales, from plant organs to populations, including environmental factors. For many processes, such data are not available at the required quality and resolution, e.g. for the tolerance of trees to resource limitations (McMahon et al. 2011; Craine et al. 2012; De Kauwe et al. 2015) or for soil characteristics (Marthews et al. 2014). Additionally, a thorough initialization and validation of forest simulations over large spatial and temporal scales requires observations that combine fine resolution with large coverage over long time spans, which is still a challenge (Estes et al. 2018; Table 1).
Fortunately, data availability is increasing rapidly. Global plant trait databases (e.g. TRY, Kattge et al. 2011; Table 1) gather data on commonly measured traits (e.g. leaf mass per area or wood density) for a wide range of species, and this effort is being extended to other traits (e.g. stem and leaf drought tolerance, Bartlett et al. 2012, Choat et al. 2012; fine root traits, Iversen et al. 2017; litter decomposition rates, Brovkin et al. 2012). This fosters systematic, trait-based model parameterization for a range of plant species and individuals. For example, Scheiter et al. (2013) and Sakschewski et al. (2015) used reported trait coordination to constrain individual trait combinations in DGVM simulations of forest dynamics. In doing so, they extended the model representation of functional diversity from a few discrete plant functional types to a continuum of traits, while excluding unrealistic trait combinations (Van Bodegom et al. 2012). Similarly, by drawing on comprehensive trait databases, but also on long-term inventories and the detailed information they provide on tree life histories, forest IBMs are now able to simulate hundreds of species within diverse forest communities (Maréchaux & Chave 2017; Rüger et al. 2019).
Simultaneously, networks of forest inventory plots are being complemented by remote-sensing data (Table 1), offering novel opportunities to initialize and/or validate model simulations over large spatial scales (Shugart et al. 2015) or to complement the predictors used in SDMs (Fedrigo et al. 2019). Recent advances in remote-sensing tools, such as deriving tree-level information within dense canopies (Ferraz et al. 2016) or fusing spectrometer data with co-registered LiDAR data (Jucker et al. 2018), provide new ways to parameterize models (e.g. allometries, Jucker et al. 2017; Fischer et al. 2019). Citizen science programs have also created new opportunities for sampling forest data over large areas (Delbart et al. 2015; Giraud et al. 2016; Affouard et al. 2017; Wäldchen et al. 2018).
Machine learning techniques offer new possibilities for using these very large datasets in model development and evaluation (Botella et al. 2018; Forkel et al. 2019; Reichstein et al. 2019). Rammer & Seidl (2019), for instance, used deep neural networks to estimate vegetation transitions across large spatial scales. Additionally, Bayesian and/or inverse modelling approaches can combine diverse data sources to estimate process parameters and calibrate entire models (van Oijen et al. 2005; Hartig et al. 2011, 2014; LeBauer et al. 2013; Dietze et al. 2014; Lehmann & Huth 2015; Fischer et al. 2019). For example, van Oijen et al. (2013) found a strong reduction of uncertainty in most forest models after a Bayesian calibration.
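To make the idea of Bayesian calibration concrete, the following is a minimal sketch, not the procedure of any cited study. It assumes a hypothetical one-parameter stand growth curve B(t) = b_max (1 - exp(-r t)), a uniform prior on the growth rate r, Gaussian observation error with an assumed standard deviation, and synthetic "observations"; all of these are illustrative choices. A random-walk Metropolis sampler draws from the posterior of r, and comparing the posterior spread with the prior spread shows, in miniature, the kind of uncertainty reduction that calibration can deliver.

```python
import math
import random
import statistics


def toy_growth(r, times, b_max=300.0):
    """Hypothetical stand biomass curve B(t) = b_max * (1 - exp(-r * t))."""
    return [b_max * (1.0 - math.exp(-r * t)) for t in times]


def log_posterior(r, times, observed, sigma=5.0, prior=(0.001, 0.2)):
    """Unnormalized log posterior: uniform prior times Gaussian likelihood."""
    # Uniform prior on r (assumed bounds): zero density outside them.
    if not (prior[0] <= r <= prior[1]):
        return -math.inf
    # Gaussian observation error with an assumed standard deviation sigma.
    predicted = toy_growth(r, times)
    return -0.5 * sum(((o - p) / sigma) ** 2 for o, p in zip(observed, predicted))


def metropolis(times, observed, n_iter=5000, burn_in=1000, step=0.002, r0=0.02, seed=1):
    """Random-walk Metropolis sampler for the single parameter r."""
    rng = random.Random(seed)
    r, lp = r0, log_posterior(r0, times, observed)
    samples = []
    for i in range(n_iter):
        proposal = r + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal, times, observed)
        # Accept with probability min(1, exp(lp_new - lp)); proposals outside
        # the prior have lp_new = -inf and are always rejected.
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            r, lp = proposal, lp_new
        if i >= burn_in:
            samples.append(r)
    return samples


if __name__ == "__main__":
    times = list(range(10, 101, 10))  # years since stand establishment
    # Synthetic observations: true r = 0.05 plus fixed, made-up perturbations.
    noise = [3.0, -2.0, 4.0, -1.0, 0.0, 2.0, -3.0, 1.0, -2.0, 2.0]
    observed = [b + e for b, e in zip(toy_growth(0.05, times), noise)]
    samples = metropolis(times, observed)
    prior_sd = (0.2 - 0.001) / math.sqrt(12)  # sd of the uniform prior
    print("posterior mean of r:", round(statistics.mean(samples), 4))
    print("prior sd:", round(prior_sd, 4), "posterior sd:", round(statistics.stdev(samples), 4))
```

In real applications the same loop is run over many parameters at once and the likelihood combines several data streams, which is why dedicated tools and more efficient samplers are typically used; the single-parameter version is kept here only for readability.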
Table 1

Technical challenges

Several technical obstacles constrain model development and runtime. First, computing power, in terms of both speed and memory, imposes a trade-off between simulation resolution and coverage, which still limits large-scale applications and the fitting of fine-grained models. For example, the finer-grained representation of forest biodiversity and structure recently implemented in a DGVM (LPJmL-FIT, Sakschewski et al. 2015) was restricted to one biome (tropical South America), as opposed to the global scale typically reached by classic DGVM simulations. However, computing power will probably continue to increase in the coming years (Kurzweil 2005); together with parallel processing, model upscaling and improved algorithms, this should keep reducing computing time (von Bloh et al. 2010; Snell 2014). As an illustration, using Fast Fourier Transforms for seed dispersal, instead of computing dispersal from each cell to every other cell, increased computing speed by a factor of 100 (Lehsten et al. 2019). Fundamentally changing an algorithm in a complex model can, however, have unintended side effects, sometimes forcing modellers to invest substantial time and effort to stabilize the new model versions. Additionally, remote-sensing observations allow individual-based forest models to be upscaled at lower cost (Shugart et al. 2015). For example, by combining remote-sensing-derived forest height across a gridded map of the Amazon basin with a locally optimized gap model, it was possible to estimate the successional stage of every cell in this area and to derive maps of aboveground biomass and productivity for the whole basin (Rödig et al. 2017, 2018).
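Why the FFT trick pays off can be illustrated with a minimal sketch. Lehsten et al. (2019) worked on a two-dimensional landscape grid; the example below simplifies to a one-dimensional transect with periodic boundaries (a ring of cells) and a hypothetical negative-exponential dispersal kernel, neither of which is taken from that study. The direct scheme loops over every source and target cell (on the order of n^2 operations), whereas the FFT route computes the same circular convolution in on the order of n log n operations, using a textbook radix-2 Cooley-Tukey transform written out for self-containment.

```python
import cmath
import math


def fft(values, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT. The inverse (invert=True) omits
    the 1/n scaling, which the caller applies."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1.0 if invert else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def exponential_kernel(n, mean_distance):
    """Hypothetical negative-exponential kernel on a periodic transect of n
    cells, normalized so that all dispersed seeds land on the transect."""
    raw = [math.exp(-min(d, n - d) / mean_distance) for d in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def dispersal_direct(seeds, kernel):
    """Cell-to-cell scheme: each source sends seeds to every target cell."""
    n = len(seeds)
    return [sum(seeds[j] * kernel[(i - j) % n] for j in range(n)) for i in range(n)]


def dispersal_fft(seeds, kernel):
    """Same circular convolution via the convolution theorem."""
    n = len(seeds)
    if n < 1 or n & (n - 1):
        raise ValueError("this sketch needs a power-of-two number of cells")
    spectrum = [a * b for a, b in zip(fft([complex(s) for s in seeds]),
                                      fft([complex(k) for k in kernel]))]
    # Inverse transform, then divide by n because fft() omits that factor.
    return [v.real / n for v in fft(spectrum, invert=True)]


if __name__ == "__main__":
    seeds = [0.0] * 64
    seeds[10], seeds[40] = 100.0, 50.0  # two hypothetical seed sources
    kernel = exponential_kernel(64, mean_distance=3.0)
    direct = dispersal_direct(seeds, kernel)
    fast = dispersal_fft(seeds, kernel)
    print("max difference:", max(abs(a - b) for a, b in zip(direct, fast)))
    print("seeds conserved:", round(sum(fast), 9))
```

Both functions return the same seed rain up to floating-point round-off; only the cost differs, and the gap widens with the number of cells. On a two-dimensional grid the same idea applies with a 2D transform (e.g. `numpy.fft.fft2`), with zero-padding instead of the periodic boundary if seeds should leave the landscape.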
Second, expanding model development and applications relies on sharing code and data within and among larger communities of model developers and users, which brings its own technical challenges. Several modelling teams make their model code freely available, at least in part. Additionally, version control systems allow developers to track changes and collaborate on model code efficiently (e.g. Git, Ram 2013; e.g. Collalti et al. 2016). Besides code, simulation data are increasingly shared under open-access requirements, enabling subsequent analyses or model comparisons (Box 1). In many modelling studies, preparing data (e.g. for input/initialization, calibration or validation) and analysing model outputs are very labour- and time-intensive. Sharing scripts for analysing forest simulations, e.g. through dedicated platforms (e.g. LeBauer et al. 2013) or R packages (R Core Team 2018; e.g. Duursma et al. 2012), is therefore of great help. Furthermore, visualisation tools that render simulation results as virtual forest scenes (e.g. Dufour-Kowalski et al. 2012; Fig. 1) are a valuable way to communicate model structure, functioning and outputs, to inspire new model developments and applications, and to detect model errors.
Figure 1
Box 1

Process understanding

Another challenge in forest modelling arises from imperfect knowledge of the processes that shape forest dynamics, e.g. regeneration (Vacchiano et al. 2018), mortality (Hartmann et al. 2018b), carbon allocation (De Kauwe et al. 2014; Hartmann et al. 2018a), photosynthesis, autotrophic respiration and leaf conductance (Rogers et al. 2017; Collalti & Prentice 2019). Because there is no consensus on the mechanisms underlying these basic processes, their representations differ substantially across models (e.g. response to increased temperature, Galbraith et al. 2010; response to water stress, Powell et al. 2013, Restrepo-Coupe et al. 2017; tree mortality, Johnson et al. 2016, Hülsmann et al. 2018). As an illustration, one of the first fully coupled simulations of a General Circulation Model and a DGVM (Box 2) predicted a critical transition of the Amazonian rainforest towards a much drier savannah-type ecosystem under continuing deforestation and increasing atmospheric CO2 concentration (Cox et al. 2004). An updated model version projected much smaller changes in Amazonian forest extent over the 21st century (Good et al. 2013). These differences partly reflect an improved understanding of respiration acclimation to high temperatures (Smith & Dukes 2013; Huntingford et al. 2017). Similarly, a better representation of nitrogen limitation in a DGVM reduced the simulated CO2 fertilization effect, in agreement with observations (Smith et al. 2014). The omission of one or a few critical processes from a model can therefore lead to diverging projections.
Knowledge gaps often result from the limited availability of suitable data, which are costly and/or time-consuming to collect. As trees are typically long-lived, experiments and field monitoring would need to extend over multiple decades to capture long-term trends; such temporal coverage remains out of reach for most empirical studies and hampers their replication (Schnitzer & Carson 2016). While fundamentally relying on the basic knowledge developed through empirical studies, models themselves are key tools for investigating unresolved questions through the generation of virtual data. For example, using a gap model, Bohn & Huth (2017) created a database of 500,000 virtual forest plots varying in composition and structure, which allowed them to explore the drivers of the temperature sensitivity of productivity in temperate forests.
Additionally, models can be used to test hypotheses about processes (Maris et al. 2018) by applying a range of scenarios or by comparing alternative process formulations, e.g. between model versions or across models (e.g. Fisher et al. 2006; Sakschewski et al. 2016; Langan et al. 2017; Lovenduski & Bonan 2017; Collalti et al. 2019a; Box 1). For example, using 15 models, including DGVMs and forest gap models, Bugmann et al. (2019) explored how different simulated mortality processes influence forest dynamics, providing insights into the effects of process uncertainty. Similarly, but within a single model, Collalti et al. (2019b) tested two ecological theories of plant respiration. Models can thus help pinpoint data and knowledge gaps and thereby guide further empirical research (Rykiel Jr. 1996; Van Nes & Scheffer 2005; Medlyn et al. 2016; Norby et al. 2016).

Converging trajectories of model developments

As illustrated above, the different forest modelling approaches were initially motivated by different specific objectives, leading to different choices and compromises in how actual vegetation is represented. DGVMs originally focused on biogeochemical processes, such as the exchange of carbon and water between vegetation and atmosphere at the global scale, at the cost of a realistic representation of forest diversity, competition and structure. Conversely, SDMs adopted a species-level representation of vegetation diversity but long relied on a purely correlative approach, bypassing the mechanistic processes underlying species distributions. Finally, IBMs, by simulating many individual trees, typically represented vegetation structure at a finer grain than DGVMs and focused on competition among species, but often at the cost of an aggregated representation of some processes, such as leaf gas exchange or water flow.
The multiple scientific and technical advances described above have helped overcome the constraints that modellers initially faced, and each of these model types has gained in efficiency and capability. Next-generation DGVMs strive to explicitly represent tree demography, diversity within PFTs and forest structure; IBMs refine their representation of biogeochemical cycles; and SDMs endeavour to include process-based information. In doing so, their development trajectories have progressively converged. Consequently, each model type has broadened its field of application beyond its initial scope, encouraging synergies among models, including their coupling (Box 2), to address key ecological research questions in a mutually informative way.
Box 2