Jatan Buch et al.

The annual area burned by wildfires in the western United States (WUS) increased by more than 300% between 1984 and 2020. However, accounting for the nonlinear, spatially heterogeneous interactions among the climate, vegetation, and human predictors that drive trends in fire frequency and size at different spatial scales remains a challenging problem for statistical fire models. Here we introduce a novel stochastic machine learning (ML) framework to model observed fire frequencies and sizes in 12 km × 12 km grid cells across the WUS. This framework is implemented using Mixture Density Networks trained on a wide suite of input predictors. The modeled WUS fire frequency corresponds well with observations at both monthly (r = 0.94) and annual (r = 0.85) timescales, as does the modeled area burned at monthly (r = 0.90) and annual (r = 0.88) timescales. Moreover, the annual time series of both fire variables exhibit strong correlations (r ≥ 0.6) in 16 out of 18 ecoregions. Our ML model captures the interannual variability and the distinct multidecadal increases in annual area burned for both forested and non-forested ecoregions. Evaluating predictor importance with Shapley additive explanations, we find that fire-month vapor pressure deficit (VPD) is the dominant driver of fire frequencies and sizes across the WUS, followed by 1000-hour dead fuel moisture (FM1000), total monthly precipitation (Prec), mean daily maximum temperature (Tmax), and the fraction of grassland cover in a grid cell. Our findings serve as a promising use case of ML techniques for wildfire prediction in particular and extreme event modeling more broadly.
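
To make the approach concrete, below is a minimal sketch of a Mixture Density Network of the kind named in the abstract, written in PyTorch. The layer widths, the choice of three Gaussian components, and the ten-predictor input are illustrative assumptions, not the authors' configuration; the paper's actual component distributions and predictor suite may differ.

```python
import torch
import torch.nn as nn

class MDN(nn.Module):
    """Gaussian Mixture Density Network: maps predictors to mixture parameters."""
    def __init__(self, n_predictors: int, n_hidden: int = 64, n_components: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(n_predictors, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
        )
        self.pi = nn.Linear(n_hidden, n_components)         # mixture weights (logits)
        self.mu = nn.Linear(n_hidden, n_components)         # component means
        self.log_sigma = nn.Linear(n_hidden, n_components)  # component log-std devs

    def forward(self, x):
        h = self.backbone(x)
        return self.pi(h), self.mu(h), self.log_sigma(h)

def mdn_nll(pi_logits, mu, log_sigma, y):
    """Negative log-likelihood of y under the predicted Gaussian mixture."""
    log_pi = torch.log_softmax(pi_logits, dim=-1)
    comp = torch.distributions.Normal(mu, log_sigma.exp())
    log_prob = comp.log_prob(y.unsqueeze(-1))               # per-component log-density
    return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()

# Hypothetical usage: x holds grid-cell predictors (e.g., VPD, FM1000, Prec, Tmax);
# y could be log burned area. Sampling from the fitted mixture yields the stochastic
# draws that make the framework probabilistic rather than deterministic.
x, y = torch.randn(32, 10), torch.randn(32)
loss = mdn_nll(*MDN(n_predictors=10)(x), y)
loss.backward()
```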

Sara Shamekh et al.

Accurately representing vertical turbulent fluxes in the planetary boundary layer is vital for moisture and energy transport. Nonetheless, the parameterization of the boundary layer remains a major source of inaccuracy in climate models. Recently, machine learning techniques have gained popularity for representing oceanic and atmospheric processes, yet their high dimensionality limits interpretability. This study introduces a new neural network architecture employing nonlinear dimensionality reduction to predict vertical turbulent fluxes in a dry convective boundary layer. Our method utilizes turbulent kinetic energy and scalar profiles as input to extract a physically constrained two-dimensional latent space, providing the necessary yet minimal information for accurate flux prediction. We obtained data by coarse-graining large-eddy simulations covering a broad spectrum of boundary layer conditions, from weakly to strongly unstable. These regimes are employed to constrain the disentanglement of the latent space, enhancing interpretability. By applying this constraint, we decompose the vertical turbulent flux of various scalars into two main modes of variability: wind shear and convective transport. Our data-driven parameterization accurately predicts vertical turbulent fluxes (heat and passive scalars) across turbulent regimes, surpassing state-of-the-art schemes such as the eddy-diffusivity mass-flux scheme. By projecting each variability mode onto its associated scalar gradient, we estimate the diffusive flux and learn the eddy diffusivity. The diffusive flux is found to be significant only in the surface layer for both modes and becomes negligible in the mixed layer. The retrieved eddy diffusivity is considerably smaller than previous estimates used in conventional parameterizations, highlighting the predominantly non-diffusive nature of the transport.
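
As a rough illustration of the architecture described above, the sketch below pairs an encoder that compresses turbulent kinetic energy and scalar profiles into a two-dimensional latent space with a decoder that predicts the flux profile. The number of vertical levels, the layer widths, and the omission of the regime-based disentanglement constraint are all simplifying assumptions.

```python
import torch
import torch.nn as nn

n_levels = 64  # assumed number of vertical grid levels

# Encoder: TKE profile + one scalar profile -> 2-D latent space.
encoder = nn.Sequential(
    nn.Linear(2 * n_levels, 128), nn.ReLU(),
    nn.Linear(128, 2),
)
# Decoder: latent coordinates -> vertical turbulent flux profile.
decoder = nn.Sequential(
    nn.Linear(2, 128), nn.ReLU(),
    nn.Linear(128, n_levels),
)

profiles = torch.randn(16, 2 * n_levels)  # coarse-grained LES inputs (synthetic)
latent = encoder(profiles)  # with the regime constraint applied, the two latent
                            # coordinates would separate shear-driven and
                            # convective transport
flux = decoder(latent)      # predicted flux at each vertical level
```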

Arthur Grundner et al.

A promising method for improving the representation of clouds in climate models, and hence climate projections, is to develop machine learning-based parameterizations using output from global storm-resolving models. While neural networks can achieve state-of-the-art performance, they are typically climate model-specific, require post-hoc tools for interpretation, and struggle to predict outside of their training distribution. To avoid these limitations, we combine symbolic regression, sequential feature selection, and physical constraints in a hierarchical modeling framework. This framework allows us to discover new equations diagnosing cloud cover from coarse-grained variables of global storm-resolving model simulations. These analytical equations are interpretable by construction and easily transferable to other grids or climate models. Our best equation balances performance and complexity, achieving a performance comparable to that of neural networks ($R^2=0.94$) while remaining simple (with only 13 trainable parameters). It reproduces cloud cover distributions more accurately than the Xu-Randall scheme across all cloud regimes (Hellinger distances $<0.09$), and matches neural networks in condensate-rich regimes. When applied to, and fine-tuned on, the ERA5 reanalysis, the equation exhibits superior transferability to new data compared to all other optimal cloud cover schemes. Our findings demonstrate the effectiveness of symbolic regression in discovering interpretable, physically consistent, and nonlinear equations to parameterize cloud cover.
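
As an illustration of the symbolic-regression step, the sketch below uses the PySR library to search for an analytic cloud cover equation on synthetic data. PySR is one possible tool choice, and the features, operators, and target here are invented for the example; none of this reflects the authors' actual setup. (Running it requires PySR's Julia backend.)

```python
import numpy as np
from pysr import PySRRegressor

rng = np.random.default_rng(0)
X = rng.random((500, 3))                    # stand-ins for coarse-grained variables
y = np.clip(2.0 * X[:, 0] - 0.5, 0.0, 1.0)  # synthetic cloud cover target in [0, 1]

model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["exp"],
    model_selection="best",                 # trade accuracy against complexity
)
model.fit(X, y)
print(model.sympy())                        # best discovered analytic equation
```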

Reda ElGhawi et al.

The process of evapotranspiration transfers water vapour from vegetation and soil surfaces to the atmosphere as the latent heat flux ($Q_{LE}$), and thus crucially modulates Earth's energy, water, and carbon cycles. Vegetation controls $Q_{LE}$ by regulating the leaf stomata (i.e., the surface resistance $r_s$) and by altering surface roughness (the aerodynamic resistance $r_a$). Estimating $r_s$ and $r_a$ across different vegetation types is a key challenge in predicting $Q_{LE}$. Here, we propose a hybrid modeling approach (i.e., combining mechanistic modeling and machine learning) for $Q_{LE}$ in which neural networks independently learn the resistances from observations as intermediate variables. In our hybrid modeling setup, we make use of the Penman-Monteith equation based on Big Leaf theory in conjunction with multi-year flux measurements across different forest and grassland sites from the FLUXNET database. We follow two conceptually different strategies to constrain the hybrid model and control for the equifinality that arises when estimating the two resistances simultaneously. One strategy imposes an a priori constraint on $r_a$ based on our mechanistic understanding (theory-driven strategy), while the other makes use of more observational data and constrains the prediction of $r_a$ through multi-task learning of the latent as well as the sensible heat flux ($Q_H$; data-driven strategy). Our results show that all hybrid models exhibit fairly high predictive skill for the target variables, with $R^2 = 0.82$-$0.89$ for grassland sites and $R^2 = 0.70$-$0.80$ for forest sites at the mean diurnal scale. The predictions of $r_s$ and $r_a$ are physically consistent across the two regularized hybrid models, but physically implausible in the under-constrained hybrid model. The hybrid models are robust in reproducing consistent results for energy fluxes and resistances across different scales (diurnal, seasonal, interannual), reflecting their ability to learn the physical dependence of the target variables on the meteorological inputs. As a next step, we propose testing these heavily observation-informed parameterizations derived through hybrid modeling as a substitute for overly simple ad hoc formulations in Earth system models.
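
The hedged sketch below illustrates the hybrid idea: two small neural networks output the resistances $r_s$ and $r_a$, which are inserted into the Penman-Monteith equation to compute $Q_{LE}$. The network sizes, the eight-feature meteorological input, and the constant values for the other Penman-Monteith terms are placeholders, not the authors' setup.

```python
import torch
import torch.nn as nn

def penman_monteith(delta, a_energy, rho_cp, vpd, gamma, r_s, r_a):
    """Big-Leaf Penman-Monteith latent heat flux; a_energy = Rn - G (W m^-2)."""
    return (delta * a_energy + rho_cp * vpd / r_a) / (delta + gamma * (1.0 + r_s / r_a))

class ResistanceNet(nn.Module):
    """Predicts one strictly positive resistance from meteorological drivers."""
    def __init__(self, n_features: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return nn.functional.softplus(self.net(x))  # enforce r > 0

# One network per resistance; per the abstract, equifinality is controlled either by
# an a priori constraint on r_a (theory-driven) or by multi-task learning of Q_H
# alongside Q_LE (data-driven). Both constraints are omitted here for brevity.
rs_net, ra_net = ResistanceNet(8), ResistanceNet(8)
met = torch.randn(16, 8)                   # placeholder meteorological inputs
q_le = penman_monteith(
    delta=torch.full((16, 1), 0.145),      # slope of saturation vapour curve (kPa/K)
    a_energy=torch.full((16, 1), 400.0),   # available energy Rn - G (W m^-2)
    rho_cp=torch.full((16, 1), 1230.0),    # air density x specific heat (J m^-3 K^-1)
    vpd=torch.full((16, 1), 1.2),          # vapour pressure deficit (kPa)
    gamma=torch.full((16, 1), 0.066),      # psychrometric constant (kPa/K)
    r_s=rs_net(met), r_a=ra_net(met),      # learned resistances (s m^-1)
)
```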

Sara Shamekh et al.

Accurate prediction of precipitation intensity is of crucial importance for both human and natural systems, especially in a warming climate more prone to extreme precipitation. Yet climate models fail to accurately predict precipitation intensity, particularly extremes. One piece of information missing from traditional climate model parameterizations is sub-grid scale cloud structure and organization, which affects precipitation intensity and stochasticity at the grid scale. Here we show, using storm-resolving climate simulations and machine learning, that by implicitly learning sub-grid organization we can accurately predict precipitation variability and stochasticity with a low-dimensional set of variables. Using a neural network to parameterize coarse-grained precipitation, we find that mean precipitation is predictable from large-scale quantities alone; however, the neural network cannot predict the variability of precipitation ($R^2 \sim 0.4$) and underestimates precipitation extremes. Performance improves significantly when the network is informed by our novel organization metric, correctly predicting precipitation extremes and spatial variability ($R^2 \sim 0.95$). The organization metric is implicitly learned by training the algorithm on high-resolution precipitable water, encoding the degree of organization and the amount of humidity at the sub-grid scale. The organization metric shows large hysteresis, emphasizing the role of memory created by sub-grid scale structures. We demonstrate that this organization metric can be predicted as a simple memory process from information available at previous time steps. These findings stress the role of organization and memory in accurate prediction of precipitation intensity and extremes, and the necessity of parameterizing sub-grid scale convective organization in climate models to better project future changes in the water cycle and extremes.
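
A minimal sketch of the idea, assuming invented dimensions throughout: an encoder compresses the high-resolution precipitable-water field within a coarse grid cell into a low-dimensional organization metric, which is concatenated with large-scale predictors before the precipitation network.

```python
import torch
import torch.nn as nn

n_highres = 256    # high-res precipitable-water pixels per coarse cell (assumed)
n_largescale = 12  # number of coarse-grained predictors (assumed)
org_dim = 4        # dimensionality of the learned organization metric (assumed)

org_encoder = nn.Sequential(nn.Linear(n_highres, 64), nn.ReLU(), nn.Linear(64, org_dim))
precip_net = nn.Sequential(nn.Linear(n_largescale + org_dim, 64), nn.ReLU(), nn.Linear(64, 1))

pw_highres = torch.randn(8, n_highres)    # sub-grid precipitable water (synthetic)
largescale = torch.randn(8, n_largescale)
org = org_encoder(pw_highres)             # implicitly learned organization metric;
                                          # the abstract notes it behaves as a memory
                                          # process predictable from earlier time steps
precip = precip_net(torch.cat([largescale, org], dim=-1))
```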

Xuan Xi et al.

Arthur Grundner et al.

A promising approach to improving cloud parameterizations within climate models, and thus climate projections, is to use deep learning in combination with training data from storm-resolving model (SRM) simulations. The Icosahedral Non-Hydrostatic (ICON) modeling framework permits simulations ranging from numerical weather prediction to climate projections, making it an ideal target for developing neural network (NN)-based parameterizations of sub-grid scale processes. Within the ICON framework, we train NN-based cloud cover parameterizations with coarse-grained data from realistic regional and global ICON SRM simulations. We set up three different types of NNs that differ in the degree of vertical locality they assume when diagnosing cloud cover from coarse-grained atmospheric state variables. The NNs accurately estimate sub-grid scale cloud cover from coarse-grained data with geographical characteristics similar to those of their training data. Additionally, globally trained NNs can reproduce the sub-grid scale cloud cover of the regional SRM simulation. Using the game-theory-based interpretability library SHapley Additive exPlanations, we identify an overemphasis on specific humidity and cloud ice as the reason why our column-based NN cannot perfectly generalize from the global to the regional coarse-grained SRM data. The interpretability tool also helps visualize similarities and differences in feature importance between regionally and globally trained column-based NNs, and reveals a local relationship between their cloud cover predictions and the thermodynamic environment. Our results show the potential of deep learning to derive accurate yet interpretable cloud cover parameterizations from global SRMs, and suggest that neighborhood-based models may be a good compromise between accuracy and generalizability.
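
As an illustration of the interpretability step, the sketch below applies the SHAP library's DeepExplainer to a stand-in column-based cloud cover network. The model, the 27-feature column layout, and the data are placeholders; only the use of SHapley Additive exPlanations follows the abstract.

```python
import shap
import torch
import torch.nn as nn

n_features = 27  # assumed: stacked atmospheric state variables for one column
model = nn.Sequential(nn.Linear(n_features, 64), nn.Tanh(), nn.Linear(64, 1))

background = torch.randn(100, n_features)  # reference samples, e.g. from training data
columns = torch.randn(10, n_features)      # coarse-grained columns to explain

explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(columns)  # per-feature attributions, usable to
# compare, e.g., specific-humidity vs. cloud-ice importance across regions
```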