Material and methods

We used ground-based fuel hazard score observations and gridded information on 15 environmental predictors related to climate (6), topography (5), soil (3) and potential vegetation cover (1). We used the ground-based fuel data to train and evaluate random forest models. We then made projections of future fuel hazard using climate model projections. The details of the data and models are outlined below.

Fuel hazard observations

The study area was the state of Victoria in Southeast Australia, where extensive fine resolution data were available to test our novel machine-learning approach (Figure 1). The Victorian Department of Environment, Land, Water and Planning (DELWP) provided field observations of fuel hazard ratings over the period 1995 to 2017 with a total of 47,245 individual records. The data contain categorical fuel hazard scores for surface, near-surface, elevated and bark fuel strata, with five levels: low (1), moderate (2), high (3), very high (4) and extreme (5) (Hines et al. 2010). The records were mostly one-off assessments for georeferenced plots of ca. 20m x 20 m.
We focused the analysis on the elevated stratum because of its high consistency among different surveys (Watson et al., 2012) and high impact on fire regimes (Hines et al. 2010). The elevated fuel hazard score is based on the cover and horizontal connectivity of dead and live plants that is close for biomass that may not be consumed by a flame height of 0.5 m. The near-surface fuel hazard score is based on cover and horizontal connectivity of dead and live plants that is close (<0.5m) but not lying on the ground. The surface fuel hazard score is based on litter depth, cover and horizontal connectivity lying on the ground. Bark fuel hazard is specific to certain tree species and alone does not contribute to the fire regime at landscape scale. Therefore, we do not specifically model bark fuel stratum in this study.
Since we are interested in predicting potential fuel hazard score, we used the observations made at least 10 years after the last fire; this length of time allows the fuel to accumulate beyond 95% of capacity (Peet et al., 1971; Fox et al., 1979; Burrows et al, 1994; Zazali et al., 2021). This filtering resulted in a total of 27,799 observations covering Victoria. Figure 1 shows the spatial distribution of the field observations, with the temporal distribution shown in Figure S1.

Climate predictors

We chose a set of climate, soil and topographic predictors that have been shown to drive fuel hazard in previous studies (e.g., Jenkins et al., 2020; McColl‐Gausden et al., 2020). We used gridded daily climate data at 0.05° x 0.05° resolution from 1994-2018 obtained from the SILO project (Jeffrey et al., 2001; accessed at http://www.longpaddock.qld.gov.au/silo/). We extracted daily precipitation (PPT), maximum air temperature (Tmax) and minimum relative humidity (RH) for the grid cells corresponding to the locations of the field observations. We aggregated the climate data to monthly time steps by taking the sum of precipitation and means for other variables. The PPT, Tmax, minimum RH of the month before the fuel hazard observations were used as predictors to represent the impacts of short-term climate conditions. We calculated the mean annual precipitation (MAP) and Tmax for 1994-2018 to represent long-term climate conditions. Finally, a rainfall seasonality index presented by Feng et al. (2013) was used to capture long-term variation in the temporal distribution of precipitation over the twelve months of the year; the index varies from 0 (even distribution over all months) to 2.48 (all rainfall concentrated in a single month). The details of meteorological data are in Table S1. In addition to the climatic predictors used in the model, we obtained potential evapotranspiration (PET) from SILO and averaged over 1994-2018. We then calculated an Aridity index (AI) as PET/MAP. AI is not used as a predictor in the models but rather as an indicator of long-term water balance among the pixels in the following analysis.

Soil and terrain attributes

Soil attributes, in particular proxies for water holding capacity and soil fertility, are expected to constrain spatial variation of fuel loads and fuel properties via their effect on vegetation composition, density, and structure (McColl‐Gausden et al, 2020). We obtained gridded bulk density, clay content for topsoil (0-5 cm) and Available Volumetric Water Capacity (AWC; 0-200 cm) at 90 m resolution from the Whole Earth product of the Soil and Landscape Grid of Australia (Malone, 2022).
We used fine scale topographic products containing wetness index (Gallant and Austin, 2012a), adjusted monthly solar radiation in January and July (Gallant et al., 2014), and plan/profile curvature (Gallant and Austin, 2012 b,c) at 3 arcsecond resolution (~90 m). The topographic wetness index, calculated as specific catchment area divided by slope, is commonly used as an indicator of soil water availability (Gallant and Austin, 2012a). Mean monthly shortwave radiation is the mean shortwave radiation (MJ m-2d-1) received by a surface accounting for latitude, day of year, average atmospheric conditions, and terrain effects (i.e., slope, aspect and topographic shading). We chose the shortwave radiation in January and July to account for different energy inputs for summer and winter. Plan and profile curvature, derived from the Smoothed Digital Elevation Model (Geoscience Australia, 2015), added further constraints on soil moisture availability and other variation in other soil attributes (e.g., soil depth). The soil and topographic data used are summarised in Table S1.

Future climate change projections

The climate change projections are based on the downscaled output of nine General Circulation Models (Clark et al., 2021): ACCESS1-0, BNU-ESM, CSIRO-Mk3-6-0, GFDL-CM3, GFDL-ESM2G, GFDL-ESM2M, INM-CM4, IPSL-CM5A-LR, MRI-CGCM3. The chosen projections were from a full list of 12 models in Clark et al. (2021), but we excluded three models because they do not cover both Representative Concentration Pathways (RCP) 4.5 and 8.5. We used projections of PPT, Tmax and minimum RH for the intermediate RCP 4.5 and high emission RCP 8.5 for the period 2000-2100. We obtained projected climate data for time periods of 16 years at the beginning (2000-2015), middle (2045-2060), and end (2085-2100) of the 21st century. We calculated the mean monthly climate for those three time periods for each of the nine climate models. The data were then aggregated in the same way as the historical climate data while the spatial resolution was kept at 0.036 degree (~3.6 km). The MAP and mean Tmax were the mean of each 16-year period. The mean of current and future climate among the projections are shown in Figure S2. The future Ca used in the study is shown in Table S2.

Optimal leaf area index

Vegetation structural response to climate change was based on gridded LAI layers simulated by an optimisation model (Yang et al., 2018). Briefly, the model uses long-term mean precipitation, maximum air temperature, vapour pressure deficit (calculated based on Tmax and RH) and photosynthetically active radiation (converted from monthly solar radiation) to predict LAI based on the concept of ecohydrological equilibrium (Woodward,1987). The long-term mean PPT is used to estimate the amount of water available to support evapotranspiration. The optimal LAI is then calculated as the LAI that maximises canopy carbon export (gross photosynthesis less leaf construction and respiration costs) subject to this constraint on evapotranspiration.
There are four advantages of the optimal LAI model (Yang et al., 2018) which makes it suitable to incorporate into machine learning: (i) it has detailed photosynthetic and stomatal processes to capture plant responses to climate change and rising Ca; (ii) the optimisation process accounts for potential changes in plant strategies under future conditions; (iii) it showed a good agreement to satellite-derived and ground-based observations over Australia during 2000-2011; (iv) it is parsimonious with minimum computational requirements and high interpretability. The optimal LAI model therefore allows us to capture essential aspects of the ‘fertilisation effect’ on future fuel hazard. Current and future LAI are shown in Figure S2.

Random forest models

We constructed three random forest models: one for each fuel stratum (elevated, near-surface and surface). Each model predicted a fuel hazard score of a stratum using 15 predictors including climate, soil, topography, and LAI as shown in Table S1. The field-based fuel hazard score data set was split into a training subset (random sample of 70% of observations) and an evaluation data subset (remaining 30% of observations). Due to the importance of elevated fuels for variation in fire intensity and rate of spread, we focused our evaluation and prediction on this stratum (Cheney et al., 2012). All of the data process and analyses were conducted in R (4.1, R Core Team), with the ‘randomforest’ function from the ‘randomforest’ package used for model fitting.

Model performance and importance of inputs

We used observed accuracy and Fleiss’s Kappa coefficient (Fleiss, 1971) to assess model performance. Briefly, observed accuracy captures the agreement between model predictions and observations. However, due to the imbalance in each score, accuracy could be driven by a single score that has high frequency in the data. The Kappa coefficient addresses this issue by balancing the frequency of score in the data (expected accuracy) with the observed accuracy. A Kappa coefficient over 0.4 is generally considered as good (McHugh, 2012). We also used ‘no-information rate’ (i.e., accuracy achievable by random sampling) as a baseline of model performance. The more model accuracy exceeds the ‘no information accuracy rate’ the better the model performance.
To assess the importance of each predictor, we used the Gini index (also known as “Gini importance” or “mean decrease impurity”) defined as the total decrease in node impurity (the proportion of a sample in each node) averaged over all trees of the ensemble (Breiman, 2001). Since the Gini index is in favour of variables with high categorical frequency, we also reported the percent change of mean square error of each variable.

Future projections

We explored the applicability of the field data to the whole region (Note S1) and found that the sample sites covered most environmental variation of the region (Figure S3). We thus predicted future fuel hazard scores for each of the nine models and two RCPs, a total of 18 climate projections. The soil and topography layers were resampled to the resolution of the climate model outputs (3.6 km) to make the projection at state scale. A LAI layer was predicted for each climate projection and period. Note that we also predicted fuel hazard scores for 2000-2015 with climate projections to avoid potential discrepancies between SILO and climate models. The projected fuel hazard scores varied by month because of monthly climate predictors. Since past intense fires in this region occurred mostly around January, we only report the fuel hazard score for January. The future fuel projections targeted potential fuel with the assumption of the land covered by natural vegetation. The projections contained the probability of each fuel hazard score on an ordinal categorical scale from 1 (low) to 5 (high). We predicted the probability of a hazard score of 4 or 5 (P4_5) for the elevated fuel stratum rather than the change in categories, since high scores in this stratum have shown to strongly affect fire behaviour (Hines et al., 2010).

Predictor impact assessment

Although the importance of each predictor was reported by the Gini index and increase of error, these metrics do not directly translate into the magnitude of changes in the probability of a particular future fuel hazard score. We therefore explored the impacts of two individual predictors by making projections with the target predictor from 2000-2015 instead of the projected predictor while holding all other predictors as projected (‘manipulated run’). For clarity, the projections with the full setup are referred to as ‘projections’. Comparing the ‘projections’ with ‘manipulated run’ allowed us to separate the contributions of each predictor.
The two individual predictors that were assessed were the rainfall seasonality and the optimal LAI. We assessed rainfall seasonality because it showed high variability in the future and ranked high in the importance list. The difference between ‘projection’ and ‘manipulated run’ is referred to as the ‘rainfall seasonality effect’.
The optimal LAI was investigated because it represents the ‘fertilisation effect’ of rising Ca, a key innovation of this study. The difference between ‘projections’ and ‘manipulated run’ is referred to as the ‘fertilisation effect’, hereafter. The fractional contribution of the ‘fertilisation effect’ was calculated as the ‘fertilisation effect’ divided by the ‘projection’.

High resolution projections

To explore model performance at fine spatial resolution, we focused on a topographically heterogeneous region of about 20 km2near Dargo, Victoria (-37.33588, -37.38554,147.27566,147.32566). This mountainous region features strong heterogeneity in environmental conditions as indicated by the fine-scale patterns of the selected terrain attributes (Figure S4). Models relying on climate only would predict no variation in fuel hazard scores due to the resolution of climate data (3.6-5.0 km). Here, we used topographic and soil data at 90 m resolution to demonstrate the capability of the model to predict fuel hazard scores at this fine resolution. We used climate data from the ACCESS 1-0 climate projection only, which was deemed sufficient for this analysis. because the goal is to highlight the capability of the model rather than predicting future change.