Results

Annual data trends following Chiew (2006)

The relationship between annual rainfall and runoff for the catchments (Figure 3) generally plots below the 1:1 line. Ignoring storage variations, the divergence from the 1:1 line is related to the removal of water by evapotranspiration. Except for a few single points, all the data plot below the 1:1 line for the gridded rainfall data. For the station data this is not the case for a few of the catchments (supplementary data on Zenodo, https://doi.org/10.5281/zenodo.3757041), but the results generally show similar behaviour. One of the catchments where many of the data points plotted above the 1:1 line using the observed station rainfall data is a small catchment (SOUT). Two other catchments (NIVE and HELL) occur in areas with steep elevation gradients, and as a result the nearest rainfall station might not be representative. In contrast, the ANUCLIM gridded data appear more representative, as fewer of the data points plot above the 1:1 line.
Non-parametric elasticities (εp, red triangles in Figure 4), calculated from the streamflow and gridded rainfall data, were in the same range as those observed by Chiew (2006). All elasticities were less than 3, with most about 2 or above, and some stations indicating elasticities very close to 1 (COCH, HELL and SOUT).
In contrast, elasticities calculated from the simulated data based on the rainfall-runoff models were quite different (boxplots and large blue dots indicating mean values in Figure 4), and in fact almost constant. The variation between the 10 replications of the rainfall-runoff model calibration was very narrow, particularly for GR4J, so most of the boxplots look like single horizontal lines. Overall, the elasticities calculated from the rainfall-runoff modelling results were close to or smaller than 1. This suggests decreases in runoff that are smaller than the associated decreases in rainfall, considering changes in evapotranspiration.
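As an illustration of the non-parametric elasticity discussed above, the following minimal sketch assumes the median-based estimator described by Chiew (2006): the median over all years of the runoff anomaly divided by the rainfall anomaly, scaled by the ratio of mean rainfall to mean runoff. The function name and the annual totals are hypothetical and are not taken from the study data.

```python
from statistics import mean, median

def nonparametric_elasticity(rainfall, runoff):
    """Median of (dQ / dP) * (mean P / mean Q) over all years."""
    p_bar = mean(rainfall)
    q_bar = mean(runoff)
    ratios = [
        ((q - q_bar) / (p - p_bar)) * (p_bar / q_bar)
        for p, q in zip(rainfall, runoff)
        if p != p_bar  # skip years where the rainfall anomaly is exactly zero
    ]
    return median(ratios)

# Hypothetical annual totals in mm, for illustration only
annual_rain = [600.0, 800.0, 1000.0, 1200.0, 1400.0]
annual_flow = [50.0, 100.0, 150.0, 200.0, 250.0]
eps = nonparametric_elasticity(annual_rain, annual_flow)
print(round(eps, 3))
```

With these made-up numbers the elasticity is about 1.67, meaning a 10% change in rainfall would correspond to roughly a 17% change in runoff. An elasticity close to 1, as found for the simulated data, implies a roughly proportional response.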
The calibration of the rainfall-runoff models with the gridded rainfall data was generally good (NSE > 0.5) considering the 41-year data period, although the SimHyd performance was lower than the GR4J performance (Figure 5), and some catchments had low model performance (Chiew et al., 2009). In addition, the different replications of the calibration with SimHyd showed considerable variation (wide boxplots), while the replications in the GR4J calibrations were very consistent (resulting in the boxplots being plotted as single lines). Comparing the two rainfall sources, the calibration performance using the observed station rainfall data (supplementary data, document 6) was slightly worse for both models and the calculated elasticities were also slightly lower (supplementary data, document 6).
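For reference, the calibration performance measure can be sketched as follows, assuming NSE denotes the standard Nash-Sutcliffe efficiency: one minus the ratio of the squared model error to the variance of the observations around their mean. The flow values are hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    obs_mean = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - sse / sst

# Hypothetical weekly flows, for illustration only
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [1.5, 2.0, 2.5, 4.0, 5.5]
nse = nash_sutcliffe(obs, sim)
print(round(nse, 3))
```

A simulation that always returns the observed mean scores 0, which is why a threshold such as NSE > 0.5 is commonly read as acceptable performance.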

Mann-Kendall and LTPMK tests

The results of the standard Mann-Kendall tests on the de-seasonalised weekly data indicate that there is a consistent decreasing trend in streamflow (Table 3), with a significant (p-value < 0.05) trend for nine of the 13 sites, all of which are located in the south of the continent and Western Australia. At least five, and possibly seven, of the sites with significant trends had trends outside the bootstrap distribution (Figure 6).
There is a matching decreasing trend in most of the gridded rainfall records, with a significant (p-value < 0.05) trend for eight sites. However, only two of these are outside the bootstrap distribution (supplementary material document 2). The station rainfall data gave similar results to the gridded data. All 13 de-seasonalised weekly average maximum temperature records have a significant increasing Mann-Kendall trend, although only five of these, located in south-eastern Australia, fall outside the bootstrap distribution (supplementary material document 2).
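The two steps used above, a Mann-Kendall test followed by a comparison against a bootstrap distribution, can be sketched as below. This is a simplified illustration only: it uses the textbook Mann-Kendall statistic without a tie correction, and it assumes the bootstrap is a simple reshuffling of the time order, which may differ from the resampling scheme actually used for Figure 6. The series and function names are hypothetical.

```python
import math
import random

def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall test (no tie correction)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return s, z, p

def bootstrap_s(series, n_boot=500, seed=1):
    """S statistics of reshuffled copies of the series (assumed permutation scheme)."""
    rng = random.Random(seed)
    values = list(series)
    stats = []
    for _ in range(n_boot):
        rng.shuffle(values)  # destroys any trend but keeps the values
        stats.append(mann_kendall(values)[0])
    return stats

# Hypothetical steadily declining series, for illustration only
flow = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
s, z, p = mann_kendall(flow)
boot = bootstrap_s(flow)
outside = s < min(boot) or s > max(boot)
print(s, round(z, 2), p < 0.05, outside)
```

A trend is treated here as falling outside the bootstrap distribution when the observed statistic lies beyond the full range of the reshuffled values; a percentile-based cut-off would be an equally reasonable choice.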
In contrast, all the LTPMK results indicated highly significant Hurst coefficients and therefore, based on this analysis, none of the Mann-Kendall trends were significant for any of the three variables under the scaling hypothesis (Hamed, 2008). To check the overall test, we also ran the test on the monthly and annual summaries of the three variables. The results (supplementary material document 2A) show that the significance of the LTPMK trends increases with increasing temporal aggregation. For example, the LTPMK average maximum temperature trend was significant and increasing for six stations at the monthly time scale, and significant and increasing for eight of the 13 stations at the annual time scale. Similar results were found with the time integration for the rainfall and streamflow data, but in this case all significant trends were decreasing in time.
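To give some intuition for the Hurst coefficient, the sketch below estimates it with the aggregated-variance method. This is not the Hamed (2008) LTPMK procedure, which estimates the coefficient by maximum likelihood on the detrended, rank-transformed series; it is a simpler stand-in chosen only to show how persistence relates to temporal aggregation. For a series without long-term persistence the variance of block means falls as 1/m with block size m, giving H close to 0.5, while values closer to 1 indicate persistence. All names and the synthetic series are hypothetical.

```python
import math
import random
from statistics import mean, pvariance

def hurst_aggregated_variance(series, block_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate H from the slope of log(variance of block means) vs log(block size)."""
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        block_means = [mean(series[k * m:(k + 1) * m]) for k in range(n_blocks)]
        log_m.append(math.log(m))
        log_var.append(math.log(pvariance(block_means)))
    # Ordinary least-squares slope; theory gives slope = 2H - 2
    x_bar, y_bar = mean(log_m), mean(log_var)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(log_m, log_var))
             / sum((x - x_bar) ** 2 for x in log_m))
    return 1.0 + slope / 2.0

# Synthetic white noise (no persistence), for illustration only
rng = random.Random(42)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
h = hurst_aggregated_variance(white_noise)
print(round(h, 2))
```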

Generalised additive mixed modelling

The results of the generalised least squares (GLS) modelling (model 1 in Table 2) suggest only four of the studied catchments have significant linear trends in the streamflow over the last 41 years (Table 4), which is fewer than in the earlier Mann-Kendall results (Table 3 and Figure 6). The negative trends were very small in actual percentage value, in the order of 10⁻⁴ % of the streamflow (Table 4). Similarly, significant trends in rainfall were also all negative, and very small in actual percentage. For those stations that have significant trends in both rainfall and streamflow, the “amplification” (calculated as the trend in Q divided by the trend in P) ranges between 0.95 and 4.29. This is a larger trend than calculated from the original data and the rainfall-runoff modelling in Figure 3. However, the trend in streamflow derived in the GLS analysis still incorporates the changes in climate in the streamflow, such as changes in rainfall and temperature. These need to be removed to identify the true “amplification” effect.
Comparing the streamflow model results without rainfall (model 2, Table 4) with the results of the model that removes the rainfall effect (model 3, Table 5) indicates a small reduction in the negative trend and the same number of significant stations. Overall, the inclusion of rainfall in the model removes between less than 10% and 30% of the original trends. In other words, the linear trend is the remaining trend in the streamflow after removing the trend in the rainfall. In terms of amplification, this is the additional decrease in streamflow on top of any reduction in the rainfall. However, some of the remaining trends could be due to trends in temperature affecting potential ET.
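The two ratios used in this comparison are simple arithmetic and can be written out as follows. The trend values are hypothetical placeholders of the same order of magnitude as those reported, not values from Tables 4 or 5, and the function names are illustrative.

```python
def amplification(trend_q, trend_p):
    """Ratio of the streamflow trend to the rainfall trend."""
    return trend_q / trend_p

def fraction_removed(trend_without_rain, trend_with_rain):
    """Share of the original streamflow trend explained once rainfall is in the model."""
    return 1.0 - trend_with_rain / trend_without_rain

# Hypothetical trends in % per week, for illustration only
trend_q = -4.0e-4           # streamflow trend, model without rainfall
trend_p = -2.0e-4           # rainfall trend
trend_q_given_p = -3.2e-4   # streamflow trend after including rainfall

amp = amplification(trend_q, trend_p)
removed = fraction_removed(trend_q, trend_q_given_p)
print(round(amp, 2), round(removed, 2))
```

In this made-up case the streamflow trend is twice the rainfall trend, and including rainfall removes 20% of it, leaving 80% as the remaining trend.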
Overall variation explained by the final models that include both rainfall and evapotranspiration (model 4, Table 2 and Table 6) was low to medium (-0.03 < adjusted r² < 0.43, supplementary material document 3C). The worst performing model was for station NIVE, but more generally these results suggest that there are other processes causing variation in the runoff that the statistical model does not include. This is not necessarily an issue, as the goal of the statistical modelling was to explain the maximum variation in the streamflow related to rainfall and ET, and not to find the best predictive model.
After rainfall and evapotranspiration effects are accounted for (model 4), the models identify significant trends in the weekly data in only six catchments (Table 6). These trends are once again very small, in the order of 1.1 to 5.5 × 10⁻⁴ % change. This suggests that the overall streamflow is indeed declining over time, even after accounting for changes in rainfall and evapotranspiration, but currently the overall change is very small. This remaining trend is the amplification.
Note that inclusion of the maximum temperature (evaporation) explains very little of the variation in streamflow, as the improvement in the performance measure AIC (Akaike Information Criterion) between model 3 (Table 5) and model 4 (Table 6) is small. As a result, for those catchments still showing significant trends, the variance explained by including rainfall and temperature is small, from < 10% to 30%. As with the previous analyses (Chiew et al., 2009; Potter & Chiew, 2011), the catchments with significant trends are all located in the South and West of Australia.
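The AIC comparison between two nested models can be illustrated with the standard definition, AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L the maximised likelihood; lower values are better. The log-likelihoods and parameter counts below are hypothetical and do not correspond to the fitted models in Tables 5 and 6.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion from a maximised log-likelihood."""
    return 2.0 * n_params - 2.0 * log_likelihood

# Hypothetical fits, for illustration only
aic_model3 = aic(log_likelihood=-1520.0, n_params=5)  # rainfall only
aic_model4 = aic(log_likelihood=-1518.5, n_params=6)  # rainfall plus temperature

delta = aic_model4 - aic_model3
print(aic_model3, aic_model4, delta)
```

Here the extra term lowers the AIC by only 1 unit, which would usually be read as a negligible improvement: the gain in likelihood barely offsets the penalty for the additional parameter.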
For the Mann-Kendall tests on the remaining residuals, eight catchments have significant trends (Table 6, Figure 7), again more than with the regression modelling. However, the bootstrap results in Figure 7 suggest that even fewer of the sites have true trends, with a minimum of one and a maximum of three of the residual trends clearly falling outside the bootstrap distribution, including some on the outer edge. The overlap between the results from the Mann-Kendall residual analysis and the trend in the GAMM is clear for most of the locations, except for the very small YARR, for which the residual analysis suggests there is no significant trend, but the GAMM suggests the linear trend is significant. In this case, the actual trend might not be linear, but in general the assumption of a linear trend is not influencing the results. Moreover, there is clear overlap between the LTPMK analysis and the traditional Mann-Kendall analysis, with the same stations showing significance, despite the Hurst coefficient being significant for all the stations.

Model non-stationarity over long periods

This part of the analysis tested whether numerical models used for the analysis of trends and elasticities impact the analysis of the elasticities presented earlier. This is important as several climate change studies are based on model output analysis (Chiew et al., 2009; Potter & Chiew, 2011; Vaze et al., 2011) rather than observed data. As calibrated models essentially fit a stationary series, the residuals (between observed and simulated) should retain any existing trend in the data. This builds on earlier analyses (Buzacott, Tran, van Ogtrop, & Vervoort, 2019; Saft et al., 2015; Vaze et al., 2010), which contrasted calibration differences in dry and wet periods to investigate non-stationarity.
The results of the Mann-Kendall non-parametric analysis on the weekly residuals (observed minus predicted) for the 41 years of the data series indicate significant slopes for only some of the stations (Figure 8). While there appears to be no consistent pattern, the results indicate that for SimHyd (at least for some of the stations) there are stronger decreasing trends than for GR4J. In contrast, the residuals of GR4J had more significant trends in the standard Mann-Kendall analysis, but also showed some (very small) positive trends. Overall, the identified catchments with significant slopes, CORA, COTT, RUTH, SOUT and MURR, match the GAMM and earlier Mann-Kendall analysis on the data. Again, the small YARR catchment is an outlier, indicating a significant slope in the Mann-Kendall analysis, but positive for the GR4J residuals and negative for the SimHyd residuals. However, in both cases the actual slope value is very close to 0. Overall, this matches the uncertainty for this catchment indicated in the earlier analyses.
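The residual analysis can be sketched as follows: compute the residual series as observed minus predicted, then estimate its slope. The sketch assumes the slope reported alongside the Mann-Kendall test is the Theil-Sen estimator (the median of all pairwise slopes), which is the usual companion to that test but is an assumption here. The flow values are hypothetical.

```python
from statistics import median

def sen_slope(values):
    """Theil-Sen slope: median of all pairwise slopes, in units per time step."""
    n = len(values)
    slopes = [(values[j] - values[i]) / (j - i)
              for i in range(n - 1) for j in range(i + 1, n)]
    return median(slopes)

# Hypothetical weekly flows, for illustration only; the fourth value is an outlier
observed = [10.0, 9.5, 9.0, 12.0, 8.0, 7.5]
simulated = [9.0, 9.0, 9.0, 9.0, 9.0, 9.0]  # a stationary model output

residuals = [o - s for o, s in zip(observed, simulated)]
slope = sen_slope(residuals)
print(slope)
```

Because the simulated series is stationary, the declining trend in the observations is carried into the residuals, and the median-based slope of -0.5 per step is unaffected by the single outlier.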
The LTPMK tests, however, indicated that none of the identified trends were significant (supplementary documents, document 6A). This essentially shows that the identified slopes in the model residuals exist locally, but currently the overall variation in the data means we cannot yet affirm a long-term trend in the residuals under the scaling hypothesis (Hamed, 2008).
Overall, these results are broadly similar to the earlier analyses, suggesting trends in only some of the South-eastern Australian stations, but with many of the trends being very small. All analyses also indicate the largest negative trend for the RUTH catchment.