2 Methodology

2.1 Framework

To predict the hydrological response to climate change, four regional climate models (RCMs), i.e., RSMGE, HadGEM3_RA, RegCM4 and WRF, are used to drive hydrological models for future runoff forecasting. Climate projections from CORDEX-East Asia, which were bias-corrected by linear regression, are used as the inputs of the hydrological models, and runoff is the output. Hydrological predictions are carried out with four learning methods, i.e., multiple linear regression (MLR), support vector machine (SVM), artificial neural network (ANN) and multilayer perceptron (MLP), which are trained through hydrological simulation. The key forecasting results are daily and monthly runoffs for the period 2021-2050 under two greenhouse gas emission scenarios, i.e., RCP4.5 and RCP8.5. The framework of this study is shown in Fig. 1.

2.2 Regional climate modelling

A general circulation model (GCM) is a common tool for forecasting future climate factors. GCMs can predict future climate in large-scale regions (around 1000 km) on a global scale. However, this forecasting scale is so coarse that the resolution is inadequate to represent hydrological processes (Giorgi and Marinucci, 1996). The Coordinated Regional Climate Downscaling Experiment (CORDEX) is a framework of the World Climate Research Programme. It aims to assess the simulation performance of regional climate models (RCMs) through a series of predictive experiments. Compared with GCMs, RCMs have higher resolutions (about 25-50 km) and can capture climate characteristics within regions; therefore, they better meet the needs of hydrological forecasting. The RCMs used in this paper are from the high-resolution CORDEX-East Asia project, namely RSMGE, HadGEM3_RA, RegCM4 and WRF. To study runoff changes under different greenhouse gas emission scenarios, two RCP scenarios are selected, i.e., the high-emission RCP8.5 scenario and the medium-emission RCP4.5 scenario.
Simulations of climate variables are subject to considerable uncertainty (Cheng et al., 2017; Wu et al., 2019). The biases in the RCM outputs are corrected by a simple and easy-to-operate method, i.e., linear regression. The specific steps of bias correction are as follows: (1) the period in which the climate model simulations overlap with the observed data is taken as the overall sample for bias correction; the first 2/3 of the sample is used to calibrate the bias-correction model, and the remaining 1/3 is used to verify its accuracy; (2) the RCM simulation data and the observation data in the calibration and verification samples are sorted in ascending order of value, and a linear regression model is fitted to the reordered sequences to describe the relationship between simulated and observed climate data; (3) the simulation data of the verification sample are substituted into the fitted bias-correction equation to obtain the corrected climate data; (4) the pre-correction and post-correction climate data are each compared with the observed data to analyze the effect of the bias-correction model.
The mean absolute error (MAE) is used as the evaluation index for the bias-correction model. MAE is calculated as:
\(\text{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_{i}-y_{i}^{*}\right|\) (1)
where \(y_{i}\) is the observed climate data, \(y_{i}^{*}\) is the simulated climate data, and n is the sample size.
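The four bias-correction steps and the MAE of Eq. (1) can be illustrated with a minimal sketch. It uses only the Python standard library, and the function names (`fit_line`, `mae`, `bias_correct`) and the synthetic series are hypothetical, introduced here for illustration only. It assumes that the regression is fitted with the sorted simulated values as the predictor and the sorted observed values as the response, and that the verification comparison is also made on sorted values; the study's actual implementation may differ.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a * x + b for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def mae(obs, sim):
    """Mean absolute error, Eq. (1)."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def bias_correct(sim, obs):
    """Steps (1)-(4): split, sort, fit, apply, compare (assumed reading)."""
    split = 2 * len(sim) // 3                        # (1) first 2/3 for calibration
    sim_cal, obs_cal = sorted(sim[:split]), sorted(obs[:split])    # (2) sort
    sim_ver, obs_ver = sorted(sim[split:]), sorted(obs[split:])
    a, b = fit_line(sim_cal, obs_cal)                # (2) regression on sorted values
    corrected = [a * s + b for s in sim_ver]         # (3) apply to verification sample
    # (4) MAE before and after correction on the verification sample
    return (a, b), mae(obs_ver, sim_ver), mae(obs_ver, corrected)


# Synthetic illustration only: the "model" output is a linearly biased
# copy of the "observations", so the fit should remove the bias.
obs = [float(i % 7) + 0.1 * i for i in range(30)]
sim = [0.5 * o + 3.0 for o in obs]
(a, b), mae_before, mae_after = bias_correct(sim, obs)
print(f"a={a:.3f}, b={b:.3f}, MAE before={mae_before:.3f}, after={mae_after:.3f}")
```

Because both series are sorted before fitting, the regression relates the distributions of simulated and observed values rather than their day-to-day pairing.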

2.3 Deep learning

Among the existing runoff forecasting methods, MLR is widely used because of its simple principle and operation (Bauer and Curran, 2005). In addition to MLR, the machine learning methods SVM and ANN have also been successfully applied in several recent hydrological forecasting studies (Asefa et al., 2006; Lin et al., 2010; Pan et al., 2007; Leahy et al., 2008). SVM is a pattern recognition approach based on statistical learning theory (Vapnik, 1995), in which prediction error and structural complexity are minimized simultaneously. ANN is a mathematical model that simulates how the nervous system of the human brain processes complex information (Marcoulides, 2004). Therefore, MLR, SVM and ANN are also used for hydrological forecasting in this study for comparative analysis.
Deep learning methods are powerful tools in the field of system simulation and are widely used in image recognition (Smirnov et al., 2014) and big-data analytics (Wang et al., 2018). Compared with shallow machine learning, deep learning transforms the original data features layer by layer and thus learns them in a more hierarchical way. MLP (Fig. 2) is a typical deep learning model with a strong ability to learn and represent nonlinear relationships among variables. In this study, MLP is applied to runoff forecasting, and its potential advantages over traditional machine learning in hydrological forecasting are explored. The numbers of neurons in the input and output layers equal the numbers of input and output variables, respectively, and the number of neurons in each hidden layer is determined by parameter tuning. In this study, the input layer of the MLP network contains 4 neurons, the output layer contains 1 neuron, and there are two hidden layers, each containing 64 nodes.
Commonly used transfer functions include sigmoid, tanh and ReLU. Compared with the other transfer functions, ReLU effectively alleviates the vanishing gradient problem. The MLP in this study uses ReLU as the transfer function.
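The effect on gradients can be illustrated with a small sketch that compares only the derivative of each transfer function, ignoring the weights. The sigmoid derivative is at most 0.25, so the product of such factors shrinks quickly with depth, whereas the ReLU derivative equals 1 for positive inputs. The function names are hypothetical and the example is a simplification rather than a full backpropagation.

```python
import math


def sigmoid_grad(z):
    """Derivative of the sigmoid function; its maximum is 0.25 at z = 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


def chained_grad(grad_fn, z, depth):
    """Product of transfer-function derivatives over `depth` layers,
    assuming the same pre-activation z in every layer (weights ignored)."""
    return grad_fn(z) ** depth


for depth in (2, 5, 10):
    print(depth, chained_grad(sigmoid_grad, 0.0, depth), chained_grad(relu_grad, 1.0, depth))
```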
Information is propagated through the MLP as:
\(x_{ij}=f_{i}\left(W_{i}X_{i-1}+b_{i-1}\right)\) (2)
where \(x_{ij}\) is the output of node j in layer i, \(f_{i}\) is the transfer function of layer i, \(W_{i}\) is the weights between layer i-1 and layer i, \(X_{i-1}\) is the output of layer i-1, and \(b_{i-1}\) is the bias of layer i-1.
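A minimal forward pass following Eq. (2) can be sketched for the network size described above (4 inputs, two hidden layers of 64 nodes, 1 output). The sketch uses only the Python standard library. The random initial weights, the zero biases and the linear output layer are assumptions made for illustration, and the helper names are hypothetical; no training is performed.

```python
import random


def relu(v):
    """ReLU transfer function applied element-wise."""
    return [max(0.0, x) for x in v]


def identity(v):
    """Linear output (an assumption; common for regression targets)."""
    return list(v)


def dense(W, x, b):
    """Compute W x + b for one layer; W is a list of weight rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative choice only)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def forward(layers, x):
    """Apply Eq. (2) layer by layer: output = f(W * previous output + bias)."""
    for W, b, f in layers:
        x = f(dense(W, x, b))
    return x


rng = random.Random(0)
sizes = [4, 64, 64, 1]              # input, two hidden layers, output
acts = [relu, relu, identity]
layers = [(*init_layer(n_in, n_out, rng), f)
          for n_in, n_out, f in zip(sizes[:-1], sizes[1:], acts)]

# Hypothetical input vector of four predictor values.
print(forward(layers, [1.0, 0.5, 0.0, 12.0]))
```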

2.4 Hydrological simulation

Based on historical data, the correlation coefficient between each climatic factor and runoff is calculated (Table 1, Table 2). The climatic factors that are strongly correlated with runoff are selected as the inputs.
The daily results show that the effect of precipitation on runoff is stronger than that of temperature. The time lag between precipitation and river discharge is about two days. Temperature has a weak effect on runoff, and its time lag is insignificant. Therefore, daily precipitation observed two days earlier, one day earlier and on the same day as the daily runoff is chosen as input, together with daily temperature observed on the same day as the daily runoff.
The monthly results show that the correlation of precipitation with runoff is still slightly higher than that of temperature, and the time lag of both is about one month. Therefore, the average monthly precipitation and temperature observed one month earlier and in the same month are chosen as inputs for monthly runoff forecasting.
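The lagged input selection described above can be sketched as follows. The function names and the toy series are hypothetical, and the ordering of the variables within each input vector is an assumption. Both configurations yield four input variables per sample.

```python
def daily_samples(precip, temp, runoff):
    """Inputs [P(t-2), P(t-1), P(t), T(t)] paired with the target Q(t)."""
    X = [[precip[t - 2], precip[t - 1], precip[t], temp[t]]
         for t in range(2, len(runoff))]
    return X, list(runoff[2:])


def monthly_samples(precip, temp, runoff):
    """Inputs [P(m-1), P(m), T(m-1), T(m)] paired with the target Q(m)."""
    X = [[precip[m - 1], precip[m], temp[m - 1], temp[m]]
         for m in range(1, len(runoff))]
    return X, list(runoff[1:])


# Toy series for illustration only (not data from the study).
precip = [1, 2, 3, 4, 5]
temp = [10, 11, 12, 13, 14]
runoff = [0.1, 0.2, 0.3, 0.4, 0.5]

X_day, y_day = daily_samples(precip, temp, runoff)
X_mon, y_mon = monthly_samples(precip, temp, runoff)
print(X_day[0], y_day[0])
print(X_mon[0], y_mon[0])
```

The first two days (or the first month) of each series are dropped because their lagged predictors are not available.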
In this study, the Pearson correlation coefficient (ρ), Spearman correlation coefficient (ρs), root mean square error (RMSE), Nash-Sutcliffe efficiency coefficient (Nash) and root relative squared error (RRSE) are used to assess the accuracy of model simulation. They are calculated as:
\(\rho=\frac{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)\left(y_{i}^{*}-\bar{y}^{*}\right)}{\sqrt{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}}\cdot\sqrt{\sum_{i=1}^{n}\left(y_{i}^{*}-\bar{y}^{*}\right)^{2}}}\) (3)
\(\rho_{s}=1-\frac{6\sum_{i=1}^{n}d_{i}^{2}}{n\left(n^{2}-1\right)}\) (4)
\(\text{RMSE}=\sqrt{\frac{\sum_{i=1}^{n}\left(y_{i}-y_{i}^{*}\right)^{2}}{n}}\) (5)
\(\text{Nash}=1-\frac{\sum_{i=1}^{n}\left(y_{i}-y_{i}^{*}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}}\) (6)
\(\text{RRSE}=\sqrt{\frac{\sum_{i=1}^{n}\left(y_{i}-y_{i}^{*}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}}}\) (7)
where \(y_{i}\) is the observed runoff, \(y_{i}^{*}\) is the modelled runoff, \(\bar{y}\) is the average of the observed runoff, \(\bar{y}^{*}\) is the average of the modelled runoff, \(d_{i}\) is the rank difference between the observed and modelled runoff, and n is the sample size.
ρ and ρs reflect the strength of the correlation between simulated and observed values; the closer they are to 1, the stronger the correlation. RMSE reflects the magnitude of the deviation of the simulated values; the closer it is to 0, the more accurate the simulation. Nash and RRSE reflect the prediction error relative to the variability of the observations; the error is smallest when Nash approaches 1 and RRSE approaches 0.
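Eqs. (3)-(7) can be written as a short, self-contained sketch. It uses only the Python standard library, the function names are hypothetical, and the ranking helper assumes that the series contain no tied values, which is the case in which Eq. (4) holds exactly.

```python
import math


def _mean(v):
    return sum(v) / len(v)


def _sq_err(obs, sim):
    """Sum of squared differences between observed and modelled values."""
    return sum((o - s) ** 2 for o, s in zip(obs, sim))


def pearson(obs, sim):
    """Pearson correlation coefficient, Eq. (3)."""
    mo, ms = _mean(obs), _mean(sim)
    num = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    den = (math.sqrt(sum((o - mo) ** 2 for o in obs))
           * math.sqrt(sum((s - ms) ** 2 for s in sim)))
    return num / den


def ranks(v):
    """Rank of each value (1 = smallest); assumes no tied values."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman(obs, sim):
    """Spearman correlation coefficient, Eq. (4)."""
    n = len(obs)
    d2 = sum((ro - rs) ** 2 for ro, rs in zip(ranks(obs), ranks(sim)))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))


def rmse(obs, sim):
    """Root mean square error, Eq. (5)."""
    return math.sqrt(_sq_err(obs, sim) / len(obs))


def nash(obs, sim):
    """Nash coefficient, Eq. (6)."""
    mo = _mean(obs)
    return 1.0 - _sq_err(obs, sim) / sum((o - mo) ** 2 for o in obs)


def rrse(obs, sim):
    """RRSE, Eq. (7)."""
    mo = _mean(obs)
    return math.sqrt(_sq_err(obs, sim) / sum((o - mo) ** 2 for o in obs))


# Toy values for illustration only (not data from the study).
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 1.0, 4.0, 3.0]
for name, fn in [("rho", pearson), ("rho_s", spearman), ("RMSE", rmse),
                 ("Nash", nash), ("RRSE", rrse)]:
    print(name, round(fn(obs, sim), 4))
```

Comparing Eqs. (6) and (7) shows that RRSE squared equals 1 - Nash, so the two indices carry the same information on different scales.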