FIGURE 3 Flow chart for geostatistical analysis in this study
The semi-variogram is the fundamental tool of geostatistics for analyzing whether spatial data are correlated and over what distance that correlation holds. The experimental semi-variogram is commonly computed with Matheron’s method of moments (MoM) estimator (Oliver and Webster 2015):
\(\hat{\gamma}(h)=\frac{1}{2m\left(h\right)}\sum_{i=1}^{m\left(h\right)}\left\{z\left(x_{i}\right)-z\left(x_{i}+h\right)\right\}^{2}\qquad(1)\)
where \(z(x_{i})\) and \(z(x_{i}+h)\) are the observed values of \(z\) at places \(x_{i}\) and \(x_{i}+h\), and \(m(h)\) is the number of paired comparisons at lag \(h\).
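Equation (1) can be illustrated with a minimal sketch. It assumes that pairs are grouped into distance bins (irregularly spaced samples rarely share an exact lag) and that direction is ignored; the function name `empirical_semivariogram` and the `lag_edges` argument are illustrative and do not come from the study's software.

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_edges):
    """Method-of-moments estimate of gamma(h) for each distance bin.

    coords    : (n, d) array of sample locations x_i
    values    : (n,) array of observations z(x_i)
    lag_edges : increasing bin edges for the separation distance (assumed input)
    Returns the mean lag, gamma, and pair count m(h) for every bin.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Indices of every unordered pair (i < j), so each pair is counted once.
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2

    n_bins = len(lag_edges) - 1
    lags = np.full(n_bins, np.nan)
    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = (dist >= lag_edges[b]) & (dist < lag_edges[b + 1])
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            lags[b] = dist[in_bin].mean()
            # Equation (1): half the mean squared difference over m(h) pairs.
            gamma[b] = sq_diff[in_bin].sum() / (2 * counts[b])
    return lags, gamma, counts
```

Bins that contain no pairs are returned as NaN rather than zero, so they are not mistaken for perfectly correlated lags.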
Three semi-variogram models are commonly fitted to the experimental values: the Gaussian model, the spherical model, and the exponential model (Lloyd, 2010).
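As a sketch, the three models can be written as functions of the lag distance. The parameterization below (nugget, partial sill `psill`, and a distance parameter) follows one common textbook convention and is an assumption: software packages differ in how they define the range of the exponential and Gaussian models, so these functions need not match the ones used in this study.

```python
import numpy as np

def spherical(h, nugget, psill, a):
    """Spherical model: reaches the sill (nugget + psill) exactly at range a."""
    h = np.asarray(h, dtype=float)
    s = np.minimum(h / a, 1.0)  # beyond the range the model stays at the sill
    g = nugget + psill * (1.5 * s - 0.5 * s**3)
    return np.where(h > 0, g, 0.0)  # gamma(0) = 0 by definition

def exponential(h, nugget, psill, r):
    """Exponential model: approaches the sill asymptotically (r is a distance parameter)."""
    h = np.asarray(h, dtype=float)
    g = nugget + psill * (1.0 - np.exp(-h / r))
    return np.where(h > 0, g, 0.0)

def gaussian(h, nugget, psill, r):
    """Gaussian model: parabolic near the origin, asymptotic sill."""
    h = np.asarray(h, dtype=float)
    g = nugget + psill * (1.0 - np.exp(-(h**2) / r**2))
    return np.where(h > 0, g, 0.0)
```

All three return zero at zero lag and jump to the nugget for any positive distance, which is how a nugget effect is usually represented.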
One of the conventional measures of model performance is the root-mean-square error (RMSE) (Chai & Draxler, 2014). Here, the RMSE is used to identify the best of the candidate semi-variogram models: the model with the lowest RMSE is selected. If the estimates were perfect, the RMSE would be zero. The RMSE value is obtained from the cross-validation step of the Kriging results.
The ratio of the nugget to the sill, or spatial dependence, is used to assess how strongly the variance of the data is related to distance. The ratio is classified as follows: if the ratio is ≤25%, the variable is classified as strongly spatially dependent; if the ratio is between 25% and 75%, the variable is classified as moderately spatially dependent; and if the ratio is ≥75%, the variable is classified as weakly spatially dependent (Cambardella et al., 1994).
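The classification can be expressed as a small helper. This is a sketch that assumes `sill` means the total sill (nugget plus partial sill) and that uses the class boundaries exactly as stated above; the function name is illustrative.

```python
def spatial_dependence(nugget, sill):
    """Return the nugget-to-sill ratio in percent and its dependence class.

    Assumes `sill` is the total sill (nugget + partial sill).
    """
    if sill <= 0:
        raise ValueError("sill must be positive")
    ratio = 100.0 * nugget / sill
    if ratio <= 25.0:
        label = "strong"
    elif ratio < 75.0:
        label = "moderate"
    else:
        label = "weak"
    return ratio, label
```

For example, a nugget of 1 with a total sill of 5 gives a ratio of 20% and falls in the strongly dependent class.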
The best semi-variogram model is selected on the basis of the RMSE, supported by the ratio of the nugget to the sill. The selected semi-variogram is then used to generate a kriging map of the water table.
Ordinary Kriging is based on the assumption that variation is random and spatially dependent. It predicts values at unsampled locations from the sparse sample data that lie within some distance of them. The estimation equation is given by:
\begin{equation} \hat{z}\left(x_{0}\right)=\sum_{i=1}^{N}{\lambda_{i}z\left(x_{i}\right)}\qquad(2)\nonumber \\ \end{equation}
where \(\hat{z}\left(x_{0}\right)\) is the estimated value at the unsampled point \(x_{0}\), \(z(x_{i})\) are the observed data, \(N\) is the number of observations, and \(\lambda_{i}\) are the weights.
The next step is to find the weights that minimize the kriging variance subject to the unbiasedness condition that the weights sum to 1:
\begin{equation} \sum_{i=1}^{N}{\lambda_{i}\gamma\left(x_{i}-x_{j}\right)+\psi\left(x_{0}\right)=\gamma\left(x_{j}-x_{0}\right)}\text{\ for\ all\ }j\qquad(3)\nonumber \\ \end{equation}
\begin{equation} \sum_{i=1}^{N}{\lambda_{i}=1}\qquad(4)\nonumber \\ \end{equation}
where \(\psi\left(x_{0}\right)\) is the Lagrange multiplier introduced to achieve the minimization under this constraint.
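The weights and the Lagrange multiplier can be obtained by solving a single linear system. The sketch below uses the standard ordinary kriging system, in which the left-hand semivariances are taken between pairs of sample points and the right-hand ones between each sample point and \(x_{0}\). It assumes a global neighbourhood (all samples are used) and a user-supplied `variogram` callable that returns zero at zero lag; both are illustrative choices, not the study's settings. The kriging variance is computed with the usual expression \(\sum_{i}\lambda_{i}\gamma(x_{i}-x_{0})+\psi(x_{0})\).

```python
import numpy as np

def ordinary_kriging(coords, values, x0, variogram):
    """Solve the ordinary kriging system for one target location x0.

    variogram : callable mapping an array of distances to semivariances
                (assumed to return 0 at distance 0).
    Returns the estimate, the kriging variance, and the weights.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n = len(values)

    # Semivariances between all pairs of sample points, gamma(x_i - x_j).
    pair_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = variogram(pair_dist)
    A[:n, n] = 1.0  # column multiplying the Lagrange multiplier
    A[n, :n] = 1.0  # row enforcing that the weights sum to 1

    # Right-hand side: gamma(x_j - x_0) for each sample, then the constraint value 1.
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(coords - x0, axis=1))

    solution = np.linalg.solve(A, b)
    weights, psi = solution[:n], solution[n]
    estimate = float(weights @ values)        # weighted sum of observations
    variance = float(weights @ b[:n] + psi)   # kriging variance at x0
    return estimate, variance, weights
```

With a nugget-free variogram, predicting at a sampled location returns the observed value with zero variance, which is a convenient sanity check on any implementation.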
Cross-validation is a method for evaluating how well the kriging model predicts unknown values. The basic procedure is a one-by-one estimation method called “leave-one-out”: starting from the full data set, each observation is removed in turn and estimated from the remaining data. The measured and predicted values are then compared, and their difference is called the prediction error (ESRI, 2010). The prediction errors are used to assess which model is best for map production. The model performance can be assessed using the following criteria:
The mean prediction error (MPE) equation is given by:
\begin{equation} MPE=\frac{1}{n}\sum_{i=1}^{n}\left({\hat{z}}_{i}-z_{i}\right)\qquad(5)\nonumber \\ \end{equation}
The root-mean-square prediction error (RMSE) equation is given by:
\begin{equation} RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left({\hat{z}}_{i}-z_{i}\right)^{2}}\qquad(6)\nonumber \\ \end{equation}
The average standard error (ASE) equation is given by:
\begin{equation} ASE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\hat{\sigma}}_{i}^{2}}\qquad(7)\nonumber \\ \end{equation}
The root-mean-square standardized error (RMSSE) equation is given by:
\begin{equation} RMSSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left\{\frac{\left({\hat{z}}_{i}-z_{i}\right)}{{\hat{\sigma}}_{i}}\right\}^{2}}\qquad(8)\nonumber \\ \end{equation}
where \(z_{i}\) is the observed data, \({\hat{z}}_{i}\) is the predicted data, \({\hat{\sigma}}_{i}\) is the prediction standard error for location \(i\), and \(n\) is the number of sampling points.
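The leave-one-out procedure and the four criteria can be sketched together. The `predict` argument is a hypothetical callable standing in for the kriging predictor: it receives the retained samples and the held-out location and returns an estimate with its prediction standard error. The ASE is computed here as the square root of the mean squared prediction standard error, the form given in the ESRI Geostatistical Analyst documentation. Function names are illustrative.

```python
import numpy as np

def leave_one_out(coords, values, predict):
    """Remove each observation in turn and predict it from the rest.

    predict(train_coords, train_values, x0) is a hypothetical callable that
    returns (estimate, prediction standard error) at location x0.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    predicted = np.empty(n)
    std_err = np.empty(n)
    for k in range(n):
        keep = np.arange(n) != k  # every sample except the k-th
        predicted[k], std_err[k] = predict(coords[keep], values[keep], coords[k])
    return predicted, std_err

def cv_metrics(observed, predicted, std_err):
    """Cross-validation criteria computed from leave-one-out results."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    std_err = np.asarray(std_err, dtype=float)
    err = predicted - observed  # prediction error at each sample
    return {
        "MPE": float(err.mean()),
        "RMSE": float(np.sqrt(np.mean(err**2))),
        "ASE": float(np.sqrt(np.mean(std_err**2))),
        "RMSSE": float(np.sqrt(np.mean((err / std_err) ** 2))),
    }
```

Passing the predictor as an argument keeps the loop independent of the interpolation method, so the same routine can compare the candidate semi-variogram models.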
The ranges of values for these criteria are: