FIGURE 3 Flow chart for geostatistical analysis in this study
The semi-variogram is the fundamental tool of geostatistics for analyzing
whether spatial data are correlated and over what distance that
correlation holds. The experimental semi-variogram is commonly computed
with Matheron’s method-of-moments (MoM) estimator (Oliver and Webster,
2015):
\(\hat{\gamma}(h)=\frac{1}{2m\left(h\right)}\sum_{i=1}^{m\left(h\right)}\left\{z\left(x_{i}\right)-z\left(x_{i}+h\right)\right\}^{2}\qquad(1)\)
where \(z(x_{i})\) and \(z(x_{i}+h)\) are the observed values of \(z\) at locations \(x_{i}\) and \(x_{i}+h\), and \(m(h)\) is the number of
paired comparisons at lag \(h\).
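As an illustration of Eq. (1), the following minimal sketch computes an experimental semi-variogram with NumPy. It is not the implementation used in this study. The function name, the `lag_width` and `n_lags` parameters, and the choice to group pairs into distance bins of equal width are assumptions made for the example, since irregularly spaced data rarely share exactly the same lag.

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Method-of-moments estimate of gamma(h) for binned lag distances.

    coords: (n, d) sampling locations; values: (n,) observations z(x_i).
    lag_width and n_lags are hypothetical binning parameters.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # All unique pairs (i < j) of sampling points.
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    # Pair k belongs to the bin [k * lag_width, (k + 1) * lag_width).
    bins = np.floor(dist / lag_width).astype(int)
    lags, gamma, counts = [], [], []
    for k in range(n_lags):
        mask = bins == k
        m = int(mask.sum())          # m(h): number of pairs in this bin
        if m == 0:
            continue                 # no pairs at this lag, skip the bin
        lags.append(dist[mask].mean())
        gamma.append(sq_diff[mask].sum() / (2.0 * m))
        counts.append(m)
    return np.array(lags), np.array(gamma), np.array(counts)

# Four points on a transect with unit spacing.
lags, gamma, counts = empirical_semivariogram(
    [(0, 0), (1, 0), (2, 0), (3, 0)], [1.0, 2.0, 4.0, 7.0],
    lag_width=1.0, n_lags=4)
```

The pair counts returned alongside the estimates are worth inspecting, because lags supported by few pairs give unreliable semi-variances.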
Three semi-variogram models are in common use: the Gaussian, the
spherical, and the exponential model (Lloyd, 2010).
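The three models can be written as simple functions of the lag distance. The sketch below uses textbook forms with a nugget, a partial sill, and a distance parameter `a`; these parameter names and the exact parameterization are assumptions for illustration, and some software scales the distance parameter differently (for example to an effective range), so fitted values are not directly comparable across packages.

```python
import numpy as np

def spherical(h, nugget, partial_sill, a):
    """Spherical model: reaches the sill exactly at h = a."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)       # beyond the range the model stays flat
    g = nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)
    return np.where(h > 0, g, 0.0)   # gamma(0) = 0 by definition

def exponential(h, nugget, partial_sill, a):
    """Exponential model: approaches the sill asymptotically."""
    h = np.asarray(h, dtype=float)
    g = nugget + partial_sill * (1.0 - np.exp(-h / a))
    return np.where(h > 0, g, 0.0)

def gaussian(h, nugget, partial_sill, a):
    """Gaussian model: parabolic near the origin, asymptotic sill."""
    h = np.asarray(h, dtype=float)
    g = nugget + partial_sill * (1.0 - np.exp(-(h / a) ** 2))
    return np.where(h > 0, g, 0.0)

# Evaluate each model on the same set of lags (illustrative parameters).
h = np.linspace(0.0, 10.0, 6)
curves = {f.__name__: f(h, nugget=1.0, partial_sill=4.0, a=5.0)
          for f in (spherical, exponential, gaussian)}
```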
One conventional measure of model performance is the root-mean-square
error (RMSE) (Chai & Draxler, 2014). Here the RMSE is used to identify
the best of the candidate semi-variogram models: the model with the
lowest RMSE is selected. If the predictions were perfect, the RMSE would
be zero. The RMSE is obtained from the cross-validation step of the
kriging results.
The ratio of the nugget to the sill, also called the spatial dependence,
indicates how strongly the variance of the data is correlated with
distance. It is classified as follows: if the ratio is ≤25%, the
variable is strongly spatially dependent; if the ratio is between 25%
and 75%, the variable is moderately spatially dependent; and if the
ratio is ≥75%, the variable is weakly spatially dependent (Cambardella
et al., 1994).
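The classification can be expressed as a small helper. This is a sketch only: the function name is hypothetical, and it assumes that the sill passed in is the total sill (nugget plus partial sill), which the text does not state explicitly.

```python
def spatial_dependence(nugget, sill):
    """Classify the nugget-to-sill ratio (in percent).

    Assumes `sill` is the total sill, i.e. nugget + partial sill.
    Returns the ratio and the dependence class.
    """
    if sill <= 0:
        raise ValueError("sill must be positive")
    ratio = 100.0 * nugget / sill
    if ratio <= 25.0:
        label = "strong"
    elif ratio < 75.0:
        label = "moderate"
    else:
        label = "weak"
    return ratio, label

# Hypothetical fitted parameters: nugget 0.2, total sill 1.6.
ratio, label = spatial_dependence(0.2, 1.6)
```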
The best semi-variogram model is selected on the basis of the RMSE,
supported by the nugget-to-sill ratio. The selected semi-variogram is
then used to generate a kriging map of the water table.
Ordinary kriging is based on the assumption that the variation is random
and spatially dependent. Kriging predicts values at unsampled locations
within some distance of sparse sample data. The estimation equation is
given by:
\begin{equation}
\hat{z}\left(x_{0}\right)=\sum_{i=1}^{N}{\lambda_{i}z\left(x_{i}\right)}\qquad(2)\nonumber
\end{equation}
where \(\hat{z}\left(x_{0}\right)\) is the estimated value at the
unsampled point \(x_{0}\), \(z(x_{i})\) are the observed data, \(N\) is the number of observations, and \(\lambda_{i}\) are the weights.
The next step is to find the weights that minimize the kriging variance
subject to the unbiasedness condition that the weights sum to 1:
\begin{equation}
\sum_{i=1}^{N}{\lambda_{i}\gamma\left(x_{i}-x_{j}\right)}+\psi\left(x_{0}\right)=\gamma\left(x_{j}-x_{0}\right)\text{\ for\ all\ }j\qquad(3)\nonumber
\end{equation}
\begin{equation}
\sum_{i=1}^{N}{\lambda_{i}}=1\qquad(4)\nonumber
\end{equation}
where \(\psi\left(x_{0}\right)\) is the Lagrange multiplier introduced
to achieve the minimization.
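The kriging system is linear in the weights and the Lagrange multiplier, so it can be solved directly. The sketch below assembles the system from the semi-variances between all pairs of observations and between each observation and the target point, then solves it with `numpy.linalg.solve`. The function names, the example exponential model `exp_model`, and its parameter values are hypothetical; the code uses all observations rather than a search neighbourhood, which is a simplification.

```python
import numpy as np

def exp_model(h, nugget=0.1, partial_sill=1.0, a=2.0):
    """Hypothetical exponential semi-variogram with gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, nugget + partial_sill * (1.0 - np.exp(-h / a)), 0.0)

def ordinary_kriging(coords, values, x0, gamma):
    """Solve the ordinary kriging system for one target point x0.

    gamma: callable returning the semi-variance for an array of distances.
    Returns the estimate, the kriging variance, and the weights.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n = len(values)
    # Distances between all pairs of observations.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Left-hand side: semi-variances bordered by ones (unbiasedness row).
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0
    # Right-hand side: semi-variances to the target, then the constraint 1.
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(coords - x0, axis=1))
    sol = np.linalg.solve(A, b)
    weights, psi = sol[:n], sol[n]
    estimate = weights @ values
    variance = weights @ b[:n] + psi   # kriging variance at x0
    return estimate, variance, weights

# Two observations, target midway between them.
est, var, w = ordinary_kriging([(0.0, 0.0), (2.0, 0.0)], [1.0, 3.0],
                               (1.0, 0.0), exp_model)
```

Because the model returns zero at zero distance, predicting at a sampled location reproduces the observed value there, which is a convenient sanity check on any implementation.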
Cross-validation is a method for evaluating how well the kriging map
predicts unknown values. The basic procedure is the one-by-one
estimation method called “leave-one-out”: starting from the full data
set, each observation is removed in turn and estimated from the
remaining data. The measured and predicted values are then compared;
their difference is called the prediction error (ESRI, 2010). The
prediction errors are used to assess which model is best for map
production. Model performance can be assessed using the following
criteria:
The mean prediction error (MPE) is given by:
\begin{equation}
MPE=\frac{1}{n}\sum_{i=1}^{n}\left({\hat{z}}_{i}-z_{i}\right)\qquad(5)\nonumber
\end{equation}
The root-mean-square prediction error (RMSE) is given by:
\begin{equation}
RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left({\hat{z}}_{i}-z_{i}\right)^{2}}\qquad(6)\nonumber
\end{equation}
The average standard error (ASE) is given by:
\begin{equation}
ASE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\hat{\sigma}}_{i}^{2}}\qquad(7)\nonumber
\end{equation}
The root-mean-square standardized error (RMSSE) is given by:
\begin{equation}
RMSSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left\{\frac{\left({\hat{z}}_{i}-z_{i}\right)}{{\hat{\sigma}}_{i}}\right\}^{2}}\qquad(8)\nonumber
\end{equation}
where \(z_{i}\) is the observed data, \({\hat{z}}_{i}\) is the predicted
data, \({\hat{\sigma}}_{i}\) is the prediction standard error for
location \(i\), and \(n\) is the number of sampling points.
These criteria are interpreted as follows:
- a mean prediction error close to 0 indicates that the predictions are
unbiased,
- a root-mean-square standardized error close to 1 indicates that the
standard errors are accurate,
- the root-mean-square error and the average standard error should be as
small as possible, so that the predictions do not deviate much from the
measured values,
- on the QQ plot of the standardized errors (as used in the RMSSE), the
points should lie as close as possible to the 45-degree straight line.
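Given the leave-one-out predictions and their standard errors, the four criteria reduce to a few array operations. The sketch below is illustrative only: the function name and argument names are hypothetical, and the ASE is computed as the square root of the mean squared prediction standard error, which is our reading of the ESRI definition.

```python
import numpy as np

def cross_validation_metrics(observed, predicted, std_error):
    """Summary statistics of leave-one-out prediction errors.

    observed: z_i; predicted: z-hat_i; std_error: sigma-hat_i (all length n).
    """
    z = np.asarray(observed, dtype=float)
    z_hat = np.asarray(predicted, dtype=float)
    sigma = np.asarray(std_error, dtype=float)
    err = z_hat - z                      # prediction errors
    return {
        "MPE": err.mean(),               # close to 0: unbiased
        "RMSE": np.sqrt((err ** 2).mean()),
        # Assumption: root of the mean squared standard error.
        "ASE": np.sqrt((sigma ** 2).mean()),
        "RMSSE": np.sqrt(((err / sigma) ** 2).mean()),  # close to 1
    }

# Made-up numbers, purely to show the call.
metrics = cross_validation_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0],
                                   [1.0, 1.0, 1.0])
```

Comparing these values across the candidate semi-variogram models gives a compact basis for the selection described above; an RMSE close to the ASE together with an RMSSE near 1 suggests that the kriging standard errors describe the actual prediction variability well.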