Note: sd=standard deviation, se=standard error, skew=skewness
Software and code
To keep this work reproducible, we have made the code and
dataset available on GitHub
(www.github.com/dasaptaerwin).
We used the command-line statistical program R, together with RStudio, for the
analysis. R is open source and available for Linux, Mac, and Windows
operating systems. Both programs can be downloaded from
https://cran.r-project.org
and http://rstudio.com. We also used the following packages: rpart for
regression tree analysis, gam for generalised additive modeling, and
vegan for data clustering and principal component analysis (PCA). All
packages can be downloaded free of charge from the CRAN server
(https://cran.r-project.org).
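As a convenience for readers who want to reproduce the setup, the following minimal sketch checks whether the three packages named above are already installed. The object names `pkgs` and `installed` are illustrative, and the installation call is left commented out so that the script has no side effects.

```r
# Packages named in the text; all are distributed through CRAN.
pkgs <- c("rpart", "gam", "vegan")

# Report which ones are already installed, without installing anything.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Install only the missing ones (uncomment to run; needs network access).
# install.packages(pkgs[!installed], repos = "https://cran.r-project.org")
```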
Principal component analysis
Principal component analysis (PCA) is another unsupervised
classification method; it rotates the data onto the eigenvectors of the
covariance matrix. It is also a dimension-reduction tool that can reduce
a large set of variables to a small set that still contains most of the
information in the large set. The result is a spatial plot of samples and
variables based on the eigenvalues (Bhardwaj, Singh, and Singh 2010; Candès et al.
2011; Stacklies et al. 2007).
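The idea can be illustrated with a small sketch. It uses base R's `prcomp` rather than the vegan package, and the data frame `water` holds synthetic values with hypothetical variable names, not the study data.

```r
set.seed(1)
n <- 50

# Hypothetical hydrochemical variables (synthetic values, not the study data).
tds <- rnorm(n, mean = 300, sd = 60)
water <- data.frame(
  TDS = tds,
  EC  = 1.5 * tds + rnorm(n, sd = 20),  # strongly tied to TDS
  Cl  = 0.1 * tds + rnorm(n, sd = 5),   # moderately tied to TDS
  pH  = rnorm(n, mean = 7, sd = 0.3)    # independent of the others
)

# Centre and scale so that variables with large units do not dominate.
pca <- prcomp(water, center = TRUE, scale. = TRUE)

# Proportion of the total variance carried by each component
# (the squared standard deviations are the eigenvalues).
explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(explained, 3))

# Sample scores and variable loadings on the first two components.
scores   <- pca$x[, 1:2]
loadings <- pca$rotation[, 1:2]

# biplot(pca)  # uncomment to draw samples and variables on one plot
```

Because three of the four synthetic variables are built from the same underlying quantity, the first component should carry most of the variance, which is the dimension reduction described above.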
Both techniques have been used successfully to classify water chemistry
(Fitzpatrick, Long, and Pijanowski 2007; Irawan et al. 2009;
King, Raiber, and Cox 2014; Ritzi et al. 1993; Seyhan, Griend, and
Engelen 1985). All of these examples used the two techniques together to explain the
correlation between hydrochemical state and geological setting.
Before applying the PCA, we tested the collinearity of the data
structure using multiple regression techniques (Figure 2). The regression
tree uses the following equations for the standard linear model, polynomial
regression, the GLM (generalised linear model), and the GAM (generalised
additive model) (Gio and Rosmaini 2015).
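The text does not spell out the exact collinearity procedure. One common regression-based check is the variance inflation factor (VIF), sketched below on synthetic predictors as an assumption, not as the procedure used in this study. The names `x1`, `x2`, `x3` and `vif` are illustrative.

```r
set.seed(2)
n <- 60

# Synthetic predictors: x2 is built to be nearly a linear function of x1.
x1 <- rnorm(n)
x2 <- 2 * x1 + rnorm(n, sd = 0.2)
x3 <- rnorm(n)
X <- data.frame(x1, x2, x3)

# Pairwise correlations: values near +1 or -1 flag collinear pairs.
print(round(cor(X), 2))

# Variance inflation factor: regress each predictor on all the others,
# then compute VIF = 1 / (1 - R^2).
vif <- sapply(names(X), function(v) {
  fit <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  1 / (1 - summary(fit)$r.squared)
})
print(round(vif, 1))
```

A VIF near 1 indicates a predictor that is not explained by the others; large values (a threshold of 10 is often quoted) indicate strong collinearity, as for `x1` and `x2` here.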
Standard linear model
\(y \sim N\left(\mu, \sigma^{2}\right)\) Equation
\(\mu = b_{0} + b_{1} x_{1}\) Equation 1
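A minimal sketch of fitting the standard linear model in R, assuming a normally distributed response. The data are simulated with arbitrarily chosen values (b0 = 2, b1 = 0.5, sigma = 1) so that the estimates can be compared with known truths.

```r
set.seed(3)
n <- 100
x1 <- runif(n, 0, 10)

# Simulate y with mean mu = b0 + b1 * x1 and normal noise
# (b0 = 2, b1 = 0.5, sigma = 1 are arbitrary illustration values).
y <- 2 + 0.5 * x1 + rnorm(n, sd = 1)

fit_lm <- lm(y ~ x1)

print(coef(fit_lm))           # estimates of b0 and b1
print(summary(fit_lm)$sigma)  # estimate of sigma
```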
Polynomial regression
\(y \sim N\left(\mu, \sigma^{2}\right)\) Equation 2
\(\mu = b_{0} + b_{1} x_{1} + b_{2} x_{1}^{2}\) Equation 3
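The quadratic term can be added to a linear fit with `I(x1^2)`. The sketch below uses simulated data with arbitrary coefficients and compares the straight-line and polynomial fits; it is an illustration only.

```r
set.seed(4)
n <- 100
x1 <- runif(n, -3, 3)

# Simulated response with a genuine quadratic term
# (b0 = 1, b1 = 0.5, b2 = -0.8 are arbitrary illustration values).
y <- 1 + 0.5 * x1 - 0.8 * x1^2 + rnorm(n, sd = 1)

fit_line <- lm(y ~ x1)            # straight line only
fit_poly <- lm(y ~ x1 + I(x1^2))  # adds the squared term

print(coef(fit_poly))

# F-test of the nested models: does the squared term improve the fit?
print(anova(fit_line, fit_poly))
```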
GAM equation
\(y \sim N\left(\mu, \sigma^{2}\right)\) Equation 4
\(g(\mu) = f(x_{1})\) Equation 5
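To illustrate the role of the smooth function f, the sketch below fits an additive model with an identity link, representing f by a natural cubic spline from the base `splines` package. This is an unpenalised regression spline, swapped in so the example runs without extra packages; it is not the smoother used by the gam package. The data are simulated and the choice of 5 degrees of freedom is an assumption.

```r
library(splines)  # ships with base R

set.seed(5)
n <- 200
x1 <- runif(n, 0, 2 * pi)

# Simulated nonlinear relationship: the true f is a sine curve.
y <- sin(x1) + rnorm(n, sd = 0.3)

# Identity link, g(mu) = mu; f(x1) is a natural cubic spline with 5 df.
fit_smooth <- glm(y ~ ns(x1, df = 5), family = gaussian(link = "identity"))

# Straight-line fit for comparison.
fit_line <- glm(y ~ x1, family = gaussian(link = "identity"))

# Lower AIC indicates the better trade-off between fit and complexity.
print(AIC(fit_line, fit_smooth))

# Estimated smooth evaluated on a regular grid.
grid <- data.frame(x1 = seq(0, 2 * pi, length.out = 50))
grid$f_hat <- predict(fit_smooth, newdata = grid)
```

Replacing the single linear term with a smooth term lets the data determine the shape of the relationship, which is the difference between Equation 5 and the linear predictors of the earlier models.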