Note: sd=standard deviation, se=standard error, skew=skewness

Software and code

To keep this work reproducible, we have made the code and dataset available on GitHub (www.github.com/dasaptaerwin). We used the command-line statistical program R together with RStudio for the analysis. R is open source and available for the Linux, Mac and Windows operating systems. Both programs can be downloaded for free (https://cran.r-project.org and http://rstudio.com). We also used the following packages: rpart for regression tree analysis, gam for generalised additive modelling, and vegan for data clustering and principal component analysis (PCA). All packages can be downloaded for free from the CRAN server (https://cran.r-project.org).

Principal component analysis

Principal component analysis (PCA) is another unsupervised classification method, which works by rotating the covariance matrix of the data. It is also a dimension-reduction tool that reduces a large set of variables to a smaller set that still contains most of the information in the original set. The result is a spatial plot of samples and variables based on the eigenvalues (Bhardwaj, Singh, and Singh 2010; Candès et al. 2011; Stacklies et al. 2007).
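The rotation described above can be illustrated with a minimal sketch. The analysis in this paper was run in R with the vegan package; the Python code below is only an illustration using the standard library, and it is not the authors' implementation. The function names, the power-iteration solver, and the sample values (three made-up hydrochemical variables) are all assumptions introduced here. The variables are centred but not standardised, which is a choice that would need to be revisited for real data measured in different units.

```python
import math


def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centred data."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centred


def leading_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    p = len(matrix)
    # Asymmetric start vector; the method fails only if the start vector is
    # exactly orthogonal to the leading eigenvector.
    vec = [float(i + 1) for i in range(p)]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in product))
        if norm == 0.0:
            break
        vec = [v / norm for v in product]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(p) for j in range(p))
    return value, vec


def pca(rows, n_components=2):
    """Return component directions, explained-variance ratios and sample scores."""
    cov, centred = covariance_matrix(rows)
    p = len(cov)
    total_variance = sum(cov[i][i] for i in range(p))
    work = [row[:] for row in cov]
    components, explained = [], []
    for _ in range(n_components):
        value, vec = leading_eigenpair(work)
        components.append(vec)
        explained.append(value / total_variance)
        # Deflation: remove the component just found before searching for the next.
        for i in range(p):
            for j in range(p):
                work[i][j] -= value * vec[i] * vec[j]
    scores = [[sum(c[j] * comp[j] for j in range(p)) for comp in components]
              for c in centred]
    return components, explained, scores


# Made-up sample values (columns: Ca, Mg, Cl in mg/L), for illustration only.
samples = [
    [40.0, 12.0, 8.0],
    [55.0, 15.0, 9.5],
    [62.0, 18.0, 30.0],
    [35.0, 10.0, 7.0],
    [70.0, 21.0, 12.0],
    [48.0, 14.0, 25.0],
]
components, explained, scores = pca(samples, n_components=2)
print("explained variance ratios:", explained)
print("scores of first sample:", scores[0])
```

Plotting the two score columns against each other gives the kind of sample plot referred to above; the entries of each component vector show how strongly each variable contributes to that axis.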
Both techniques have been successfully used to classify water chemistry (Fitzpatrick, Long, and Pijanowski 2007; D. E. Irawan et al. 2009; King, Raiber, and Cox 2014; Ritzi et al. 1993; Seyhan, Griend, and Engelen 1985). All of these studies used both techniques together to explain the correlation between hydrochemical state and geological setting.
Before applying PCA, we tested the collinearity of the data structure using a multiple regression technique (Figure 2). The regression tree uses the following equations for the standard linear model, polynomial regression, GLM (generalised linear model), and GAM (generalised additive model) (Gio and Rosmaini 2015).
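As an illustration of what a collinearity check can look like, the sketch below computes the Pearson correlation between two variables and the corresponding variance inflation factor (VIF). The text does not state which collinearity statistic was used, so both measures are assumptions made here, as are the function names and the made-up values for electrical conductivity and total dissolved solids.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_inflation_factor(x, y):
    """VIF for a pair of variables: 1 / (1 - R^2).

    For a regression of one variable on a single other variable,
    R^2 equals the squared Pearson correlation.
    """
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


# Made-up values, for illustration only.
ec = [250.0, 410.0, 520.0, 640.0, 800.0]    # electrical conductivity
tds = [160.0, 262.0, 335.0, 410.0, 512.0]   # total dissolved solids

print("r   =", pearson_r(ec, tds))
print("VIF =", variance_inflation_factor(ec, tds))
```

A VIF far above 1 indicates that the two variables carry largely the same information; a threshold of about 10 is a commonly quoted rule of thumb, not a value taken from this study.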

Standard linear model

\(y \sim N\left(\mu, \sigma^{2}\right)\) Equation
\(\mu = b_{0} + b_{1} x_{1}\) Equation 1
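The standard linear model can be fitted by ordinary least squares. The sketch below is a minimal illustration in Python using only the standard library; it is not the code used in this study, and the function name and the data values are assumptions made here. It returns estimates of the intercept b0, the slope b1, and the residual variance.

```python
def fit_linear(x, y):
    """Ordinary least squares for mu = b0 + b1 * x; returns (b0, b1, sigma2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Unbiased estimate of the residual variance (two fitted parameters).
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    return b0, b1, sigma2


# Made-up values, for illustration only.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

b0, b1, sigma2 = fit_linear(x_values, y_values)
print("b0 =", b0, "b1 =", b1, "sigma^2 =", sigma2)
```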

Polynomial regression

\(y \sim N\left(\mu, \sigma^{2}\right)\) Equation 2
\(\mu = b_{0} + b_{1} x_{1} + b_{2} x_{1}^{2}\) Equation 3
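The quadratic model is still linear in its coefficients, so it can be fitted with the same least-squares idea by solving the normal equations. The sketch below is a minimal illustration under that assumption, written in Python with the standard library only; the function names and the data are made up for this example and are not taken from the study.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def fit_quadratic(x, y):
    """Least-squares estimates [b0, b1, b2] for mu = b0 + b1*x + b2*x^2."""
    design = [[xi ** k for k in range(3)] for xi in x]  # rows [1, x, x^2]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Made-up values, for illustration only.
x_values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [1.2, 2.8, 7.1, 13.9, 22.8, 34.1]

coefficients = fit_quadratic(x_values, y_values)
print("b0, b1, b2 =", coefficients)
```

Solving the normal equations directly is adequate for a low-degree polynomial on a small range of x; for higher degrees or widely scaled predictors, an orthogonal decomposition is usually the more stable choice.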

GAM equation

\(y \sim N\left(\mu, \sigma^{2}\right)\) Equation 4
\(g(\mu) = f(x)\) Equation 5
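For comparison, the single-predictor form can be written in the general textbook form for several predictors. The block below is a sketch of that standard notation and is not taken from this study: the index j, the number of predictors p, and the separate intercept b_0 are introduced here for illustration. The first line is the additive model with one smooth function per predictor; the second line is the GLM, in which each smooth function is replaced by a linear term.

```latex
g(\mu) = b_{0} + \sum_{j=1}^{p} f_{j}(x_{j})

g(\mu) = b_{0} + \sum_{j=1}^{p} b_{j} x_{j}
```

With an identity link, \(g(\mu) = \mu\), and a single predictor, the second line reduces to the standard linear model.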