2.5. Model Validation and Performance Metrics
The models in the current study were validated with the split validation operator, a nested operator consisting of two subprocesses: a training subprocess and a testing subprocess. The training subprocess is used to learn or construct a model, which is then applied and evaluated in the testing subprocess. To this end, the input dataset is divided into two subsets: one serves as the training set and the other as the test set. The model is learned from the training set and then applied to the test set. Because the learning process optimizes the model parameters to fit the training data as closely as possible, the model typically fits an independent sample of test data less well than it fits the training data. Split validation therefore provides an estimate of how well a model generalizes to unseen data when no explicit test set is available; the Split Validation operator can also train on one dataset and test on a separate, explicit test set (17). In this study, the whole dataset was partitioned into two subsets: 75% of the samples for training and 25% for testing. All models were trained on the training subset and tested on the test subset. Afterward, the performance metrics were calculated for each model with the DTROC software (18). The evaluation metrics were accuracy, precision, sensitivity, specificity, F1-score, Matthews correlation coefficient (MCC), and G-mean; a detailed explanation of these metrics is available in (19).
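As an illustrative sketch only (not the DTROC software or the exact procedure used in the study), a 75/25 holdout split and the listed metrics can be computed from a binary confusion matrix as follows; the function names and toy labels are hypothetical:

```python
import random
import math

def split_75_25(data, seed=42):
    """Shuffle a dataset and partition it into 75% training and 25% test subsets."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(0.75 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def binary_metrics(y_true, y_pred):
    """Compute the evaluation metrics from the binary confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall, true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "specificity": specificity,
            "F1": f1, "MCC": mcc, "G-mean": g_mean}

# Toy example with 8 labeled samples (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Note that MCC and G-mean, unlike accuracy, remain informative when the two classes are imbalanced, which is why they are often reported alongside the standard metrics.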