2.5. Model Validation and Performance Metrics
A nested split validation operator was used to validate the models in the current study. The operator consists of two subprocesses: a training subprocess and a testing subprocess. The training subprocess learns (constructs) a model, which is then applied and evaluated in the testing subprocess. The input dataset is partitioned into two subsets: one is used as the training set and the other as the test set. The model is learned from the training set and then applied to the test set. Because learning optimizes the model parameters to fit the training data as closely as possible, the model generally fits an independent test sample less well than the training data. Split validation therefore provides an estimate of how well a model will generalize when an explicit test set is not available; the Split Validation operator also supports training on one dataset and testing on a separate, explicit test set (17). The whole dataset was partitioned into two subsets: 75% of the samples for training and 25% for testing. All models were trained on the first subset and tested on the second. The performance metrics were then calculated for each model with the DTROC software (18). The evaluation metrics were accuracy, precision, sensitivity, specificity, F1-score, Matthews correlation coefficient (MCC), and G-mean; a detailed explanation of these metrics is available in (19).
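All of the listed metrics can be derived from the binary confusion matrix (true positives, false positives, true negatives, false negatives). The following is a minimal Python sketch of those standard definitions, not the DTROC software used in the study; the function name and the example counts are illustrative assumptions.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Compute common evaluation metrics from a binary confusion matrix.

    tp, fp, tn, fn: counts of true positives, false positives,
    true negatives, and false negatives, respectively.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # recall / true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient: balanced measure in [-1, 1]
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # G-mean: geometric mean of sensitivity and specificity
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
        "g_mean": g_mean,
    }

# Illustrative counts only (not results from the study):
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

MCC and G-mean are included because, unlike accuracy, they remain informative when the two classes are imbalanced.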