Data division
The randomized data was divided into training, testing and validation
datasets. The training dataset was used to arrive at potentially predictive
relations: it is the set of examples employed for learning, that is, to
fit the parameters (weights) of the classifier. The test dataset
was used to evaluate the strength and utility of a predictive
relationship; it is a set of examples used only to measure the
performance (generalization) of a fully specified classifier. To
avoid overfitting, a validation set was also required in addition to
the training and testing sets, so that the fit could be checked on
held-out examples during model development without consulting the
test set.
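
As an illustration of the three-way division described above, the following is a minimal sketch in Python. The function name split_dataset, the 70/15/15 proportions and the fixed seed are assumptions made for the example; the text does not state the proportions or the tooling actually used.

```python
import random


def split_dataset(examples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle examples and split them into training, validation and test sets.

    The fractions are illustrative assumptions, not values from the study.
    Whatever remains after the training and validation portions is the test set.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for testing")

    shuffled = list(examples)              # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # randomize the order reproducibly

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]                      # used to fit the weights
    validation = shuffled[n_train:n_train + n_val]  # used to monitor overfitting
    test = shuffled[n_train + n_val:]               # used only for final evaluation
    return train, validation, test


# Hypothetical usage with 100 placeholder examples.
train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))
```

The three subsets are disjoint by construction, so no example used to fit or tune the classifier is reused when measuring its final performance.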