Prediction Model Development and Validation
For model development, the classification models explored were decision forest, decision jungle, boosted decision tree, neural network, and logistic regression. After a model was developed with each of these classifiers, it was applied to the test data set and its performance metrics were measured. The performance scores of each model are displayed in Table 2. The decision forest classifier was the most robust, with an AUC of 0.78 (95% CI, 0.77 to 0.79), accuracy of 71%, precision of 72%, and recall of 71%; it scored highest in two of the four measured criteria. It was followed closely by the decision jungle classifier. The remaining models showed lower recall. The optimal parameters determined for the model were a minimum of 4 samples per leaf node, 128 random splits per node, a maximum tree depth of 64, and a limit of 32 decision trees.
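For readers less familiar with the four criteria reported above, the following is a minimal sketch of how accuracy, precision, recall, and AUC can be computed from a binary classifier's test-set output. It is not the study's code: the function names, the 0.5 decision threshold, and the eight-record toy data set are assumptions made purely for illustration. AUC is computed here as the proportion of positive/negative pairs in which the positive case receives the higher score, which is equivalent to the area under the ROC curve.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall from predicted class labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Precision: of the cases predicted positive, the share that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: of the truly positive cases, the share the model identified.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Tied scores count as half a win. This pairwise form is quadratic in the
    sample size, which is acceptable for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true outcomes and predicted probabilities (not study data).
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]

# Assumed decision threshold of 0.5 for converting probabilities to classes.
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

print(classification_metrics(toy_labels, toy_predictions))
print(roc_auc(toy_labels, toy_scores))
```

Accuracy, precision, and recall depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. This is one reason a model can lead on some of the four criteria but not on others.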