Prediction Model Development and Validation
The classification algorithms explored for model development included
decision forest, decision jungle, boosted decision tree, neural network,
and logistic regression. After several models had been developed with
these algorithms, each model was applied to the test data set and its
performance metrics were measured. The performance scores of each
model are displayed in Table 2. The decision forest classifier performed
best, with an AUC of 0.78 (95% CI, 0.77 to 0.79), accuracy of 71%,
precision of 72%, and recall of 71%, and it achieved the highest score on
two of the four measured metrics. The decision jungle classifier followed
closely. The remaining models had lower recall. The optimal
hyperparameters identified for the decision forest were a minimum of 4
samples per leaf node, 128 random splits per node, a maximum tree depth
of 64, and a limit of 32 decision trees.
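To make the four reported metrics concrete, the following minimal sketch computes AUC, accuracy, precision, and recall for one model's test-set predictions using only the Python standard library. Several details are assumptions and are not stated in the text. AUC is computed as the Mann-Whitney probability that a random positive case outscores a random negative case. Predicted probabilities are converted to class labels at a hypothetical threshold of 0.5. The 95% CI is approximated with a percentile bootstrap, although the study may have used a different method (for example, DeLong). All function names are illustrative.

```python
import random


def auc_mann_whitney(y_true, scores):
    """ROC AUC: probability a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, recall for the positive class (threshold is an assumption)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for AUC (assumed method, not taken from the text)."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined for them.
        if 0 < sum(yb) < n:
            aucs.append(auc_mann_whitney(yb, [scores[i] for i in idx]))
    if not aucs:
        raise ValueError("no bootstrap resample contained both classes")
    aucs.sort()
    lo = aucs[int((alpha / 2) * len(aucs))]
    hi = aucs[min(len(aucs) - 1, int((1 - alpha / 2) * len(aucs)))]
    return lo, hi


if __name__ == "__main__":
    # Toy labels and scores for illustration only; these are not study data.
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(auc_mann_whitney(y, s))   # 0.75
    print(threshold_metrics(y, s))  # accuracy 0.75, precision 1.0, recall 0.5
    print(bootstrap_auc_ci(y, s))
```

Running the same functions on each candidate model's test-set scores yields one row of a comparison table per model. The pairwise AUC loop is O(positives × negatives), which is adequate for a sketch. A rank-based formula would scale better on large test sets.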
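The decision forest hyperparameters reported above can be recorded as a small configuration object, which keeps the reported values in one place when a model is refit. In this sketch the class and field names are hypothetical. The mapping to scikit-learn's `RandomForestClassifier` (`n_estimators`, `max_depth`, `min_samples_leaf`) is an approximation chosen for illustration, not the software used in the study. scikit-learn has no direct equivalent of "random splits per node", so that value is stored but not mapped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionForestParams:
    """Hyperparameters reported for the decision forest (field names are hypothetical)."""
    min_samples_per_leaf: int = 4
    random_splits_per_node: int = 128
    max_depth: int = 64
    n_trees: int = 32

    def to_sklearn_kwargs(self) -> dict:
        """Approximate mapping onto scikit-learn RandomForestClassifier arguments.

        random_splits_per_node is deliberately omitted: RandomForestClassifier
        has no parameter for the number of candidate splits sampled per node.
        """
        return {
            "n_estimators": self.n_trees,
            "max_depth": self.max_depth,
            "min_samples_leaf": self.min_samples_per_leaf,
        }


if __name__ == "__main__":
    params = DecisionForestParams()
    print(params.to_sklearn_kwargs())
    # {'n_estimators': 32, 'max_depth': 64, 'min_samples_leaf': 4}
```

If scikit-learn is available, `RandomForestClassifier(**DecisionForestParams().to_sklearn_kwargs())` would build a comparable, but not identical, forest. Because the split-sampling step is not mapped, its results would not be expected to match Table 2 exactly.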