Omar Karadaghy - Authorea

Objective: (1) To demonstrate how machine learning can be used for prediction modeling by predicting the treatment received by patients with T1-2, N0-N1 Oropharyngeal Squamous Cell Carcinoma. (2) To assess disparities in the treatment of this population.

Design: Retrospective cohort. The data were split 80/20 into training and testing sets, respectively. Several machine learning algorithms were explored for model development. The Area Under the Curve, accuracy, precision, and recall were calculated for the final model, and permutation feature importance scores were used to identify influential variables within the model.

Setting: National Cancer Database.

Participants: Adults diagnosed with T1-2, N0-N1 Oropharyngeal Squamous Cell Carcinoma from 2004 to 2013 were eligible.

Main Outcome Measure: Primary treatment modality.

Results: Among the 19,111 patients in the study, the mean (standard deviation) age was 61.3 (10.8) years, 14,034 (73%) were male, and 17,292 (91%) were white. Surgery was the primary treatment in 9,533 (50%) cases and radiation in 9,578 (50%) cases. The final model yielded an Area Under the Curve of 78% (95% CI, 77% to 79%), accuracy of 71%, precision of 72%, and recall of 71%. T-stage, primary site, N-stage, grade, and type of treatment facility were impactful variables in the model.

Conclusion: Machine learning was used to predict the primary treatment modality for T1-2, N0-N1 Oropharyngeal Squamous Cell Carcinoma, demonstrating how it can be applied to prediction modeling. The results also suggest that treatment is influenced by clinical staging and the type of treatment facility.
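The Design describes an 80/20 train/test split, the Area Under the Curve, accuracy, precision, recall, and permutation feature scores, but it does not name the algorithm or software used. As a minimal, dependency-free sketch of how those evaluation steps fit together, the Python below uses synthetic toy data (not National Cancer Database data) and a hypothetical fixed-threshold "model" standing in for the study's unnamed final model. All function names are illustrative assumptions. Precision and recall are computed for a single positive class (for example, a hypothetical encoding of 1 = surgery), because the abstract does not state whether its figures are per-class or averaged.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def train_test_split(X: List[Row], y: List[int], test_frac: float = 0.2, seed: int = 0):
    """Shuffle row indices and split them into training and testing sets (80/20 by default)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[float, float]:
    """Precision and recall for one class treated as 'positive' (hypothetically, surgery)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict: Callable[[Row], int], X: List[Row], y: List[int],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when a single feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(r) for r in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in X]
            rng.shuffle(column)
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, column)]
            drops.append(baseline - accuracy(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Synthetic toy data only: feature 0 drives the label, feature 1 is pure noise.
    rng = random.Random(42)
    X = [[rng.random(), rng.random()] for _ in range(500)]
    y = [1 if r[0] + rng.gauss(0, 0.15) > 0.5 else 0 for r in X]

    X_train, y_train, X_test, y_test = train_test_split(X, y)

    # Hypothetical stand-in "model": a fixed threshold on feature 0 (no real training step).
    def score(row: Row) -> float:
        return row[0]

    def predict(row: Row) -> int:
        return 1 if score(row) > 0.5 else 0

    y_pred = [predict(r) for r in X_test]
    prec, rec = precision_recall(y_test, y_pred)
    print(f"AUC={roc_auc(y_test, [score(r) for r in X_test]):.2f} "
          f"accuracy={accuracy(y_test, y_pred):.2f} precision={prec:.2f} recall={rec:.2f}")
    print("permutation importances:", permutation_importance(predict, X_test, y_test))
```

Because the stand-in model ignores feature 1, shuffling that column leaves every prediction unchanged and its permutation importance is exactly zero, while shuffling feature 0 lowers accuracy. The pairwise AUC formulation is quadratic in the number of test cases and is meant for illustration; in practice, scikit-learn offers equivalents (`train_test_split`, `roc_auc_score`, `permutation_importance`).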