Discussion:
In this study, machine learning was used to build a model that predicts primary treatment modality for OPSCC. Logistic regression, a more traditional statistical method, was used as a reference. The results indicate that machine learning produced a robust prediction model from the variables included in the NCDB. Furthermore, the variables most predictive of primary treatment modality were clinical T and N classification, primary tumor site, and the type of institution where treatment was performed.
In light of the lack of sound evidence dictating the optimal primary treatment modality for early-stage OPSCC, this study provides further insight into participating institutions across the nation. It indicates that the decision to undergo primary surgery versus primary radiation is most strongly influenced by tumor characteristics, with some influence from facility type. Our model did not identify geographical region as an important predictor of primary treatment modality. By contrast, previous work has demonstrated marked regional variation in pursuing primary surgery for stage I or II cancers, with the highest surgery rates in the West North Central region and the lowest in the New England region.15
Machine learning is emerging in the medical literature as a novel, sophisticated methodology for predicting clinical features of interest. The advantages of ML lie in its ability to process large data inputs and to account for high levels of variability, nonlinear interactions, and heterogeneous distributions.13,16 This technology is widely used commercially, with large companies such as Netflix using ML to better cater to their clientele.17 In the medical literature, machine learning is being applied in both clinical and basic science fields, from medical imaging to genomic sequencing to the prediction of clinical outcomes.13,18,19 An area where ML may be of additional value is the analysis of large clinical data registries. Multi-institutional, national, and international data registries allow higher statistical power and open doors to previously difficult-to-answer questions.20 Given the aforementioned strengths, a model developed using ML would be able to account for intricate interactions among variables.26
Machine learning analysis exists on a spectrum. At one end are highly supervised models, in which the operator selects all input variables, their relationships, and the desired output variable. At the other end are unsupervised models, in which the ML algorithm attempts to identify patterns in unlabeled data with minimal operator input.12 The results of this study indicate that a form of supervised learning, decision forest, yielded the strongest model. We can ascertain from this result and our previous understanding of ML that the appropriate form of data analysis is determined by the clinical question asked and the relationship between input and output variables.21 Where the relationships explored are linear, linear or logistic regression will likely outperform ML. However, where nonlinear relationships and interaction terms are explored, ML will likely outperform more traditional statistics.8,12,21
There are several limitations that warrant discussion. To begin, this study relied on data collected from the NCDB for the development of the model. Previous studies have described the limitations of using this large national registry.22,23 Briefly, these shortcomings include incomplete collection of patient and treatment attributes in the registry, and significant changes within the past decade that affect the completeness of available data. In our study, the availability of HPV data is one such example. Until recently, HPV status was not routinely collected, and the HPV status of all cases in our study prior to 2010 is unknown. The decision to include cases with missing HPV status was made because NCCN guidelines recommend primary surgery or primary radiation for early-stage OPSCC regardless of HPV status. However, a subanalysis was performed, which demonstrated that the same variables affect the primary treatment modality patients receive.
The final limitation for discussion is directed toward ML. In the creation of any machine learning algorithm, the process by which the algorithm determines its prediction is not available for review. This is known as the “black box” of machine learning. That is, information is input into the algorithm and a prediction is generated, but neither the impact of individual variables (for example, as an effect size) nor the relationships among variables can be displayed in a comprehensible format.12
In an attempt to understand how a machine learning model produces its predictions, PFI scores are calculated to assess the impact of individual variables. However, interpretation of the scores is challenging. A PFI score is defined as the absolute difference in the AUC of the final model before and after altering an individual variable. Given that this is a novel metric, the significance of the resulting value is unclear. Furthermore, it is unknown how to compare one score to another. Two variables may have PFI scores that differ by 0.01, but how meaningful that difference is remains poorly understood. This presents an additional limitation of machine learning studies, as conclusions regarding individual input variables are limited to ranking. These limitations should diminish in the future as more prognostic patient and tumor characteristics are identified and as further work toward understanding machine learning is undertaken.
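To make the PFI definition above concrete, the following is a minimal sketch and not the pipeline used in this study. It assumes that "altering" a variable means randomly permuting its values across cases. The data are synthetic, the fixed scoring function `predict` is a hypothetical stand-in for a trained model, and the names `auc` and `pfi_scores` are illustrative only.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison
    of every positive case against every negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pfi_scores(predict, rows, labels, seed=0):
    """Return the baseline AUC and, for each variable, the absolute change
    in AUC after that variable is permuted (assumed meaning of 'altering')."""
    rng = random.Random(seed)
    baseline = auc(labels, [predict(r) for r in rows])
    scores = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between variable j and the outcome
        permuted_rows = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        permuted = auc(labels, [predict(r) for r in permuted_rows])
        scores.append(abs(baseline - permuted))
    return baseline, scores


# Synthetic data (assumption): variable 0 drives the outcome, variable 1 is noise.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [1 if r[0] + data_rng.gauss(0, 0.2) > 0.5 else 0 for r in rows]


def predict(row):
    """Hypothetical stand-in for a trained model: scores a case by variable 0 only."""
    return row[0]


baseline_auc, scores = pfi_scores(predict, rows, labels)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("baseline AUC:", round(baseline_auc, 3))
print("PFI scores:", [round(s, 3) for s in scores])
print("variables ranked by PFI:", ranking)
```

Consistent with the limitation described above, the sketch yields a ranking of variables by the size of the AUC change, but it gives no effect size or direction of effect for any individual variable.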