Discussion:
In this study, machine learning was used to create a model to predict
primary treatment modality for OPSCC. Logistic regression, a more
traditional statistical methodology, was employed as a reference as
well. The results indicate that machine learning was able to create a
robust prediction model using the variables included in the NCDB.
Furthermore, the results of this study highlight that the variables most
predictive of primary treatment modality are clinical T and N
classification, primary tumor site, and the type of institution
where treatment is performed.
In light of the lack of sound evidence dictating optimal primary
treatment modality for early-stage OPSCC, this study provides further
insight into treatment selection at participating institutions across
the nation. It indicates that the decision to undergo primary surgery
versus primary radiation is most strongly influenced by tumor
characteristics, with some influence from facility type. Our model did
not find geographical region to be an important predictor of primary
treatment modality. In contrast, previous work has demonstrated marked
regional variation in pursuing primary surgical treatment for stage I or
II cancers. That study found the highest surgery rates in the
West North Central region and the lowest in the New England
region.15
Machine learning is emerging in the medical literature as a novel,
sophisticated methodology for predicting clinical features of interest.
The advantages of ML lie in its ability to process large data inputs and
account for high levels of variability, nonlinear interactions, and
heterogeneous distributions.13,16 This
technology is widely used commercially, with large companies such as
Netflix utilizing ML to better cater to their
clientele.17 In the medical literature, machine
learning is being applied in both clinical and basic science fields,
from medical imaging to genomic sequencing to the prediction of clinical
outcomes.13,18,19 An area where ML may be of
additional value is in the analysis of large clinical data registries.
Multi-institutional, national, and international data registries allow
higher statistical power and open doors to address previously
difficult-to-answer questions.20 Due to the
aforementioned strengths, a model developed using ML would be able to
account for intricate interactions among variables.26
Machine learning analysis exists on a spectrum. At one end are highly
supervised models, in which all input variables and their relationships,
along with the desired output variable, are selected by the operator. At
the other end are unsupervised models, in which the ML algorithm
attempts to identify patterns of structure in unlabeled data with
minimal operator input.12 The results of this study indicate that a
form of supervised learning, decision forest, yielded the strongest
model. We can ascertain from this result and our previous understanding
of ML that the appropriate form of data analysis is determined by the
clinical question asked and the relationship between input and output
variables.21 In situations where linear problems are
explored, linear or logistic regression will likely outperform ML.
However, in data where nonlinear relationships and interaction terms are
explored, ML will likely outperform more traditional
statistics.8,12,21
There are several limitations that warrant discussion. To begin, this
study relied on data collected from the NCDB for the development of the
model. Previous studies have described the limitations of using this
large national registry.22,23 Briefly, these shortcomings
include incomplete collection of patient and treatment attributes in the
registry, and significant changes within the past decade that affect the
completeness of available data. In our study, the availability of HPV
data is one such example. Until recently, HPV status was not routinely
collected. The HPV status for all cases in our study prior to 2010 is
unknown. The decision to include those with missing HPV status was made
because NCCN guidelines recommend primary surgery or primary radiation
for early-stage OPSCC regardless of HPV status. However, a
subanalysis was performed, which demonstrated that the same variables
affect the primary treatment modality patients receive.
The final limitation for discussion concerns ML itself. In the
creation of any machine learning algorithm, the process by which the
algorithm determines its prediction is not available for review. This is
known as the “black box” of machine learning. That is, information is
input into the model and a prediction is generated, but the impact of
individual variables, whether expressed as an effect size or as
relationships among variables, cannot be displayed in a
comprehensible format.12
In an attempt to understand how a machine learning model produces its
predictions, PFI scores are calculated to assess the impact of individual
variables. However, interpretation of the scores is challenging. A
PFI score is defined as the absolute difference in AUC of the final
model before and after altering an individual variable. Given that this
is a novel metric, the significance of the resulting value is unclear.
Furthermore, it is unknown how to compare one score to
another. While two variables may have PFI scores that differ by 0.01,
how meaningful this difference is remains unclear. This
presents an additional limitation to machine learning studies, as
conclusions regarding individual input variables are limited to ranking.
These limitations will be mitigated in the future as more
prognostic patient and tumor characteristics are
identified and as further work toward understanding machine learning is
undertaken.
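
To make the PFI definition given above concrete (the absolute difference in AUC before and after altering an individual variable), the following is a minimal sketch and not the pipeline used in this study. It assumes that PFI denotes permutation feature importance and that "altering" a variable means randomly shuffling its values across cases. The toy data, the `predict` scoring function, and the helper names `auc` and `permutation_importance` are hypothetical and chosen only for illustration.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, labels, n_features, seed=0):
    """Return |baseline AUC - AUC after shuffling feature j| for each feature j."""
    rng = random.Random(seed)
    baseline = auc(labels, [predict(r) for r in rows])
    scores = []
    for j in range(n_features):
        # Assumption: "altering" a variable = shuffling its column across cases.
        column = [r[j] for r in rows]
        rng.shuffle(column)
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        permuted_auc = auc(labels, [predict(r) for r in permuted])
        scores.append(abs(baseline - permuted_auc))
    return scores


# Hypothetical toy data: feature 0 determines the label, feature 1 is noise.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]


def predict(row):
    """Stand-in for a fitted model: scores each case using feature 0 only."""
    return row[0]


pfi = permutation_importance(predict, rows, labels, n_features=2)
print(pfi)  # feature 0 receives a positive score; feature 1 scores exactly 0
```

In this sketch the unused variable receives a score of exactly zero, while the informative variable receives a positive score. The scores therefore support a ranking of variables, but the size of the gap between any two scores has no established interpretation, which is the limitation described above.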