INTRODUCTION
Prediction models, also known as clinical prediction models, are
mathematical formulas or equations that express the relationship between
multiple variables and help predict a future outcome using specific
values of certain variables. Prediction models are used extensively in
many areas, including clinical settings. In clinical application, a
prediction model helps to detect or screen high-risk subjects for
asymptomatic disease for early intervention, predict a future disease
to facilitate patient-doctor communication based on more objective
information, assist in medical decision-making so that both doctors and
patients can make an informed choice regarding treatment, and assist
healthcare services with planning and quality management.
While specific details may vary between prediction models, the goal and
process of developing prediction models are mostly similar.
Conventionally, a single prediction model is built from a dataset of
individuals in whom the outcomes are known and then the developed model
is applied to predict outcomes for future individuals. There are two
main components of prediction modeling: model development and model
validation. Once a model is developed using an appropriate modeling
strategy, its utility is assessed through model validation.
Through validation, investigators examine how the developed model
performs in a dataset that was not used to develop it, to ensure that
the model’s performance is adequate for the intended purpose.
Model validation provides a true test of a model’s predictive ability
when the model is applied to an independent dataset. A model may show
outstanding predictive accuracy in a dataset that was used to develop
the model, but its predictive accuracy may decline radically when
applied to a different dataset. In the era of precision health, which
encourages disease prevention through early detection and monitoring of
health and disease based on an individual’s risk, accurate prediction
in model validation has become even more important for successful
screening.
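The decline in accuracy described above can be illustrated with a small
simulation. The sketch below is only an illustration under stated
assumptions, not a method from this paper: the data are simulated, the
function names (simulate, predict_1nn, accuracy) are hypothetical, and a
one-nearest-neighbour rule is used deliberately because it memorizes the
development data and therefore exaggerates the effect.

```python
import random

def simulate(n, rng):
    """Simulate n subjects: one risk factor x and a noisy binary outcome y."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Assumed outcome mechanism: risk is 0.8 when x > 0, otherwise 0.2.
        p = 0.8 if x > 0 else 0.2
        y = 1 if rng.random() < p else 0
        data.append((x, y))
    return data

def predict_1nn(train, x):
    """Predict the outcome of the development subject whose x is closest."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]

def accuracy(train, test):
    """Proportion of subjects in `test` whose outcome is predicted correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)

rng = random.Random(0)            # fixed seed so the run is repeatable
development = simulate(200, rng)  # dataset used to build the model
validation = simulate(200, rng)   # independent dataset, same mechanism

# Apparent accuracy: model evaluated on the data it was developed from.
apparent_accuracy = accuracy(development, development)
# Validation accuracy: the same model evaluated on new subjects.
validation_accuracy = accuracy(development, validation)

print(f"apparent accuracy:   {apparent_accuracy:.2f}")
print(f"validation accuracy: {validation_accuracy:.2f}")
```

Because each development subject is its own nearest neighbour, the
apparent accuracy is perfect, while the accuracy in the independent
dataset is limited by the noise in the outcome. Real prediction models
are rarely this extreme, but the direction of the gap is the same.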
Numerous clinical prediction models are available to serve different
purposes; however, only a few have found application in clinical
practice. One reason is the lack of validation, particularly external
validation. External validity establishes the generalizability of a
prediction model. Generally, the accuracy of a prediction model
degrades from the sample in which the model was first developed to
subsequent applications. For a prediction model to be generalizable,
the accuracy of the model needs to be both reproducible and
transportable. A prediction model that cannot predict outcomes
accurately in a new sample is useless. Clinicians lack the confidence
and trust to use prediction models that are not well validated in their
practice. Although its importance is recognized, external validation of
prediction models is not common, which has largely contributed to the
failure to translate prediction models into clinical practice. Various
clinical practice guidelines recommend incorporating into clinical
practice only those prediction models that have demonstrated good
predictive accuracy in multiple validation studies.
Model validation involves several aspects, and our objective is to
discuss those aspects in this paper to give readers a basic
understanding of the topic and its importance. The concept of model
validation is statistical. However, we have tried to present a
nontechnical discussion of the topic in plain language. The information
provided in this paper can be helpful for anyone who wishes to be
better informed, to have more meaningful conversations with data
analysts about their project, or to apply the right model validation
technique, provided that they have advanced training in statistics. We
have arranged our discussion as follows. We begin by defining model
validation. We then outline the major steps one needs to follow in
model validation. Within these steps, we discuss the different ways of
validating a model, together with their strengths and limitations,
which we call “model validation procedures”, and how to assess the
performance of a validated model, which we call “model performance
assessment”.