Insert Figure 3
Deep learning models in particular suffer from this due to their hidden learning behavior and up to billions of parameters. One recent study made this vividly clear. When researchers trained a model to distinguish COVID-19 patients with pneumonia from patients with other respiratory diseases based on chest radiographs, the algorithm based its predictions on the dates printed on the radiological images; it had found a shortcut and classified all patients imaged since 2020 as COVID-19 cases. Thus, there is a growing demand for ‘white box’ approaches, referring to methods and models that are easy to explain and interpret. This need is further amplified when the aim is to bring applications to clinical practice, which has many technical, medical, legal, and ethical dimensions. The urgent need for explainability has accelerated methodological innovations to ‘open the black box’. Relevant examples are SHAP (SHapley Additive exPlanations), LIME (Local Interpretable Model-agnostic Explanations), and CAM (Class Activation Maps). For example, SHAP was recently implemented to describe the contribution of features selected for inclusion in asthma prediction models. These analytical methods quantify how each input feature contributes to each individual prediction, providing detailed insight into the learning patterns of the AI model.
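As a minimal illustration of how such feature-attribution methods are applied in practice, the sketch below computes SHAP values for a tree-based risk model. The feature names, the synthetic data, and the choice of a gradient boosting classifier are assumptions for demonstration only, not details taken from the studies cited above.

# Minimal sketch: explaining a tree-based risk model with SHAP.
# The features, data, and model below are illustrative placeholders.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "age": rng.integers(1, 18, n),               # hypothetical predictors
    "eosinophil_count": rng.normal(0.3, 0.1, n),
    "parental_asthma": rng.integers(0, 2, n),
    "wheeze_episodes": rng.poisson(2, n),
})
y = rng.integers(0, 2, n)                        # synthetic outcome labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

# TreeExplainer computes, for each patient, how much each feature shifts
# the prediction away from the average model output (the SHAP values).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Global view: mean absolute contribution of each feature across patients.
shap.summary_plot(shap_values, X_test)

The resulting summary plot ranks features by their average contribution across patients, while the per-patient SHAP values themselves can be inspected to explain individual predictions.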

Validation and generalizability

A structured modeling process is essential when developing an ML prediction model, both to create a reliable model and to establish confidence in its outcomes. There are many ML algorithms, and it is difficult to tell beforehand which will perform best; the ‘no free lunch’ theorem states that no single algorithm is superior across all problems, which emphasizes the need to develop and evaluate ML models iteratively. Thus, multiple ML methods should be applied to the data and their performance compared, as sketched below. Figure 4 depicts the steps to build a supervised learning prediction model for disease risk; the steps needed for unsupervised learning overlap to a large extent. Skipping or mismanaging these steps puts model reliability at risk, for example, by not properly separating the training and validation data, which may lead to overfitting of the prediction model.
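The following sketch illustrates this comparison step under the principles above: several candidate algorithms are evaluated with cross-validation on the training data only, while a held-out test set is reserved for a final, unbiased performance estimate. The specific algorithms, the synthetic data, and the use of AUC as the metric are illustrative assumptions rather than prescriptions from the text.

# Minimal sketch: comparing several candidate ML algorithms with
# cross-validation, while keeping a held-out test set untouched.
# Data, algorithms, and the AUC metric are illustrative choices.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a disease-risk dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Separate the test set first so it never influences model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
}

# Cross-validation on the training data only: each model is repeatedly
# fitted on part of the data and scored on the remaining part.
for name, model in candidates.items():
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f} (+/- {scores.std():.3f})")

# Only the finally selected model is evaluated once on the held-out test set.

Keeping the test set outside the cross-validation loop is what prevents the leakage between training and validation data that the text warns against.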