Interpretability analysis
A logistic regression classifier is linear and therefore cannot model non-monotonic relations between predictors and outcome. Random forest and gradient boosting classifiers can model complex, non-monotonic relations, but they are so-called black box models, that is, classifiers that are not directly interpretable: the relations between inputs and output are difficult to understand from the parameters or structure of the trained model. Hence, SHAP values and partial dependence plots (PDPs) were used to conduct a post-hoc interpretability analysis of the random forest classifier. For tree-based classifiers such as random forest, SHAP values can be calculated exactly with the mature TreeSHAP method17.
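To make concrete what a SHAP value attributes, the following minimal sketch computes exact Shapley values for a single instance by enumerating all feature coalitions. It is not TreeSHAP and not the pipeline used in this study: the function `shapley_values`, the toy model `toy_model`, and all numbers are hypothetical and serve only as illustration. Brute-force enumeration is feasible only for a handful of features; TreeSHAP obtains such attributions efficiently for tree ensembles.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for instance x by brute-force coalition enumeration.

    Features outside a coalition are taken from the background rows and the
    prediction is averaged over those rows.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in coalition else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(row):
    # Hypothetical risk score: non-monotonic in age, increasing in visits.
    age, visits = row
    return 0.10 + 0.001 * abs(age - 45) + 0.03 * visits


# Made-up background patients and one made-up instance: [age, visits].
background = [[20, 1], [45, 3], [70, 6]]
x = [65, 7]
phi = shapley_values(toy_model, x, background)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and the average prediction over the background rows, which is the additivity property that makes SHAP values comparable across variables.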
We followed the data flow shown in Figure S1B to train the random forest classifier and calculated SHAP values for the variables collected from the baseline visit until 6 months after the baseline ESS. Figure 3 shows the variables sorted by the sum of absolute SHAP values over all patients, highest first. The distribution of the data points in the plot shows the impact of each variable on the classifier output. We found that a high number of visits after the baseline ESS and a short time between the baseline visit and the baseline ESS both increased the revision ESS risk. In addition, CRSwNP, asthma and NERD increased the revision ESS risk. The SHAP values show that patient age and the visit frequency from the baseline visit to the baseline ESS affected the revision ESS risk in a non-monotonic way: the red points (values higher than average) of these variables are dispersed on both sides of the scale (Figure 3).
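The ranking and the non-monotonicity check described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the helper names `rank_by_total_abs_shap` and `high_values_on_both_sides`, the variable names, and the small SHAP matrix are all hypothetical toy values.

```python
def rank_by_total_abs_shap(shap_rows, names):
    """Sort variables by the sum of absolute SHAP values over all patients."""
    totals = [sum(abs(row[j]) for row in shap_rows) for j in range(len(names))]
    order = sorted(range(len(names)), key=lambda j: totals[j], reverse=True)
    return [(names[j], totals[j]) for j in order]


def high_values_on_both_sides(shap_col, feature_col):
    """True if above-average feature values have both positive and negative SHAP values."""
    mean = sum(feature_col) / len(feature_col)
    high = [s for s, v in zip(shap_col, feature_col) if v > mean]
    return any(s > 0 for s in high) and any(s < 0 for s in high)


# Made-up toy data: one row per patient, one column per variable.
names = ["age", "visits_after_ess", "asthma"]
features = [[25, 1, 0], [45, 2, 0], [65, 8, 1], [80, 3, 1]]
shap_rows = [
    [-0.03, -0.05, -0.01],
    [0.00, -0.02, -0.01],
    [0.04, 0.12, 0.02],
    [-0.02, 0.01, 0.02],
]

ranking = rank_by_total_abs_shap(shap_rows, names)
age_dispersed = high_values_on_both_sides(
    [r[0] for r in shap_rows], [f[0] for f in features]
)
visits_dispersed = high_values_on_both_sides(
    [r[1] for r in shap_rows], [f[1] for f in features]
)
print(ranking, age_dispersed, visits_dispersed)
```

In this toy example the above-average ages receive SHAP values of both signs, which is the pattern read from the summary plot as a non-monotonic effect, whereas the above-average visit counts do not.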
We generated PDPs for the ten variables with the highest SHAP values. The plots of the following variables showed a wide range of risk scores for a revision ESS: the number of visits within 6 (or 3) months after the baseline ESS, the time between the baseline visit and the baseline ESS, age, the number of visits between the baseline visit and the baseline ESS, CRSwNP and asthma. The average predicted risk score varied by more than .02 units between low and high values of these predictors, whereas for the other predictors it varied by less than .02 units (Figure S4). The PDP of the number of visits within 6 months after the baseline ESS showed a wide range of risk scores, from about .1 for patients with fewer than two visits after the baseline ESS up to about .35 for patients with more than seven visits (Figure S4A). Similarly, if a patient had two or more postoperative visits within 3 months, the risk score for revision ESS increased (Figure S4D). The PDP of the time between the baseline visit and the baseline ESS showed a sharp drop in the risk score after about 100 days (Figure S4F). When the time between the baseline visit and the baseline ESS was less than 100 days, the risk score was about .15; when the time exceeded 500 days, the risk score fell below .13. The PDP curve for age was non-monotonic: the risk score varied from .1 for patients aged 10-30 years to about .17 for patients aged 60-70 years (Figure S4E), and was .13-.15 for patients aged 30-60 years or over 70 years. The PDP curve for the number of visits between the baseline visit and the baseline ESS was also non-monotonic: patients with 10-20 visits between the baseline visit and the baseline ESS had a smaller risk of revision ESS than patients with fewer than 10 or more than 20 visits (Figure S4I).
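As a minimal sketch of how a partial dependence curve and its range can be obtained, the example below fixes one variable at each grid value for every patient and averages the predictions. The function `partial_dependence`, the stand-in model `toy_risk`, and all numbers are hypothetical; they do not reproduce the study's classifier or results.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is set to each grid value for all rows."""
    curve = []
    for g in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = g
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_risk(row):
    # Hypothetical stand-in model: higher risk for short waiting times,
    # plus a small increase per visit.
    days_to_ess, visits = row
    base = 0.16 if days_to_ess < 100 else 0.13
    return base + 0.01 * visits


# Made-up patients: [days from baseline visit to ESS, number of visits].
rows = [[50, 1], [200, 3], [600, 2]]
grid = [50, 150, 600]

curve = partial_dependence(toy_risk, rows, 0, grid)
spread = max(curve) - min(curve)  # range of the average predicted risk score
print(curve, spread)
```

The `spread` value corresponds to the quantity compared against the .02 threshold in the text: a variable whose curve spans a wider range has a larger average effect on the predicted risk score.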