Predictive Models for Surgical Site Infection (SSI) in Patients with a Permanent Pacemaker (PPM) Using Machine Learning Methods

Jiyoun Song; Elioth Sanabria-Buenaventura; Bevin Cohen; Jianfang Liu; David Yao; Elaine Larson

doi:10.22541/au.159188536.65812462

Abstract

Introduction: Infections in patients with a permanent pacemaker (PPM) are responsible for adverse outcomes such as increased mortality, so one important strategy for reducing the incidence of surgical site infections (SSIs) is to identify and predict which patients are at high risk.

Methods: A retrospective cohort study was conducted in patients with PPMs discharged from a large academic health center in New York City from 2006 through 2016. Risk factors identified through bivariate analysis were used to build predictive models, and five-fold cross-validation was applied in building them. The performance of three machine learning models for predicting SSI in patients with a PPM was compared: logistic regression, decision tree (DT), and support vector machine (SVM).

Results: A total of 205/9,274 (2.16%) patients with PPMs were diagnosed with a hospital-acquired SSI. Overall, logistic regression had the highest predictive ability, with the largest AUC at 72.9%, whereas the SVM model showed the highest sensitivity (43.8%) and positive predictive value (PPV; 32.5%). All three models showed excellent specificity and accuracy (over 98% and 96%, respectively).

Conclusion: Although this study compared three predictive models, its clinical implications are very limited because of the models' low predictive ability (i.e., low PPV). Each algorithm had strengths and weaknesses in terms of accurate prediction and interpretable clinical decision support; however, logistic regression was more accurate for predicting low-prevalence conditions such as SSI. Future researchers may improve the models by incorporating text data from clinical notes through natural language processing.
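The pattern reported in the abstract, with specificity and accuracy above 96% alongside sensitivity and PPV below 50%, is typical of low-prevalence outcomes. The following is a minimal sketch of how these metrics relate to a confusion matrix. The counts are hypothetical, chosen only to give a prevalence of about 2%. They are not the study's data, and the names `ConfusionCounts` and `summarize` are illustrative assumptions rather than anything from the authors' analysis.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # infected patients flagged as high risk
    fp: int  # uninfected patients flagged as high risk
    fn: int  # infected patients the model missed
    tn: int  # uninfected patients correctly not flagged


def safe_ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def summarize(c: ConfusionCounts) -> Dict[str, Optional[float]]:
    """Compute the standard classification metrics from confusion counts."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "prevalence": safe_ratio(c.tp + c.fn, total),
        "sensitivity": safe_ratio(c.tp, c.tp + c.fn),
        "specificity": safe_ratio(c.tn, c.tn + c.fp),
        "ppv": safe_ratio(c.tp, c.tp + c.fp),
        "accuracy": safe_ratio(c.tp + c.tn, total),
    }


# Hypothetical counts (NOT from the study): 5,000 patients, 100 infected.
model = ConfusionCounts(tp=40, fp=80, fn=60, tn=4820)
# A trivial baseline that never flags anyone as high risk.
baseline = ConfusionCounts(tp=0, fp=0, fn=100, tn=4900)

model_metrics = summarize(model)
baseline_metrics = summarize(baseline)

for name, metrics in (("model", model_metrics), ("baseline", baseline_metrics)):
    print(name)
    for key, value in metrics.items():
        shown = "undefined" if value is None else f"{value:.3f}"
        print(f"  {key}: {shown}")
```

With these made-up counts, accuracy is 97.2% and specificity is about 98.4%, yet sensitivity is 40% and PPV is about 33%. The always-negative baseline reaches 98% accuracy while detecting no infections, which illustrates why accuracy and specificity alone say little about clinical usefulness when the outcome is rare.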