Machine learning algorithms
Univariate logistic regression models were used to assess the predictive
accuracy of each variable individually. Predictive performance was compared
among three classifiers: random forest, logistic regression and extreme
gradient boosting (XGBoost). The effect of the time at which variables were
collected on predictive performance was studied using the logistic regression
classifier. SHapley Additive exPlanations (SHAP) values [17] were used to rank
variables by their importance to the trained classifier, and partial
dependence plots (PDP) [18] were used to explore how the predictions of the
trained classifier depend on the value of each variable when averaged over
the other variables. Further methodological details are given in this
article's supporting information.
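The univariate screening of individual variables can be sketched as follows. This is a minimal, dependency-free illustration that rests on several assumptions the text does not state:

- the area under the ROC curve (AUC) is used as the accuracy measure;
- each single-variable model is fit by plain gradient descent on standardized values, not by a statistics package;
- AUC is computed on the training data rather than under the study's own validation scheme.

The variable names and data are hypothetical.

```python
import math


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss.

    X is a list of feature rows, y a list of 0/1 labels. Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            # Numerically stable sigmoid
            p = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
            err = p - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def roc_auc(y, scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def standardize(values):
    """Z-score one variable so gradient descent behaves similarly for every variable."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


def univariate_auc(data, y):
    """Fit one single-variable logistic regression per variable; return AUCs, best first."""
    results = {}
    for name, values in data.items():
        X = [[v] for v in standardize(values)]
        w, b = fit_logistic(X, y)
        # AUC depends only on the ranking, so the linear predictor serves as the score
        scores = [b + w[0] * row[0] for row in X]
        results[name] = roc_auc(y, scores)
    return dict(sorted(results.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical toy data: two variables and a binary outcome for eight subjects.
y = [0, 0, 0, 0, 1, 1, 1, 1]
data = {
    "var_b": [2, 5, 1, 6, 3, 4, 8, 7],
    "var_a": [1, 2, 3, 4, 5, 6, 7, 8],
}
print(univariate_auc(data, y))
```

A single-variable model only rescales and shifts one variable. Its AUC therefore measures how well that variable alone ranks outcomes, which is what makes it a useful screening statistic.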
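The text does not specify how the effect of collection time was analysed. One plausible design, assumed here, is to refit the logistic regression on the variables available by each collection time point and compare the resulting AUCs.

In the sketch below, `var_early`, `var_late`, their 0 h and 24 h time stamps and the data are all hypothetical. The model is fit by plain gradient descent, and AUC is computed in-sample for brevity.

```python
import math


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            p = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
            err = p - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def roc_auc(y, scores):
    """Rank-based AUC; ties between a positive and a negative count 1/2."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def standardize(values):
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


def auc_by_collection_time(data, collected_at, y, time_points):
    """Refit one logistic regression per time point on the variables collected by then."""
    results = {}
    for t in time_points:
        names = [name for name in data if collected_at[name] <= t]
        if not names:
            continue  # nothing has been collected yet at this time point
        columns = [standardize(data[name]) for name in names]
        X = [list(row) for row in zip(*columns)]  # rows = subjects, columns = variables
        w, b = fit_logistic(X, y)
        scores = [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]
        results[t] = roc_auc(y, scores)
    return results


# Hypothetical: var_early is measured at admission (0 h), var_late 24 h later.
y = [0, 0, 0, 0, 1, 1, 1, 1]
data = {"var_early": [2, 5, 1, 6, 3, 4, 8, 7], "var_late": [0, 0, 0, 0, 1, 1, 1, 1]}
collected_at = {"var_early": 0, "var_late": 24}
print(auc_by_collection_time(data, collected_at, y, time_points=[0, 24]))
```

Comparing the AUC at each time point shows how much predictive performance is gained by waiting for later measurements. That gain can then be weighed against the value of an earlier prediction.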
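To illustrate what a SHAP-based ranking computes, the sketch below evaluates Shapley values exactly by enumerating every coalition of variables. It then ranks variables by mean absolute Shapley value, a common global-importance summary. Assumptions:

- a variable absent from a coalition is set to its mean over the data, which is one of several conventions;
- the model is explained on the log-odds scale;
- `log_odds` and its coefficients are a hypothetical stand-in for the trained classifier.

Enumeration grows exponentially with the number of variables. The `shap` package instead uses model-specific or sampling-based explainers, so its values need not match this sketch exactly.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A variable that is absent from a coalition takes its baseline value.
    Cost grows as 2**d, so this is only practical for a handful of variables.
    """
    d = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(d)]
        return model(z)

    phi = [0.0] * d
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Shapley weight |S|! (d - |S| - 1)! / d! for coalitions S of this size
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


def rank_by_mean_abs_shap(model, X, names):
    """Global importance = mean |Shapley value| over rows; baseline = column means."""
    d = len(names)
    baseline = [sum(row[j] for row in X) / len(X) for j in range(d)]
    totals = [0.0] * d
    for row in X:
        for j, v in enumerate(shapley_values(model, row, baseline)):
            totals[j] += abs(v)
    importance = {names[j]: totals[j] / len(X) for j in range(d)}
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical trained model on the log-odds scale, and toy data (rows = subjects).
def log_odds(z):
    return -1.0 + 2.0 * z[0] - 0.5 * z[1] + 0.0 * z[2]


X = [[1, 0, 2], [0, 1, 1], [2, 1, 0], [1, 2, 1]]
print(rank_by_mean_abs_shap(log_odds, X, ["var_a", "var_b", "var_c"]))
```

For any model, the Shapley values for one row sum to the model output at that row minus the output at the baseline. This gives a quick sanity check on any implementation.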
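A partial dependence curve for one variable is built in two steps. First, that variable is set to each value on a grid for every subject while the other variables keep their observed values. Second, the predictions are averaged at each grid value.

In the sketch below, `predict_proba`, its coefficients and the toy data are hypothetical. A real analysis would pass the trained classifier's prediction function, or use a library routine such as scikit-learn's `sklearn.inspection.partial_dependence`.

```python
import math


def grid_for(X, feature, n_points=5):
    """Evenly spaced grid between the observed min and max of one variable."""
    values = [row[feature] for row in X]
    lo, hi = min(values), max(values)
    if n_points == 1 or lo == hi:
        return [float(lo)]
    step = (hi - lo) / (n_points - 1)
    return [lo + i * step for i in range(n_points)]


def partial_dependence(predict, X, feature, grid):
    """Average prediction when `feature` is forced to each grid value for every row."""
    curve = []
    for v in grid:
        total = 0.0
        for row in X:
            z = list(row)
            z[feature] = v  # overwrite only the variable of interest
            total += predict(z)
        curve.append(total / len(X))
    return curve


# Hypothetical trained logistic model returning a predicted probability.
def predict_proba(z):
    return 1.0 / (1.0 + math.exp(-(-1.0 + 2.0 * z[0] - 0.5 * z[1])))


X = [[1, 0, 2], [0, 1, 1], [2, 1, 0], [1, 2, 1]]
grid = grid_for(X, feature=0, n_points=3)
for v, p in zip(grid, partial_dependence(predict_proba, X, feature=0, grid=grid)):
    print(f"var_a = {v:.1f}: mean predicted probability {p:.3f}")
```

Averaging over the observed values of the other variables is what makes the curve "partial". It describes the average effect of one variable and can hide interactions that differ between subgroups.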