Statistical Analysis
The Derivation Data Set was randomly divided into two equal halves. One half (50%) was used for variable selection and parameter estimation of the prediction model (train subsample) and the other half (50%) was used for internal validation (test subsample). The Omicron Data Set was used for external validation. Descriptive statistics consisted of frequency tables for categorical variables. Patient characteristics were compared between subsamples (train vs. test and train vs. Omicron) using the chi-square test.
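As an illustration of the splitting and comparison steps, the following is a minimal Python sketch, not the code used in the study (the analyses were run in R). The function names, the fixed seed and the use of integer record identifiers are assumptions made for the example; 241,071 is the sum of the train and test subsample sizes reported in this section. The second function returns only the Pearson chi-square statistic and its degrees of freedom; obtaining the p-value would additionally require the chi-square distribution.

```python
import random

def split_in_half(ids, seed=0):
    """Shuffle record identifiers and split them into two near-equal halves."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    shuffled = list(ids)
    rng.shuffle(shuffled)
    mid = (len(shuffled) + 1) // 2  # first half takes the extra record when n is odd
    return shuffled[:mid], shuffled[mid:]

def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical usage: split 241,071 record identifiers into train and test.
train, test = split_in_half(range(241071))

# Hypothetical 2 x 2 table: rows are subsamples, columns are levels of a variable.
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
```

Each row of the table passed to `chi_square_statistic` would hold the category counts of one subsample, so that one categorical characteristic is compared between train and test (or train and Omicron) at a time.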
Given the large sample size (n = 120,536 in the train subsample and n = 120,535 in the test subsample), we developed multivariable logistic regression models for three outcomes (1, hospital admission; 2, death; and 3, adverse evolution) using lasso logistic regression, which uses a penalized likelihood for parameter estimation and variable selection, in the train subsample. In the final models, only factors with p<0.01 were retained. Odds ratios (ORs) and 95% confidence intervals (CIs) were estimated. The discrimination ability of each model was measured by the area under the receiver operating characteristic (ROC) curve (AUC).
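For readers less familiar with the AUC, it can be read as the probability that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. The sketch below computes it from rank sums (the Mann-Whitney formulation) in plain Python. It is an illustration only, not the study code, and the function name and example values are hypothetical.

```python
def auc(scores, labels):
    """AUC from average ranks; labels are 1 (event) or 0 (no event)."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_events = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_events += average_rank * sum(label for _, label in pairs[i:j])
        i = j
    n_events = sum(labels)
    n_non_events = n - n_events
    # Mann-Whitney U statistic for the event group, scaled to [0, 1].
    u = rank_sum_events - n_events * (n_events + 1) / 2
    return u / (n_events * n_non_events)

# Hypothetical predicted risks and observed outcomes for four patients.
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

In this toy example three of the four event/non-event pairs are ordered correctly, so `example_auc` is 0.75. The function assumes that both outcome groups are non-empty.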
To develop the predictive risk score for each outcome, we first assigned a weight to each predictor variable in proportion to its estimated β parameter in the lasso logistic regression model derived in the train subsample. We then summed the weights of all the predictors present in each patient, with higher scores indicating a greater likelihood of the event. The predictive accuracy of the risk score was assessed using the AUC in the train, test and Omicron samples. We then categorized the risk score into four levels of risk. The optimal thresholds for the continuous risk scores were determined with the catpredi function of the R package CatPredi, using the addfor algorithm, which maximizes the AUC of the categorized score. The performance of the risk classification was evaluated by means of the AUC and by examining the probability of event occurrence in each risk category. In addition, the true positive rate (TPR), the true negative rate (TNR) and the net benefit (NB), which weighs the relative benefits and harms, were computed for each of the risk cut-off points. The model, the score and the categorized score were all validated in the Omicron sample by means of the AUC. All effects were considered significant at p<0.01. All statistical analyses were performed using R version 4.1.2.
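The scoring and evaluation steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the study code: the exact rule for converting β parameters into weights is not given in this section, so the sketch assumes the common convention of dividing each coefficient by the smallest absolute coefficient and rounding; the predictor names, coefficients, cut-off points and threshold probability are all hypothetical; and the net benefit uses the standard decision-curve definition, NB = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The search for optimal cut-off points (catpredi with the addfor algorithm) is not reproduced here.

```python
from bisect import bisect_right

def beta_to_weights(betas):
    """Assumed rule: scale by the smallest absolute coefficient, then round."""
    base = min(abs(b) for b in betas.values())
    return {name: round(b / base) for name, b in betas.items()}

def risk_score(patient, weights):
    """Sum the weights of the predictors present (coded 1) in one patient."""
    return sum(w for name, w in weights.items() if patient.get(name, 0) == 1)

def risk_category(score, cutoffs):
    """Map a score to a level 0..len(cutoffs); three cut-offs give four levels."""
    return bisect_right(cutoffs, score)  # a score equal to a cut-off goes to the upper level

def cutoff_metrics(scores, outcomes, cutoff, threshold_probability):
    """TPR, TNR and net benefit when patients with score >= cutoff are flagged."""
    tp = fp = tn = fn = 0
    for score, outcome in zip(scores, outcomes):
        flagged = score >= cutoff
        if flagged and outcome == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif outcome == 1:
            fn += 1
        else:
            tn += 1
    n = len(scores)
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    odds = threshold_probability / (1 - threshold_probability)
    net_benefit = tp / n - (fp / n) * odds
    return tpr, tnr, net_benefit

# Hypothetical coefficients for three binary predictors.
weights = beta_to_weights({"age_over_80": 1.2, "diabetes": 0.4, "copd": 0.8})
score = risk_score({"age_over_80": 1, "copd": 1}, weights)
level = risk_category(score, cutoffs=[2, 4, 6])  # hypothetical cut-off points

# Hypothetical scores and outcomes for six patients, evaluated at cut-off 3.
tpr, tnr, nb = cutoff_metrics([0, 1, 3, 5, 6, 2], [0, 0, 1, 1, 0, 1],
                              cutoff=3, threshold_probability=0.2)
```

Calling `cutoff_metrics` once per cut-off point gives the TPR, TNR and NB for each risk threshold; the threshold probability reflects how the harm of a false positive is weighed against the benefit of a true positive and would have to be chosen for each outcome.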