Statistical analysis
We performed agglomerative hierarchical clustering with the Ward minimum-variance method to group comorbidity variables and identify aggregated conditions. We used the hclust function in R with a dissimilarity matrix based on the Kendall distance, a rank-based measure chosen on the assumption that the variables did not follow a parametric distribution (Figure 2) [13]. The pre-specified dichotomous variables (COPD, dyslipidemia, liver disease, dementia, and stroke) were coded as one when the comorbidity was present and zero when it was absent. Categorical variables, such as stroke and arrhythmia, were coded according to their categories. For stroke, the values were: absent = 0, transient ischemic attack = 1, hemorrhagic stroke = 2, cardioembolic stroke = 3, and atherothrombotic stroke = 4. For arrhythmia, they were: sinus rhythm = 0, atrial fibrillation or flutter (AF/flutter) = 1, atrioventricular block = 2, and other = 3. Finally, the pre-specified quantitative variables (BMI, eGFR, LVEF, hemoglobin, and SBP) retained their numerical values.
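To make the dissimilarity step concrete, the following is a minimal sketch in Python (standard library only), not the R code used in the study. It assumes that the Kendall distance between two variables is 1 minus Kendall's tau-b, which may differ from the exact definition used with hclust. The function names and the six-patient data set are hypothetical and serve only as illustration; the Ward clustering step itself is not reproduced here.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation between two equally long sequences.

    tau-b corrects for ties, which are frequent in 0/1 coded variables.
    """
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied on both variables: contributes to neither factor
        elif dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denom


def kendall_distance_matrix(columns):
    """Pairwise dissimilarity between variables, ASSUMED here to be 1 - tau-b."""
    names = list(columns)
    return {
        a: {b: 0.0 if a == b else 1.0 - kendall_tau_b(columns[a], columns[b])
            for b in names}
        for a in names
    }


# Hypothetical coded values for six patients (illustration only).
toy = {
    "copd": [1, 0, 0, 1, 0, 1],        # dichotomous: 1 = present, 0 = absent
    "arrhythmia": [1, 0, 2, 1, 0, 3],  # categorical codes as listed above
    "lvef": [35, 60, 55, 30, 62, 40],  # quantitative, kept as measured
}
dist = kendall_distance_matrix(toy)
print(round(dist["copd"]["lvef"], 3))
```

Under this assumed definition the distance ranges from 0 (identical rankings) to 2 (fully reversed rankings), so variables that tend to rise and fall together across patients end up close to each other.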
Bootstrap resampling (n = 1000) was used to assess the reproducibility of each hierarchical cluster, applying the pvclust function in R [14]. We computed the bootstrap probability (BP) value, which corresponds to the frequency with which a cluster is identified in the bootstrap copies, and the approximately unbiased (AU) probability value, obtained by multiscale bootstrap resampling (Figure 2). Clusters with AU ≥ 95% were considered to be strongly supported by the data.
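The BP value can be illustrated with a short Python sketch (standard library only). This is not pvclust: it computes only the plain bootstrap frequency and does not attempt the multiscale correction behind the AU value. The clustering function is a deliberately simple stand-in rather than Ward clustering, and all names and data are hypothetical.

```python
import random


def resample_rows(rows, rng):
    """Draw one bootstrap copy: sample patients (rows) with replacement."""
    return [rng.choice(rows) for _ in rows]


def bootstrap_probability(rows, cluster_fn, target, n_boot=1000, seed=0):
    """Fraction of bootstrap copies in which `target` appears as a cluster.

    `cluster_fn` takes a list of rows and returns a set of frozensets of
    variable names; any clustering procedure can be plugged in.
    """
    rng = random.Random(seed)
    target = frozenset(target)
    hits = 0
    for _ in range(n_boot):
        if target in cluster_fn(resample_rows(rows, rng)):
            hits += 1
    return hits / n_boot


def agreement_clusters(rows, threshold=0.7):
    """Stand-in clustering (NOT Ward): pair binary variables that take the
    same value in more than `threshold` of the rows."""
    names = sorted(rows[0])
    clusters = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            agree = sum(r[a] == r[b] for r in rows) / len(rows)
            if agree > threshold:
                clusters.add(frozenset((a, b)))
    return clusters


# Hypothetical binary comorbidity data for eight patients (illustration only).
patients = [
    {"copd": 1, "dyslipidemia": 1, "dementia": 0},
    {"copd": 0, "dyslipidemia": 0, "dementia": 1},
    {"copd": 1, "dyslipidemia": 1, "dementia": 1},
    {"copd": 0, "dyslipidemia": 0, "dementia": 0},
    {"copd": 1, "dyslipidemia": 0, "dementia": 0},
    {"copd": 0, "dyslipidemia": 0, "dementia": 1},
    {"copd": 1, "dyslipidemia": 1, "dementia": 0},
    {"copd": 0, "dyslipidemia": 1, "dementia": 1},
]
bp = bootstrap_probability(patients, agreement_clusters,
                           {"copd", "dyslipidemia"}, n_boot=200)
print(bp)
```

Patients, not variables, are resampled, so each bootstrap copy keeps the same set of variables and the clustering of variables can be compared across copies.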
Once the clusters were built, we performed univariate comparisons between them. Quantitative variables were expressed as mean ± standard deviation if normally distributed, and as median and interquartile range otherwise. Normally distributed quantitative variables were compared between clusters by one-way analysis of variance, with Tukey's post hoc test for multiple comparisons; otherwise, we used the Kruskal-Wallis test. Qualitative variables were expressed as absolute number and percentage, and study groups were compared using the chi-squared test.
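As an illustration of the comparison of qualitative variables, the sketch below computes the Pearson chi-squared statistic and its degrees of freedom for a contingency table in Python (standard library only). The counts are hypothetical, the function name is illustrative, and the p value, which requires the chi-squared distribution, is left to a statistics package.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom.

    `table` is a list of rows of observed counts (for example, clusters by
    presence or absence of a comorbidity). Assumes no row or column total
    is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: two clusters (rows) by comorbidity present/absent.
stat, dof = chi_squared_statistic([[10, 20], [20, 10]])
print(round(stat, 2), dof)
```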
Finally, a Cox proportional-hazards model was used to examine the association between the clusters and time to hospitalization and death. The model covariates were selected a priori based on previous prognostic reports and clinical experience; variables that were significant in the initial univariate comparisons were also included. Cumulative curves were estimated by the Kaplan-Meier method and compared with the log-rank test. A p value < 0.05 was considered significant. Analyses were performed using SPSS and R.
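The Kaplan-Meier estimate can be sketched in a few lines of Python (standard library only). This is an illustration of the product-limit formula with hypothetical follow-up data, not the SPSS or R procedure used in the study, and it covers neither the log-rank test nor the Cox model.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        n_leaving = sum(1 for ti in times if ti == t)  # events plus censored
        if n_events:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up (months) for four patients; the second is censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)
```

A censored patient contributes no drop in the curve but still leaves the risk set, which is why the step at a later event time is larger than it would be with complete follow-up. The cumulative event curve is one minus the survival estimate at each time.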
RESULTS
A total of 1,934 patients were analyzed: 907 had T2DM (39.1% men, mean age 78.4 ± 7.6 years) and 1,027 did not (39.9% men, mean age 81.4 ± 7.6 years). The most prevalent comorbidities were AF/flutter (67.4%), dyslipidemia (52.4%), and COPD (24.9%). The similarity matrix and significance by variable in the clusters are shown in Figure 2.