Statistical analysis
Statistical analyses were performed with R version 3.3.1 (R Core Team, 2016, Vienna, Austria). Continuous variables are presented as medians [interquartile range] and categorical variables as numbers (%). Quantitative variables were compared using Wilcoxon-Mann-Whitney tests. Categorical variables were compared using the chi-square test or Fisher's exact test, as appropriate. The number of positive biological sources was modeled as a function of age using quasi-Poisson regression to account for overdispersion. No imputation of missing data was performed. The data were visualized as heatmaps, i.e., grids of colors coded according to the c-sIgE level (ISU), with rows representing individuals and columns representing components. The heatmaps were stratified by severity group, and individuals were ordered by age.
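As an illustration of why a quasi-Poisson model may be preferred over a plain Poisson model for count data, the sketch below estimates the Pearson dispersion statistic for a set of counts. It is written in Python rather than R, it uses an intercept-only model (no age covariate) for simplicity, and the function name and the example counts are hypothetical; it is not the authors' code or data.

```python
import math


def pearson_dispersion(counts):
    """Pearson dispersion for an intercept-only Poisson model.

    Assumption: the fitted mean is the sample mean (no covariates),
    so the residual degrees of freedom are n - 1.
    """
    n = len(counts)
    mu = sum(counts) / n
    # Pearson chi-square: squared residuals scaled by the Poisson variance (= mu)
    pearson_chi2 = sum((y - mu) ** 2 / mu for y in counts)
    return pearson_chi2 / (n - 1)


# Hypothetical counts of positive biological sources per participant
example_counts = [0, 1, 1, 2, 3, 5, 8, 12]
phi = pearson_dispersion(example_counts)

# A value well above 1 suggests overdispersion relative to a Poisson model.
# A quasi-Poisson fit keeps the Poisson coefficient estimates but scales
# their standard errors by sqrt(phi).
se_inflation = math.sqrt(phi)
print(f"dispersion = {phi:.3f}, SE inflation factor = {se_inflation:.3f}")
```

In a full analysis the fitted means would come from the regression on age, and the degrees of freedom would be the number of observations minus the number of estimated coefficients.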
Both unsupervised and supervised analyses were performed to assess underlying correlations in the data. Components with a positive response (≥0.3 ISU) in at least three subjects and participants with at least one c-sIgE ≥0.3 ISU were retained for these analyses (Sup Fig. 1). Principal component analyses (PCAs) were performed using the R function “prcomp”. Biplots of the principal components derived from the PCAs were plotted according to the classification of severe/non-severe disease. Random forest analyses were then performed using the known severity class of the patients. Receiver operating characteristic (ROC) curves were used to assess the performance of the model, which used all c-sIgE to perform the classification, and to appraise the model predictions. The area under the curve (AUC) values were interpreted as follows: excellent for an AUC between 0.90 and 1.00; good for an AUC between 0.80 and 0.90; fair for an AUC between 0.70 and 0.80; poor for an AUC between 0.60 and 0.70; and fail for an AUC between 0.50 and 0.60. The prediction errors of the random forest analyses were also assessed by calculating the out-of-bag (OOB) errors. Furthermore, an unsupervised clustering approach was applied to identify patterns of c-sIgE sensitization among participants. Sensitization clusters were derived by clustering participants using Bayesian estimation of a mixture of Bernoulli distributions (Bernoulli mixture model), as previously described in detail (15). The BayesBinMix R package (15) was used for joint estimation of the number of clusters and the parameters of the Bernoulli mixture model by Markov chain Monte Carlo sampling. A Poisson prior distribution was applied for the number of clusters and a uniform prior distribution for the Bernoulli parameters.
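The AUC interpretation scale described above can be sketched as follows. This Python example computes the AUC through its rank-based definition (the probability that a randomly chosen severe case receives a higher score than a randomly chosen non-severe case, with ties counted as one half) and maps it to the stated bands. The function names and scores are hypothetical, the treatment of band boundaries (lower bound inclusive) is an assumption because the text does not specify it, and the label for values below 0.50 is not part of the original scale.

```python
def auc_from_scores(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def auc_grade(auc):
    """Map an AUC to the interpretation bands given in the text.

    Assumption: each band includes its lower bound.
    """
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    if auc >= 0.60:
        return "poor"
    if auc >= 0.50:
        return "fail"
    return "below chance"  # not covered by the scale in the text


# Hypothetical predicted probabilities of severe disease
severe_scores = [0.9, 0.8, 0.35, 0.6]
non_severe_scores = [0.1, 0.4, 0.7, 0.2]

auc = auc_from_scores(severe_scores, non_severe_scores)
print(auc, auc_grade(auc))  # 13 of 16 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of participants, which is acceptable for a small cohort; a rank-sum formulation gives the same value more efficiently for larger samples.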