EBC-based asthma diagnostic model performance assessment
The 16S rDNA EV metagenomic profiles of the EBC samples were used as features for each of the methods used to develop diagnostic models for asthma. Ten iterations of each modeling method showed that, based on area under the curve (AUC) values, the logistic methods using either t-test or LEfSe biomarker selection performed more poorly than any of the ML methods (Figure 3A). Using LEfSe biomarkers rather than t-test-selected features in the logistic models raised the median AUC value from 0.749 to 0.760; however, t-test feature selection produced the higher average AUC value of the two methods (Table 2). While the ANN method demonstrated a higher average AUC value than either of the logistic models, GBM's average AUC value of 0.832 was the highest among the five methods, including the combined GBM/ANN ensemble methodology, which yielded a slightly lower average AUC value of 0.826. The standard deviation across the 10 iterations of each method was relatively low, ranging from 0.029 to 0.050. Receiver operating characteristic (ROC) curve plots also depicted the 10 model iterations of each asthma modeling method across the range of specificity and sensitivity values, together with their AUC values (Figure 3B).
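The comparison above relies on three summary statistics of the per-iteration AUC values: the mean, the median, and the standard deviation. The following is a minimal sketch of how such a summary could be computed, not the pipeline used in this study. The function names (`roc_auc`, `summarize`, `simulate_iteration`) are hypothetical, and the scores are synthetic random draws standing in for model outputs; no study data are reproduced.

```python
import random
import statistics


def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs in which the case
    receives the higher score; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(aucs):
    """Mean, median, and sample standard deviation across iterations."""
    return {
        "mean": statistics.mean(aucs),
        "median": statistics.median(aucs),
        "sd": statistics.stdev(aucs),
    }


def simulate_iteration(rng, n_per_class=50, shift=1.0):
    """Synthetic stand-in for one model iteration (an assumption for
    illustration): cases score higher than controls on average."""
    labels = [1] * n_per_class + [0] * n_per_class
    scores = [rng.gauss(shift, 1.0) for _ in range(n_per_class)]
    scores += [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
    return labels, scores


rng = random.Random(0)  # fixed seed so the sketch is repeatable
aucs = [roc_auc(*simulate_iteration(rng)) for _ in range(10)]
summary = summarize(aucs)
print(summary)
```

Because the mean is sensitive to a few unusually low or high iterations while the median is not, one method can have the higher median AUC and another the higher mean, which is the pattern reported for the two logistic models.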