Prediction models:
a) XGBoost (ML tool): FVC(%), neutrophil(%), and
FVC25-75(%) were the top three predictors, in that order,
based on the ‘gain’ importance measure (Table 3). The mean absolute
percentage error (MAPE) for the model was 1.81%, indicating excellent
performance.
b) Linear mixed-effects regression analyses: Hydroxyurea, FVC(%),
neutrophil(%), and FVC25-75(%) were statistically
significant and the top three predictors of adjusted DLCO
(Table 2). The remaining predictors analyzed, including
FEV1/FVC, R5(%), and TLC(%), were not statistically significant. The
regression model reproduced the same rank order of the six predictors as
the XGBoost model (Table 3). MAPE between measured DLCO and estimated
DLCO (eDLCO) for the mixed model was 9.1%, suggesting that XGBoost had
superior prediction performance compared to the regression model
(Figure 1).
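Both models are compared through MAPE, so it may help to see how that figure is computed. The following is a minimal sketch of the standard definition (the mean of |measured - estimated| / |measured|, expressed in percent). The function name and the three data points are hypothetical and are not taken from the study.

```python
def mape(measured, estimated):
    """Mean absolute percentage error, returned in percent."""
    if len(measured) != len(estimated) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for m, e in zip(measured, estimated):
        if m == 0:
            # The ratio is undefined when the measured value is zero.
            raise ValueError("MAPE is undefined for a measured value of zero")
        total += abs((m - e) / m)
    return 100.0 * total / len(measured)


# Hypothetical DLCO values (% predicted), for illustration only.
measured = [80.0, 100.0, 90.0]
estimated = [84.0, 95.0, 90.0]
print(round(mape(measured, estimated), 2))
```

Because each error is divided by the measured value, MAPE is scale-free, which is what allows the 1.81% and 9.1% figures to be compared directly.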
Measured and estimated DLCO vs. outcome measures: Measured DLCO
was significantly associated with the number of lifetime ACS/VOC events
and with TRJV (Table 4), but not with nocturnal hypoxemia (p=0.13).
After adjusting for age and sex, each 1% decrease in DLCO was
associated with 0.075 more lifetime ACS/VOC events (95% CI: -0.120 to
-0.030) and 0.009 m/s higher TRJV (95% CI: -0.017 to -0.001). eDLCO,
obtained from our predictive models, was also significantly associated
with ACS/VOC events and TRJV (Table 4): after adjusting for age and sex,
each 1% decrease in eDLCO was associated with 0.084 to 0.102 more
lifetime ACS/VOC events (CI: -0.134 to -0.033 for the XGBoost model, and
CI: -0.170 to -0.034 for the regression model) and with 0.009 to 0.014
m/s higher TRJV (CI: -0.017 to -0.001 for the XGBoost model, and CI:
-0.025 to -0.003 for the regression model). Overall, the associations
for eDLCO were very close to those obtained with measured DLCO.
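The intervals quoted above are negative, while the effects are stated as increases per 1% decrease in DLCO. One reading, which is an assumption here and not something the text states, is that the intervals belong to the slope per 1% increase. Under that assumption, the conversion is a sign flip with the two bounds swapped. The helper name below is hypothetical.

```python
def per_unit_decrease(beta, ci_low, ci_high):
    """Re-express a slope and its CI per 1-unit decrease in the predictor."""
    # Negation reverses ordering, so the old upper bound becomes the new lower bound.
    return -beta, -ci_high, -ci_low


# Measured DLCO vs. lifetime ACS/VOC events, ASSUMING -0.075 is the
# slope per 1% increase in DLCO with CI -0.120 to -0.030.
effect, low, high = per_unit_decrease(-0.075, -0.120, -0.030)
print(effect, low, high)
```

Read this way, both bounds have the same sign as the effect, which is consistent with the association being reported as significant.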
Validation of the prediction model: We tested the strength of
the prediction model using the LOOP method. Estimated DLCO (mean ± SD)
was 87.9 ± 17.18 compared with measured DLCO of 87.79 ± 10.87, with good
forecasting (MAPE of 17.3%) and a significant correlation (r=0.40,
p<0.001*) between estimated and measured values (Figure 2).
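The text does not expand "LOOP". If it denotes a leave-one-out procedure (an assumption), each subject's DLCO is predicted by a model fitted to all other subjects, and the held-out predictions are then compared with the measured values. The sketch below illustrates that idea with a single-predictor least-squares line standing in for the study's models. All function names and data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def leave_one_out_predictions(xs, ys):
    """Predict each observation from a line fitted to all the others."""
    preds = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a + b * xs[i])
    return preds


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)


def mape(measured, estimated):
    """Mean absolute percentage error, in percent."""
    ratios = [abs((m - e) / m) for m, e in zip(measured, estimated)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical FVC(%) and DLCO(%) values, for illustration only.
fvc = [70.0, 75.0, 82.0, 88.0, 90.0, 95.0, 101.0, 110.0]
dlco = [72.0, 80.0, 79.0, 90.0, 86.0, 93.0, 99.0, 104.0]

edlco = leave_one_out_predictions(fvc, dlco)
print("MAPE (%):", round(mape(dlco, edlco), 2))
print("Pearson r:", round(pearson_r(dlco, edlco), 2))
```

Because every prediction comes from a model that never saw the held-out subject, the resulting MAPE is usually higher than the in-sample value, which would be consistent with the gap between the in-sample and validation MAPE figures reported here.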