Statistical analysis and predictive model setup
Radiomics features can be strongly correlated with one another, so that
clusters of similarly distributed features can be identified. Reducing
the dimensionality of the feature space lowers the risk of over-fitting
the data. Radiomics features were normalized
using Z-scores, and feature selection was performed with the Boruta
algorithm in 3-fold cross-validation repeated 5 times. Features selected
by Boruta in at least 7 of the 15 resulting runs were kept for the
next analysis phase. These selected features were then tested for
pairwise correlation using the Pearson correlation coefficient, and only
those with a correlation value below 0.8 were kept as the final set of
features. Finally, logistic regression models were trained on the
selected features in 3-fold cross-validation repeated 5 times, using the
same dataset splits as in the feature selection step, with the aim of
predicting MYCN amplification status. Area Under the Curve (AUC) values and
classification matrix statistics at a prediction cut-off of 0.3 (the
positive case prevalence) were averaged across the cross-validation
results to obtain an estimate of the model's out-of-sample performance.
Furthermore, the identified radiomics features were pooled with patient
follow-up information in order to assess their OS prediction capability.
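
As an illustration only, the three deterministic steps of this workflow (the 7-out-of-15 selection vote, the pairwise Pearson filter at 0.8, and the classification statistics at the 0.3 cut-off) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code used in the study: the function names, the toy feature names and the toy data are hypothetical, the filter is assumed to use the absolute correlation and to visit features in the given order (the text above does not specify either), and the Boruta runs and logistic regression fits themselves are not reproduced here.

```python
from collections import Counter
from math import sqrt


def stable_features(selections, min_votes=7):
    """Keep features selected in at least `min_votes` of the resampling runs."""
    votes = Counter(name for run in selections for name in set(run))
    return sorted(name for name, n in votes.items() if n >= min_votes)


def pearson(x, y):
    """Pearson correlation coefficient (assumes non-constant sequences)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(columns, names, threshold=0.8):
    """Greedy filter: keep a feature only if |r| < threshold with every kept one.

    Assumption: absolute correlation and first-come ordering; the source
    does not state which member of a correlated pair was discarded.
    """
    kept = []
    for name in names:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def stats_at_cutoff(y_true, y_prob, cutoff=0.3):
    """Sensitivity, specificity and accuracy when predicting positive at p >= cutoff."""
    pred = [int(p >= cutoff) for p in y_prob]
    pairs = list(zip(y_true, pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy data (hypothetical): 15 selection runs = 3 folds x 5 repeats.
runs = [["a", "b", "c"]] * 9 + [["a", "d"]] * 6
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],   # nearly collinear with "a"
    "c": [5, 1, 4, 2, 3],
}

selected = stable_features(runs)             # "d" has only 6 votes and is dropped
final = drop_correlated(columns, selected)   # "b" is dropped as redundant with "a"
stats = stats_at_cutoff([1, 0, 0, 1, 0], [0.9, 0.2, 0.4, 0.25, 0.1])
print(selected, final, stats)
```

In the workflow described above, the statistics would be computed on the held-out fold of each of the 15 splits and then averaged; the sketch shows the computation for a single fold.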