Statistical analysis and predictive model setup
Radiomics features can be strongly correlated with each other, so that clusters of similarly distributed features can be identified. Reducing the dimensionality of the feature space lowers the probability of over-fitting the data. Radiomics features were normalized by Z-score, and feature selection was performed with the Boruta algorithm in 3-fold cross-validation repeated 5 times. Features that were selected by Boruta in at least 7 of the 15 runs were kept for the next analysis phase. These selected features were then tested for pairwise correlation with the Pearson correlation test, and only those with a correlation value below 0.8 were kept as the final set of features. Finally, logistic regression models were trained on the selected features to predict MYCN amplification status, using 3-fold cross-validation repeated 5 times with the same dataset splits as in the feature selection step. Area Under the Curve (AUC) values and classification matrix statistics at a prediction cut-off of 0.3 (the positive case prevalence) were averaged across the cross-validation results to estimate the out-of-sample performance of the model. Furthermore, the identified radiomics features were pooled with patient follow-up information in order to assess their ability to predict OS.
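The filtering and scoring steps described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the Boruta runs and the logistic regression fit are not reproduced, and all function names, feature names and toy values are hypothetical. Several details are assumptions, since the text does not specify them: the population standard deviation in the Z-score, the use of the absolute correlation value, the greedy order in which correlated features are dropped, and treating a predicted probability equal to the cut-off as positive.

```python
import math
from statistics import mean, pstdev


def z_score(values):
    """Standardize one feature column (population SD; column must not be constant)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant columns."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def stable_features(selection_counts, min_hits=7):
    """Keep features selected in at least min_hits of the 15 (3 folds x 5 repeats) runs."""
    return [name for name, hits in selection_counts.items() if hits >= min_hits]


def drop_correlated(columns, names, threshold=0.8):
    """Greedy filter (assumed rule): keep a feature only if |r| < threshold
    against every feature already kept, scanning names in the given order."""
    kept = []
    for name in names:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def auc(probabilities, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for p, y in zip(probabilities, labels) if y == 1]
    neg = [p for p, y in zip(probabilities, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_at_cutoff(probabilities, labels, cutoff=0.3):
    """Count TP/FP/TN/FN when probabilities >= cutoff are called positive (assumed)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, y in zip(probabilities, labels):
        predicted = 1 if p >= cutoff else 0
        key = ("t" if predicted == y else "f") + ("p" if predicted == 1 else "n")
        counts[key] += 1
    return counts


# Toy data (hypothetical): how often each feature was selected across the 15 runs.
selection_counts = {"f1": 15, "f2": 9, "f3": 7, "f4": 3}
raw_columns = {
    "f1": [1, 2, 3, 4, 5, 6],
    "f2": [2, 4, 6, 8, 10, 13],  # nearly collinear with f1
    "f3": [5, 1, 4, 2, 6, 3],
    "f4": [0, 1, 0, 1, 0, 1],
}
columns = {name: z_score(values) for name, values in raw_columns.items()}

candidates = stable_features(selection_counts)
final_features = drop_correlated(columns, candidates)
print("final features:", final_features)

# Hypothetical out-of-fold predicted probabilities and true labels for one test fold.
fold_probabilities = [0.10, 0.35, 0.80, 0.25, 0.30, 0.05]
fold_labels = [0, 0, 1, 1, 1, 0]
print("fold AUC:", auc(fold_probabilities, fold_labels))
print("fold confusion:", confusion_at_cutoff(fold_probabilities, fold_labels))
```

In a full analysis, the last two functions would be applied to each of the 15 test folds and the resulting statistics averaged, as described in the text.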