Performance assessment of the splice prediction tools
The accuracy, sensitivity, specificity, PPV, NPV and MCC for each
dataset, as defined in Supplementary Table 2, are provided in Tables
3-5. For the ABCA4 NCSS variants, the PPV was above 90% and the NPV
below 30% for all tools. This can be explained by the imbalance of the
dataset, as the majority of the variants in this
dataset affected splicing. SPIDEX, although it was among the five tools
with the highest AUC values, had a low accuracy, sensitivity and NPV
compared with the other tools. Moreover, it predicted 44 false
negatives, whereas the other tools predicted 23 false negatives on
average. The highest MCC, a measure well suited to imbalanced test
datasets, was found for SpliceAI and DSSP. For the ABCA4 DI variants,
the tools with the lowest performance assessment corresponded to the
tools with the lowest AUC (SpliceSiteFinder-like, DSSP and CADD),
whereas SpliceAI showed the best performance assessment. For the
MYBPC3 NCSS dataset, the tools demonstrating a low performance
were also the ones with a low AUC. DSSP performed poorly on
specificity (46%), and SPIDEX on accuracy (49%) and especially
sensitivity (18%). Additionally, DSSP predicted 20 false positives
(versus an average of 9 across tools) and SPIDEX 28 false negatives
(versus an average of 13). The tool with
the highest MCC was the Alamut 3/4 consensus approach, followed by
SpliceSiteFinder-like and GeneSplicer.
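The measures discussed above follow from the standard confusion-matrix definitions (as given in Supplementary Table 2). A minimal sketch, using hypothetical counts rather than the actual study data, illustrates how a dataset dominated by splice-affecting variants yields a high PPV but a low NPV, and how MCC remains informative under such imbalance:

```python
from math import sqrt

def metrics(tp, fp, tn, fn):
    """Standard performance measures from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)   # accuracy
    sens = tp / (tp + fn)                   # sensitivity (recall)
    spec = tn / (tn + fp)                   # specificity
    ppv = tp / (tp + fp)                    # positive predictive value
    npv = tn / (tn + fn)                    # negative predictive value
    # Matthews correlation coefficient; defined as 0 when the denominator is 0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": acc, "sensitivity": sens, "specificity": spec,
            "PPV": ppv, "NPV": npv, "MCC": mcc}

# Hypothetical imbalanced dataset: 90 splice-affecting vs 10 neutral variants.
m = metrics(tp=85, fp=8, tn=2, fn=5)
# Accuracy and PPV are high, yet NPV and MCC are low, mirroring the
# pattern reported for the imbalanced ABCA4 NCSS dataset.
```

Because MCC uses all four cells of the confusion matrix, a classifier cannot obtain a high score simply by predicting the majority class, which is why it is the preferred summary measure here.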