Performance assessment of the splice prediction tools
The accuracy, sensitivity, specificity, PPV, NPV and MCC for each dataset, as defined in Supplementary Table 2, are provided in Tables 3-5.

For the ABCA4 NCSS variants, the PPV was above 90% and the NPV below 30% for all tools. This can be explained by the imbalance of the dataset, in which the majority of variants affected splicing. SPIDEX, although among the five tools with the highest AUC values, had a low accuracy, sensitivity and NPV compared with the other tools: it predicted 44 false negatives, whereas the other tools predicted 23 false negatives on average. The highest MCC, a measure well suited to unbalanced test datasets, was obtained by SpliceAI and DSSP.

For the ABCA4 DI variants, the tools with the lowest performance assessment (SpliceSiteFinder-like, DSSP and CADD) were also the ones with the lowest AUC, and SpliceAI showed the best performance assessment.

For the MYBPC3 NCSS dataset, the tools demonstrating a low performance were likewise the ones with a low AUC. DSSP showed a low specificity (46%) and SPIDEX a low accuracy (49%) and especially a low sensitivity (18%). Additionally, DSSP predicted 20 false positives (against an average of 9) and SPIDEX 28 false negatives (average 13). The tool with the highest MCC was the Alamut 3/4 consensus approach, followed by SpliceSiteFinder-like and GeneSplicer.
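The pattern described above, a high PPV alongside a low NPV, and the preference for MCC on unbalanced data, follows directly from how these metrics are derived from a 2x2 confusion matrix. The sketch below illustrates this with hypothetical counts (not taken from any of the paper's datasets) for an imbalanced set in which most variants affect splicing:

```python
import math

def metrics(tp, fp, tn, fn):
    """Compute the performance measures used in Tables 3-5
    from true/false positive and negative counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sens = tp / (tp + fn)   # sensitivity (true positive rate)
    spec = tn / (tn + fp)   # specificity (true negative rate)
    ppv = tp / (tp + fp)    # positive predictive value
    npv = tn / (tn + fn)    # negative predictive value
    # Matthews correlation coefficient: balances all four cells of the
    # confusion matrix, which makes it informative on unbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": acc, "sensitivity": sens, "specificity": spec,
            "PPV": ppv, "NPV": npv, "MCC": mcc}

# Hypothetical imbalanced dataset: 100 splice-affecting variants,
# only 8 neutral ones. PPV stays high while NPV stays low even
# though accuracy looks respectable.
print(metrics(tp=90, fp=5, tn=3, fn=10))
```

With these illustrative counts the PPV is about 0.95 but the NPV only about 0.23, while the MCC (about 0.22) correctly signals that the classifier adds little information about the minority class.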