Classification metrics and receiver operating characteristic curve
Accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and the Matthews correlation coefficient (MCC) were calculated for each dataset. The formulas of the applied classification metrics are provided in Supplementary Table 2. The MCC was included alongside the standard statistical measures because it is well suited to unbalanced datasets, whereas the other metrics are influenced by the relative sizes of the positive and negative groups. A consensus of the Alamut tools (GeneSplicer, MaxEntScan, NNSPLICE and SpliceSiteFinder-like) is frequently considered in diagnostics. Therefore, an Alamut consensus requiring three of the four tools was included in the assessment. For each prediction tool, the area under the receiver operating characteristic (ROC) curve (AUC) and the optimal cutoff separating true positives from true negatives were calculated with scikit-learn 0.19.2 for Python.
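The calculations described above can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the code used in the study. The function names (classification_metrics, consensus_call, auc_and_cutoff) are hypothetical, labels are assumed to be coded 1 for the positive group and 0 for the negative group, and the consensus is assumed to mean at least three positive calls. The text does not state how the optimal cutoff was chosen, so the sketch uses Youden's J statistic (sensitivity + specificity - 1) as one common choice.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, matthews_corrcoef,
                             roc_auc_score, roc_curve)


def classification_metrics(y_true, y_pred):
    """Return the six metrics for binary labels (1 = positive, 0 = negative)."""
    # With labels=[0, 1], ravel() yields the counts in the order tn, fp, fn, tp.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "mcc": matthews_corrcoef(y_true, y_pred),
    }


def consensus_call(tool_calls, min_tools=3):
    """Combine per-tool binary calls (rows = variants, columns = tools).

    Assumption: a variant is called positive when at least `min_tools`
    of the tools call it positive.
    """
    votes = np.asarray(tool_calls).sum(axis=1)
    return (votes >= min_tools).astype(int)


def auc_and_cutoff(y_true, scores):
    """Return the AUC and the score threshold that maximises Youden's J.

    Assumption: Youden's J stands in for the unspecified "optimal cutoff".
    """
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    best = np.argmax(tpr - fpr)  # first maximum if several thresholds tie
    return roc_auc_score(y_true, scores), thresholds[best]
```

For example, passing a matrix with one column of binary calls per Alamut tool to consensus_call gives a single prediction vector that can be scored with classification_metrics in the same way as an individual tool. Other cutoff criteria, such as the ROC point closest to the top-left corner, would need only a different expression in place of tpr - fpr.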