Comparison of thresholds
ROC curve analysis was used to determine the best threshold for each dataset for classifying variants as splice altering or non-splice altering. Table 2 compares the thresholds identified from the ROC curves with the predefined thresholds suggested by the developers of each tool. The threshold that maximized the numbers of true positives and true negatives depended strongly on the dataset. For MaxEntScan, SpliceSiteFinder-like and NNSplice, the best thresholds were higher than the predefined thresholds, whereas the best thresholds for SPIDEX and SpliceAI were lower. For DSSP, the best thresholds for NCSS variants were lower than the predefined threshold, whilst the best threshold for DI variants was higher. The thresholds for S-CAP and CADD were difficult to compare with the predefined thresholds because these tools use thresholds that depend on the location of the variant. For CADD, the best threshold for ABCA4 DI variants was lower than the suggested threshold. Two tools, SpliceAI and SpliceSiteFinder-like, showed best thresholds close to their predefined thresholds.
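The text does not state which criterion was used to pick the optimal point on each ROC curve. A common choice is Youden's J statistic (sensitivity + specificity - 1), which rewards correctly classifying both splice-altering and non-splice-altering variants. The minimal Python sketch below assumes that criterion, assumes labels coded as 1 = splice altering and 0 = non-splice altering, and treats every observed score as a candidate cut-off. The function name `youden_threshold` and the toy scores are hypothetical and are not taken from Table 2.

```python
from typing import Optional, Sequence, Tuple


def youden_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    higher_is_altering: bool = True,
) -> Tuple[Optional[float], float]:
    """Return (threshold, J) maximising Youden's J = sensitivity + specificity - 1.

    Assumption: labels use 1 = splice altering, 0 = non-splice altering.
    A variant is called splice altering when score >= threshold, or
    score <= threshold when higher_is_altering is False (for tools where
    lower scores indicate splice alteration).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one splice-altering and one non-altering variant")

    best_t: Optional[float] = None
    best_j = -1.0
    # Each distinct observed score is a candidate cut-off (one point on the ROC curve).
    for t in sorted(set(scores)):
        tp = tn = 0
        for s, y in zip(scores, labels):
            called = s >= t if higher_is_altering else s <= t
            if called and y == 1:
                tp += 1  # true positive: splice-altering variant called altering
            elif not called and y == 0:
                tn += 1  # true negative: non-altering variant called non-altering
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict '>' keeps the lowest cut-off among ties
            best_t, best_j = t, j
    return best_t, best_j


if __name__ == "__main__":
    # Hypothetical toy data for one tool on one dataset.
    toy_scores = [0.05, 0.10, 0.30, 0.45, 0.60, 0.80, 0.90]
    toy_labels = [0, 0, 0, 1, 0, 1, 1]
    print(youden_threshold(toy_scores, toy_labels))  # prints (0.45, 0.75)
```

Running the same function per dataset and placing its output next to a tool's published cut-off gives the kind of side-by-side comparison summarized in Table 2. Tools with location-dependent thresholds, such as S-CAP and CADD, would need a separate call per variant region, which this sketch does not handle.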