Map of BRCA1 putative SREs is used here to assess bioinformatic predictor performance
To extend the comparisons described above, we evaluated the performance of ΔtESRseq, ΔHZEI, and HOT-SKIP, which uses the same approach as EX-SKIP but examines all possible exonic substitutions simultaneously (Table 4, Supplementary Table 3). We used the 33 BRCA1 variants identified as located in putative SREs (Table 2) as positive controls, and 250 non-spliceogenic variants as negative controls, selected from all exons included in the assays of Findlay et al. (2018) (Figure 2). HSF was used to select the positive control variants because it incorporates several different algorithms and thus captures a more comprehensive set of SRE sequences. By design, HSF sensitivity could therefore not be assessed in the comparative analysis, but HSF did call a large proportion of the negative control variants as positive (27% specificity). Previous studies used arbitrary ΔHZEI thresholds of -20 (Soukarieh et al., 2016) and -0.5 (Grodecká et al., 2017), and a ΔtESRseq cut-off of -0.5 (Grodecká et al., 2017; Soukarieh et al., 2016). For our analysis, the ΔtESRseq and ΔHZEI cut-off scores were adjusted based on serial Matthews Correlation Coefficient (MCC) calculations to obtain optimal predictive values: we set the cut-off at -0.75 for ΔtESRseq and at -5 for ΔHZEI. We set the HOT-SKIP threshold (alt/wt > 1) based on the EX-SKIP cut-off score used by Grodecká et al. (2017). Results comparing tool performance are shown in Table 4. ΔHZEI performed best, with 76% sensitivity and 82% specificity, followed by ΔtESRseq with 73% sensitivity and 80% specificity. HOT-SKIP had the lowest sensitivity (45%) and specificity (78%). As a secondary analysis, false positive variants located in exons with no mapped SRE (see Table 3) were redesignated as true negatives, on the basis that they could not be predicted to impact a mapped SRE. This markedly improved the specificity of all three tools (Table 4).
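The serial MCC calculation used to choose a cut-off can be sketched as follows. This is a minimal illustration, not the procedure actually run for Table 4: the function names, the candidate cut-offs, and the toy scores are all hypothetical, and the rule that a variant is called positive when its delta score is less than or equal to the cut-off is an assumption made for the example.

```python
from math import sqrt

def confusion(scores, labels, cutoff):
    """Return (tp, fp, tn, fn), calling a variant positive when score <= cutoff.

    The '<=' decision rule is an assumption for this sketch.
    """
    tp = fp = tn = fn = 0
    for score, is_positive in zip(scores, labels):
        called = score <= cutoff
        if called and is_positive:
            tp += 1
        elif called:
            fp += 1
        elif is_positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient; 0.0 when the denominator vanishes."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

def best_cutoff(scores, labels, candidates):
    """Scan candidate cut-offs and return the one with the highest MCC."""
    return max(candidates, key=lambda c: mcc(*confusion(scores, labels, c)))

# Hypothetical toy data (NOT study data): delta scores and control status,
# where True marks a positive control and False a negative control.
toy_scores = [-2.0, -1.2, -0.9, -0.8, -0.6, -0.3, 0.1, 0.4]
toy_labels = [True, True, True, False, False, False, False, False]
toy_candidates = [-1.0, -0.75, -0.5]

chosen = best_cutoff(toy_scores, toy_labels, toy_candidates)
sens, spec = sensitivity_specificity(*confusion(toy_scores, toy_labels, chosen))
print(chosen, sens, spec)
```

MCC suits this setting because the control sets are unbalanced (33 positive versus 250 negative controls), and it uses all four cells of the confusion matrix rather than rewarding a tool for calling most variants negative.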