Bioinformatic prediction of BP site abrogation
BP prediction tools (Table 1) have demonstrated poor specificity due to
BP motif degeneracy combined with a lack of experimental data to train
algorithms (Corvelo, Hallegger, Smith, & Eyras, 2010). BP
characterization has lagged far behind that of 5’ and 3’ splice sites
because of experimental difficulties in detecting BPs (Paggi &
Bejerano, 2018). A large genome-wide dataset of experimentally confirmed
BPs (Mercer et al., 2015) has been used to develop the BP prediction
tools Branchpointer and LaBranchoR. Based on this dataset, only ~18% of
human 3’ splice sites have high-confidence experimental BP annotations
(Mercer et al., 2015; Paggi & Bejerano, 2018).
The Branchpointer BP annotations were used to link hundreds of
clinically associated variants to changes in BP architecture, but the
impact of these variants on splicing was largely uncharacterized (Signal
et al., 2018). Other tools (SVM-BPfinder, BPP, RNABPS) are also
available but, like Branchpointer and LaBranchoR, they mainly predict
the presence of a BP site. That is, these tools were not designed to
identify spliceogenic variants automatically; they require separate
input of wild-type and variant intronic sequences, followed by manual
comparison of scores. Branchpointer also allows input of
single nucleotide variants using rsIDs to evaluate separately the effect
of the reference and alternative alleles on BPs (Signal et al., 2018).
The use of R by Branchpointer, and of Python scripts by LaBranchoR and
BPP, has also rendered these tools less accessible to
non-bioinformatician users (Leman et al., 2020). HSF, an older and
easy-to-use online
splicing tool, can directly analyze an intronic variant to predict BP
site abrogation; however, recent evaluations have revealed its poor
performance in detecting experimentally verified BPs (Leman et al.,
2020; Signal et al., 2018; Q. Zhang et al., 2017).
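The manual wild-type versus variant comparison described above can be
sketched as follows. This is a minimal illustration, not part of any of
the cited tools: the function names (apply_snv, paired_fasta), the
0-based offset convention, and the toy intronic sequence are all
assumptions made for the example. It only prepares the two sequences as
FASTA records; scoring them would still require running a BP predictor
separately on each record.

```python
def apply_snv(wild_type: str, offset: int, ref: str, alt: str) -> str:
    """Return the sequence with the base at 0-based `offset` changed from ref to alt.

    Hypothetical helper: it checks that the wild-type base matches `ref`
    so that a coordinate or strand mistake is caught early.
    """
    wild_type = wild_type.upper()
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("only single nucleotide variants are handled")
    if not 0 <= offset < len(wild_type):
        raise ValueError("offset lies outside the sequence")
    if wild_type[offset] != ref:
        raise ValueError(
            f"expected {ref} at offset {offset}, found {wild_type[offset]}"
        )
    return wild_type[:offset] + alt + wild_type[offset + 1:]


def paired_fasta(name: str, wild_type: str, offset: int, ref: str, alt: str) -> str:
    """Build a two-record FASTA string: the wild-type and the variant sequence."""
    variant = apply_snv(wild_type, offset, ref, alt)
    label = f"{ref.upper()}{offset + 1}{alt.upper()}"  # 1-based label, e.g. A6G
    return f">{name}_WT\n{wild_type.upper()}\n>{name}_{label}\n{variant}\n"


if __name__ == "__main__":
    # Toy intronic fragment, invented for illustration only.
    print(paired_fasta("toy_intron", "TTCTAACTTTCAG", 5, "A", "G"), end="")
```

Each record would then be submitted to the chosen tool and the two
scores compared by hand, which is the non-automated step noted above.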
It is important to note that variants predicted to disrupt a BP do not
necessarily induce aberrant splicing, as introns can have multiple
functional BPs (Mercer et al., 2015), which adds to the complexity of
predicting the spliceogenicity of a single variant in the BP window.
Moreover, Leman et al. (2020) found that using score change to predict
BP disruption by a variant was not the best strategy for identifying
spliceogenic variants. Instead, they propose considering a variant as
potentially spliceogenic if it is located in a predicted BP motif,
regardless of score change. Performance of BPP, Branchpointer, HSF, LaBranchoR,
RNABPS, and SVM-BPfinder was evaluated by checking whether confirmed
spliceogenic variants fell within predicted BP motifs; BPP showed the
highest accuracy, at 89.17% (Leman et al., 2020). In
their positive control set of 38 spliceogenic variants, 32 variants were
within BP motifs predicted by BPP, which predicted a total of 39 BP
motifs (Leman et al., 2020).
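The location-based rule described above (flag a variant if it falls
inside a predicted BP motif, regardless of score change) can be sketched
as a simple interval check. This is an illustrative sketch under stated
assumptions, not the implementation used by Leman et al. (2020): the
Motif and Variant types, the 1-based inclusive coordinates, and the toy
positions are all hypothetical, and the motif intervals are assumed to
have been exported beforehand from a BP predictor such as BPP.

```python
from typing import Iterable, List, NamedTuple


class Motif(NamedTuple):
    """A predicted BP motif interval (hypothetical representation)."""
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


class Variant(NamedTuple):
    """A single nucleotide variant position (hypothetical representation)."""
    chrom: str
    pos: int  # 1-based


def in_predicted_motif(variant: Variant, motifs: Iterable[Motif]) -> bool:
    """True if the variant lies inside any predicted BP motif."""
    return any(
        m.chrom == variant.chrom and m.start <= variant.pos <= m.end
        for m in motifs
    )


def prioritize(variants: Iterable[Variant], motifs: Iterable[Motif]) -> List[Variant]:
    """Keep only variants located in a predicted BP motif, ignoring score change."""
    motif_list = list(motifs)  # allow the motifs to be scanned once per variant
    return [v for v in variants if in_predicted_motif(v, motif_list)]


if __name__ == "__main__":
    # Toy coordinates, invented for illustration only.
    motifs = [Motif("chr1", 1000, 1004), Motif("chr2", 500, 504)]
    variants = [Variant("chr1", 1002), Variant("chr1", 1010), Variant("chr2", 504)]
    for v in prioritize(variants, motifs):
        print(v.chrom, v.pos)
```

In the positive control set described above, 32 of 38 spliceogenic
variants (about 84%) fell within BPP-predicted motifs, so a rule of this
kind serves to prioritize candidates rather than to capture every
spliceogenic variant.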
Generally, the current BP prediction tools are useful for prioritizing
candidate spliceogenic variants for downstream analysis by predicting
whether they lie in putative BP sites. Further, while variants
reported to alter a BP site sequence generally lead to exon skipping,
other types of splicing aberrations have been observed (Crotti et al.,
2009; M. Li & Pritchard, 2000). Hence, the current BP prediction tools
are not suitable for predicting a specific splicing effect.