Abstract:
The development of computational methods to assess pathogenicity of
pre-messenger RNA splicing variants is critical for diagnosis of human
disease. We assessed the capability of eight algorithms, and a consensus
approach, to prioritize 250 variants of uncertain significance (VUS)
that underwent splicing functional analyses. It is the capability of
algorithms to differentiate VUSs away from the immediate splice site as
‘pathogenic’ or ‘benign’ that is likely to have the most substantial
impact on diagnostic testing. We show that SpliceAI is the best single
strategy in this regard, but that combined usage of tools using a
weighted approach can increase accuracy further. We incorporated
prioritization strategies alongside diagnostic testing for rare
disorders. We show that 15% of 2783 referred individuals carry rare
variants expected to impact splicing that were not initially identified
as ‘pathogenic’ or ‘likely pathogenic’; 1 in 5 of these cases could lead
to new or refined diagnoses.
Keywords: splicing; rare disease; RNA; Mendelian disorders;
variant interpretation
A number of computational tools have been developed to assist in the
interpretation of genomic variation impacting splicing (Rowlands,
Baralle, & Ellingford, 2019). These tools have been expanded recently
to include an array of machine learning tools that have been trained to
prioritize splice-disrupting variation through diverse means (Cheng et
al., 2019; Jagadeesh et al., 2019; Jaganathan et al., 2019; Lee et al.,
2017; Xiong et al., 2015). Here we compare the accuracy of nine in
silico strategies, including eight state-of-the-art algorithms and a
consensus approach, to prioritize variants impacting splicing.
First, we ascertained and performed functional analyses for 250 VUSs to
observe their impact on splicing (Table S1). To the best of our
knowledge, this is the largest set of VUSs that have been functionally
interrogated for impact on splicing as part of diagnostic services for
individuals with rare disease. Variants had been identified in
individuals undergoing genome sequencing and targeted gene panel
analysis, with diverse phenotypic presentations including familial
susceptibility to breast cancer (MIM #604370), syndromic disorders such
as Marfan syndrome (MIM #154700) and isolated inherited retinal
disorders such as retinitis pigmentosa (MIM #300029). The approaches
for VUS functional analysis are described elsewhere (Wai et al., 2020)
and in the Supporting Information. We observed that 80/250 (32%) of the
VUSs significantly impacted splicing, and as a result could be
reclassified as ‘likely pathogenic’ according to ACMG guidelines for
variant interpretation (Richards et al., 2015). This reclassification
resulted in new molecular diagnoses for individuals carrying these
variants. All VUSs impacted regions outside of canonical splice acceptor
and donor sites, and included examples of deeply intronic cryptic splice
sites, exonic cryptic splice sites and branchpoint variants. In some
cases, functional investigations demonstrated a range of consequences on
mRNA splicing (Figure 1), reinforcing the concept that the precise
effect of splicing variants is an important piece of evidence for
consideration during clinical variant interpretation that, in the
future, may enable refinements in appropriate targeted treatments
(Bauwens et al., 2019; Shen & Corey, 2018).
We obtained in silico prediction scores for each of the 250
functionally assessed variants using eight in silico prioritization
algorithms (Table S1) and calculated sensitivity,
specificity and receiver operating characteristic area under the curve
(AUC), observing significantly variable performances (Figure 2).
Pairwise statistical comparisons of AUC for the 250 functionally
assessed VUSs, after Bonferroni correction for multiple testing,
demonstrated that SpliceAI outperformed other single algorithm
approaches (Figure 2; Table S2). The AUC analysis for single algorithms
calculated the optimal score for each of the algorithms to distinguish
between true positives (80 variants shown to impact splicing in our
functional assays) and true negatives (170 variants shown not to impact
splicing in our functional assays) in this dataset. We acknowledge that
splicing machinery may be influenced by cell-/tissue-specific factors
which are outside the scope of assays performed here (Aicher, Jewell,
Vaquero-Garcia, Barash, & Bhoj, 2020; Cummings et al., 2020; Vig et
al., 2020), and variants may have pathogenic impacts on gene expression
and/or regulation without any detrimental impact on splicing (Castel et
al., 2018; Evans et al., 2018; Short et al., 2018; Zhang, Wakeling,
Ware, & Whiffin, 2020). Such factors will influence comparative metrics
between algorithms, and future investigations may uncover pathogenic
roles for variants reported here. However, the optimal thresholds
calculated in light of these limitations for the 250 functionally
assessed VUSs in this study are reported in Table S3.
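
As an illustration of the metrics described above, the following is a minimal sketch of how the AUC, sensitivity, specificity and a score threshold could be computed for a single tool. The function names and the toy scores are hypothetical, and the use of Youden's J statistic to pick the threshold is an assumption; the criterion actually used to derive the thresholds in Table S3 is not stated here.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a splice-disrupting variant outscores a
    non-disrupting one (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called disruptive."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


def best_threshold(scores, labels):
    """Observed score maximizing Youden's J (an assumed criterion)."""
    def youden(t):
        sens, spec = sens_spec(scores, labels, t)
        return sens + spec - 1
    return max(sorted(set(scores)), key=youden)


# Toy data (hypothetical): 1 = impacted splicing in the functional assay.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1, 0.05, 0.3]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

auc = roc_auc(scores, labels)
cutoff = best_threshold(scores, labels)
sensitivity, specificity = sens_spec(scores, labels, cutoff)
```

In practice the same quantities would be computed once per algorithm over the 250 functionally assessed VUSs, with the functional assay result as the label.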
Global approaches to variant analysis, as assessed through the AUC, may
fail to capture region-specific intricacies in splicing disruption
(Jagadeesh et al., 2019). For example, variants could be sub-divided by
their pathogenic mechanism, their effect on pre-mRNA splicing, their
predicted molecular consequence or the location of the variant with
respect to known splicing motifs, and each of these sub-groups may
require different approaches or thresholds for accurate prioritization
of pathogenic variation. We therefore predicted variants to be
‘disruptive’ or ‘undisruptive’ according to pre-defined thresholds,
utilizing region-specific thresholds where appropriate (Table S4), and
compared accuracy of each of the prioritization strategies across 2000
iterations of sampling with replacement. We utilized a single score
threshold for tools where region-specific thresholds have not been
previously identified (Table S4). This analysis highlighted differences
across the tools and significantly differentiated their ability to
accurately predict pathogenicity (Kruskal-Wallis, df = 8,
p < 0.0001; Figure 2c-d). Consistent with the AUC analysis, SpliceAI
(using a threshold of 0.2) was significantly more accurate than every
other single algorithm assessed for our set of analysed VUSs
(Kruskal-Wallis, p < 0.0001 for all pairwise comparisons of
accuracy between SpliceAI and other tools; Figure 2c-d).
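
The resampling procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in this study: the function name, the tool names and the toy calls are hypothetical, every tool is evaluated on the same resampled variants within an iteration, and ties for best accuracy are credited to every tied tool.

```python
import random
from collections import Counter


def bootstrap_compare(calls_by_tool, truth, iterations=2000, seed=0):
    """Resample variants with replacement and record, per iteration, the
    accuracy of each tool and which tool(s) achieved the highest accuracy."""
    rng = random.Random(seed)
    n = len(truth)
    accuracies = {tool: [] for tool in calls_by_tool}
    best_counts = Counter()
    for _ in range(iterations):
        # One bootstrap sample of variant indices, shared by all tools.
        idx = [rng.randrange(n) for _ in range(n)]
        round_acc = {}
        for tool, calls in calls_by_tool.items():
            correct = sum(calls[i] == truth[i] for i in idx)
            round_acc[tool] = correct / n
            accuracies[tool].append(round_acc[tool])
        top = max(round_acc.values())
        for tool, acc in round_acc.items():
            if acc == top:  # assumption: ties are credited to every tied tool
                best_counts[tool] += 1
    return accuracies, best_counts


# Toy data (hypothetical): 1 = disruptive, 0 = undisruptive.
truth = [1, 1, 0, 0, 0, 1, 0, 0]
calls_by_tool = {
    "tool_a": [1, 1, 0, 0, 0, 1, 0, 0],  # agrees with every assay result
    "tool_b": [1, 0, 0, 1, 0, 1, 0, 0],  # two discordant calls
}
accuracies, best_counts = bootstrap_compare(calls_by_tool, truth)
```

The per-tool accuracy distributions returned here are the kind of input that a Kruskal-Wallis test (for example `scipy.stats.kruskal`) would then compare.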
To determine if combining one or more of these metrics could achieve
greater accuracy than prioritization scores in isolation, we developed a
consensus score for each variant which considered the region-specific
thresholds for each tool (Table S4, consensus score range = 0-8 tools
predicting splicing disruption). We observed that the consensus approach
performed similarly to SpliceAI when assessed through the receiver
operating characteristic AUC (Figure 2; Tables S2-S3). The consensus
approach (using a threshold of 4/8 algorithms supporting splicing
disruption) also performed more similarly to SpliceAI than other
strategies when measuring accuracy across sampling iterations (Figure
2c), but was less frequently the best performing approach (Figure 2d).
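
A minimal sketch of the consensus score is given below. Only the SpliceAI threshold of 0.2 is taken from the text; the other tool names and thresholds are hypothetical placeholders, region-specific thresholds (Table S4) are not modelled, and a threshold of 4/8 is assumed to mean at least four tools.

```python
# Hypothetical placeholder thresholds: only the SpliceAI value of 0.2 comes
# from the text. The real, region-specific values are listed in Table S4.
THRESHOLDS = {"SpliceAI": 0.2, **{f"tool_{i}": 0.5 for i in range(2, 9)}}


def consensus_score(variant_scores, thresholds=THRESHOLDS):
    """Number of tools (0-8) whose score meets or exceeds its threshold.
    Tools with a missing score do not contribute."""
    return sum(
        1
        for tool, cutoff in thresholds.items()
        if variant_scores.get(tool) is not None
        and variant_scores[tool] >= cutoff
    )


def predicted_disruptive(variant_scores, thresholds=THRESHOLDS, min_tools=4):
    """Call a variant disruptive when at least min_tools tools agree
    (assumed reading of the 4/8 threshold)."""
    return consensus_score(variant_scores, thresholds) >= min_tools


# Toy variant (hypothetical scores); tool_8 has no score at all.
variant = {"SpliceAI": 0.45, "tool_2": 0.7, "tool_3": 0.6, "tool_4": 0.1,
           "tool_5": 0.55, "tool_6": None, "tool_7": 0.2}

score = consensus_score(variant)
call = predicted_disruptive(variant)
```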
To understand if the relative scores from each algorithm could assist
interpretation we developed a novel metric which incorporates weighted
scores from the prioritization strategies (Supporting Information). This
analysis considered the actual score of the variant relative to the
maximum score possible from each prediction algorithm (Supporting
Information). Of note, the weighted approach considering scores from
SpliceAI and a consensus approach performs better than these two
approaches in isolation (Figure 2b; Table S3). The two approaches are
not mutually exclusive, and this combined analysis is underpowered to
detect statistically significant differences in the AUC, owing to the
marginal gains in accuracy and the sample size. Nevertheless, this
demonstrates the potential utility of combining scores to improve
accuracy for the identification of variants impacting splicing.
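
The idea of scaling each score by its maximum possible value can be sketched as follows. This is an illustration only: the exact metric is defined in the Supporting Information, and the equal weights, the function names and the example values here are assumptions. The maximum values of 1 for SpliceAI and 8 for the consensus score follow from the score ranges of those two approaches.

```python
def relative_score(score, max_score):
    """Score expressed as a fraction of the maximum possible score."""
    return score / max_score


def weighted_score(spliceai, consensus, weights=(0.5, 0.5),
                   max_spliceai=1.0, max_consensus=8):
    """Weighted mean of the relative SpliceAI and consensus scores.
    Equal weights are an assumption, not the published weighting."""
    parts = (
        relative_score(spliceai, max_spliceai),
        relative_score(consensus, max_consensus),
    )
    return sum(w * p for w, p in zip(weights, parts)) / sum(weights)


# Hypothetical variant: SpliceAI score 0.45, supported by 4 of 8 tools.
combined = weighted_score(0.45, 4)
```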
Next, we sought to examine the impact of these approaches on clinical
variant analysis. We therefore integrated region-specific prioritization
strategies (Table S4) into an accredited diagnostic service for 2783
individuals with rare diseases (Ellingford et al., 2016). All
individuals included in this analysis have received genetic testing for
rare disease within the UK national healthcare service through a
clinically accredited laboratory. We calculated in silico scores
for 20,617 variants (18,013 of which were rare), observed a total of
1,346,744 times in the cohort. We observed substantial variability in
the number of rare variants prioritized by each in silico tool
(Figure 3a; Table S5) and in the specific variants prioritized by the
most correlated in silico splicing tools (Figure 3b). We observed
that while variants which show the highest consensus between in
silico splicing tools impact the canonical splice site (Figure 3c;
Table S6), 99% (n = 17,871) of variants analysed impact exonic or
intronic regions of genes outside of the canonical splice sites.
Splicing variants are often considered as a single class of variants and
canonical splice site variants are therefore highly susceptible to
over-prioritization by in silico tools, as such variants
represent the majority (~70%) of known pathogenic splicing
variants (Krawczak et al., 2007; Stenson et al., 2014; Xiong
et al., 2015). Our data further underline the need to develop effective
and unbiased strategies for prioritizing variants impacting splicing
outside of the canonical splice sites, and this will be especially
important for VUS in known disease genes. Overall, these data
demonstrate that different in silico strategies for splicing
variant prioritization will alter the burden of variant analysis for
clinical scientists. This is an important consideration for the
analytical specificity and the throughput of diagnostic testing.
To assess the clinical impact of such strategies, we integrated a single
prioritization strategy, SpliceAI, in parallel to outcomes from routine
diagnostic testing. This analysis involved extensive curation of genomic
findings for the 2783 referred individuals, all of which were classified
in accordance with ACMG guidelines by clinically accredited scientists.
We added SpliceAI predictions alongside these analyses and observed that
this approach influenced analysis for 420 (15%) individuals receiving
genomic testing for rare disease, and could result in new or refined
molecular diagnoses in 81 (3%) cases. Overall, we prioritized 758
variants (528 unique variants) in 646 individuals (23% of cohort) with
a range of predicted molecular consequences. Most (99.6%, 526/528)
variants were prioritized by at least one other in silico tool
(Table S7). The strength of the score from SpliceAI correlated highly
with prioritization from other in silico tools (Figure 3d) and
differed between regions of the genome that were impacted (Table S8). We
defined prioritized variants as:
- New, variant not previously highlighted or reported through
diagnostic testing;
- Clarified, variant previously reported through diagnostic
testing but pathogenicity or pathogenic mechanism was unclear;
- Reported, variant already described or established as
‘pathogenic’ or ‘likely pathogenic’ through diagnostic testing.
In this regard, we identified 379 new variants in 337
individuals, 87 clarified variants in 83 individuals and 292 reported
variants in 274 individuals. We found most (91%,
697/758) variants to be in genes known as a recessive cause of genetic
disease. To understand if these variants impacted normal splicing, we
interrogated the GTEx datasets (GTEx, 2013) for individuals carrying
variants in a heterozygous state, identifying 40 carriers of variants
prioritized by this analysis. Of these, 21 had suitable RNAseq datasets
available for evaluation, and we were able to clearly observe
significant alterations to splicing in four cases (Table 1). Whilst most
variants will require bespoke functional investigations to establish
precise effects on splicing and protein synthesis, leveraging the use of
publicly available datasets for individuals carrying potentially
pathogenic rare variants in the GTEx dataset can quickly increase
certainty of variant impact and refine clinical variant analysis.
The incorporation of the prioritization and functional strategies
described in this study for variants impacting splicing significantly
improved molecular diagnostic services. However, we expect that the true
impact of such analysis strategies will be more profound. Targeted next
generation sequencing approaches employed within this large cohort
ignore deeply intronic regions of genes, which, as shown here (Box 1,
Case Example) and in other studies (den Hollander et al., 2006;
Montalban et al., 2019; Sangermano et al., 2019), can harbor variants
which result in aberrant splicing through the production of novel
cryptic exons. The recent availability of genomic datasets within
healthcare amplifies the current limitations in interpreting variation
within the non-coding genome, particularly in large genome sequencing
cohorts. Our findings demonstrate the opportunity to expand
bioinformatics analysis to the pre-mRNA regions of known disease genes
and provide immediate increases to diagnostic yield. Moreover, we
demonstrate a requirement to functionally assess variant impact on
pre-mRNA splicing as the delineation of the precise effects may be
important in considerations for variant pathogenicity. The
prioritization and identification of pathogenic variants impacting
splicing is therefore an important consideration for diagnostic services
and for the development of new targeted treatments.