Abstract:
The development of computational methods to assess pathogenicity of pre-messenger RNA splicing variants is critical for diagnosis of human disease. We assessed the capability of eight algorithms, and a consensus approach, to prioritize 250 variants of uncertain significance (VUS) that underwent splicing functional analyses. It is the capability of algorithms to differentiate VUSs away from the immediate splice site as ‘pathogenic’ or ‘benign’ that is likely to have the most substantial impact on diagnostic testing. We show that SpliceAI is the best single strategy in this regard, but that combined usage of tools using a weighted approach can increase accuracy further. We incorporated prioritization strategies alongside diagnostic testing for rare disorders. We show that 15% of 2783 referred individuals carry rare variants expected to impact splicing that were not initially identified as ‘pathogenic’ or ‘likely pathogenic’; 1 in 5 of these cases could lead to new or refined diagnoses.
Keywords: splicing; rare disease; RNA; Mendelian disorders; variant interpretation
A number of computational tools have been developed to assist in the interpretation of genomic variation impacting splicing (Rowlands, Baralle, & Ellingford, 2019). These tools have been expanded recently to include an array of machine learning tools that have been trained to prioritize splice-disrupting variation through diverse means (Cheng et al., 2019; Jagadeesh et al., 2019; Jaganathan et al., 2019; Lee et al., 2017; Xiong et al., 2015). Here we compare the accuracy of nine in silico strategies, including eight state-of-the-art algorithms and a consensus approach, to prioritize variants impacting splicing.
First, we ascertained and performed functional analyses for 250 VUSs to observe their impact on splicing (Table S1). To the best of our knowledge, this is the largest set of VUSs that have been functionally interrogated for impact on splicing as part of diagnostic services for individuals with rare disease. Variants had been identified in individuals undergoing genome sequencing and targeted gene panel analysis, with diverse phenotypic presentations including familial susceptibility to breast cancer (MIM #604370), syndromic disorders such as Marfan syndrome (MIM #154700) and isolated inherited retinal disorders such as retinitis pigmentosa (MIM #300029). The approaches for VUS functional analysis are described elsewhere (Wai et al., 2020) and in the Supporting Information. We observed that 80/250 (32%) of the VUSs significantly impacted splicing, and as a result could be reclassified as ‘likely pathogenic’ according to ACMG guidelines for variant interpretation (Richards et al., 2015). This reclassification resulted in new molecular diagnoses for individuals carrying these variants. All VUSs impacted regions outside of canonical splice acceptor and donor sites, and included examples of deeply intronic cryptic splice sites, exonic cryptic splice sites and branchpoint variants. In some cases, functional investigations demonstrated a range of consequences on mRNA splicing (Figure 1), reinforcing the concept that the precise effect of splicing variants is an important piece of evidence for consideration during clinical variant interpretation that, in the future, may enable refinements in appropriate targeted treatments (Bauwens et al., 2019; Shen & Corey, 2018).
We obtained in silico prediction scores for each of the 250 functionally assessed variants using eight in silico prioritization algorithms (Table S1) and calculated sensitivity, specificity and receiver operating characteristic area under the curve (AUC), observing significantly variable performances (Figure 2). Pairwise statistical comparisons of AUC for the 250 functionally assessed VUSs, after Bonferroni correction for multiple testing, demonstrated that SpliceAI outperformed other single algorithm approaches (Figure 2; Table S2). The AUC analysis for single algorithms calculated the optimal score for each of the algorithms to distinguish between true positives (80 variants shown to impact splicing in our functional assays) and true negatives (170 variants shown not to impact splicing in our functional assays) in this dataset. We acknowledge that splicing machinery may be influenced by cell-/tissue-specific factors which are outside the scope of assays performed here (Aicher, Jewell, Vaquero-Garcia, Barash, & Bhoj, 2020; Cummings et al., 2020; Vig et al., 2020), and variants may have pathogenic impacts on gene expression and/or regulation without any detrimental impact on splicing (Castel et al., 2018; Evans et al., 2018; Short et al., 2018; Zhang, Wakeling, Ware, & Whiffin, 2020). Such factors will influence comparative metrics between algorithms, and future investigations may uncover pathogenic roles for variants reported here. However, the optimal thresholds calculated in light of these limitations for the 250 functionally assessed VUSs in this study are reported in Table S3.
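The metrics named above can be illustrated with a minimal sketch. It assumes that a higher score means a variant is more likely to disrupt splicing, and it selects the "optimal" threshold by maximizing Youden's J (sensitivity + specificity - 1). That selection criterion is an assumption made here for illustration, as the text does not state which criterion was used. The scores and labels are toy values, not data from Table S1.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive variant
    outscores a randomly chosen negative one (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called disruptive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(scores, labels):
    """Observed score that maximizes Youden's J (an assumed criterion)."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, t)
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


# Toy data: 1 = impacted splicing in a functional assay, 0 = did not.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1, 0.05, 0.4]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

auc = roc_auc(scores, labels)
threshold = best_threshold(scores, labels)
sensitivity, specificity = sens_spec(scores, labels, threshold)
```

The pairwise formulation of the AUC is quadratic in the number of variants, which is acceptable for a dataset of a few hundred variants; a rank-based implementation would be preferable for larger sets.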
Global approaches to variant analysis, as assessed through the AUC, may fail to capture region-specific intricacies in splicing disruption (Jagadeesh et al., 2019). For example, variants could be sub-divided by their pathogenic mechanism, their effect on pre-mRNA splicing, their predicted molecular consequence or the location of the variant with respect to known splicing motifs, and each of these sub-groups may require different approaches or thresholds for accurate prioritization of pathogenic variation. We therefore predicted variants to be ‘disruptive’ or ‘undisruptive’ according to pre-defined thresholds, utilizing region-specific thresholds where appropriate (Table S4), and compared accuracy of each of the prioritization strategies across 2000 iterations of sampling with replacement. We utilized a single score threshold for tools where region-specific thresholds have not been previously identified (Table S4). This analysis highlighted differences across the tools and significantly differentiated their ability to accurately predict pathogenicity (Kruskal-Wallis, df = 8, p < 0.0001; Figure 2c-d). Similar to the AUC analysis, SpliceAI (using a threshold of 0.2) was the best performing strategy across all assessed single algorithms for our set of analysed VUSs, and significantly so (Kruskal-Wallis, p < 0.0001 for all pairwise comparisons of accuracy between SpliceAI and other tools; Figure 2c-d).
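The resampling comparison described above can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the toy scores are not study data, and treating a score equal to the threshold as ‘disruptive’ is an assumption. The 0.2 threshold and the 2000 iterations are taken from the text. The Kruskal-Wallis comparison between tools is not reproduced here.

```python
import random


def call_disruptive(scores, threshold):
    """Binary call per variant: True if the score reaches the threshold
    (inclusive comparison is an assumption)."""
    return [s >= threshold for s in scores]


def accuracy(calls, truth):
    """Fraction of variants where the call matches the functional result."""
    return sum(c == t for c, t in zip(calls, truth)) / len(truth)


def bootstrap_accuracy(calls, truth, iterations=2000, seed=0):
    """Accuracy recomputed on `iterations` resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(truth)
    results = []
    for _ in range(iterations):
        idx = [rng.randrange(n) for _ in range(n)]
        results.append(accuracy([calls[i] for i in idx],
                                [truth[i] for i in idx]))
    return results


# Toy data: True = impacted splicing in a functional assay.
toy_scores = [0.85, 0.4, 0.1, 0.02, 0.3, 0.05]
toy_truth = [True, True, True, False, False, False]

toy_calls = call_disruptive(toy_scores, threshold=0.2)
point_estimate = accuracy(toy_calls, toy_truth)
resampled = bootstrap_accuracy(toy_calls, toy_truth)
```

Running the same resampling for each tool yields one distribution of accuracies per strategy, which is the kind of input a rank-based test across strategies would take.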
To determine if combining one or more of these metrics could achieve greater accuracy than prioritization scores in isolation, we developed a consensus score for each variant which considered the region-specific thresholds for each tool (Table S4, consensus score range = 0-8 tools predicting splicing disruption). We observed that the consensus approach performed similarly to SpliceAI when assessed through the receiver operating characteristic AUC (Figure 2; Tables S2-S3). The consensus approach (using a threshold of 4/8 algorithms supporting splicing disruption) also performed more similarly to SpliceAI than other strategies when measuring accuracy across sampling iterations (Figure 2c), but was less frequently the best performing approach (Figure 2d). To understand if the relative scores from each algorithm could assist interpretation, we developed a novel metric which incorporates weighted scores from the prioritization strategies (Supporting Information). This analysis considered the actual score of the variant relative to the maximum score possible from each prediction algorithm (Supporting Information). Of note, the weighted approach considering scores from SpliceAI and a consensus approach performs better than these two approaches in isolation (Figure 2b; Table S3). Although these approaches are not mutually exclusive, and the combined analysis is underpowered to detect statistically significant differences in the AUC because of the marginal gains in accuracy and the sample size, this demonstrates the potential utility of combining scores to improve accuracy for the identification of variants impacting splicing.
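A minimal sketch of the two combined scores is given below. The exact weighting scheme is described only in the Supporting Information, so the version here is an assumption: each score is divided by the maximum possible score of its tool, and the relative scores are averaged with equal weight. The names `tool_b` and `tool_c`, their thresholds and their maxima are hypothetical placeholders; only the SpliceAI threshold of 0.2 comes from the text. Tools whose scores can be negative, or where lower values indicate disruption, would need separate handling.

```python
def consensus_score(variant_scores, thresholds):
    """Number of tools whose score reaches that tool's threshold."""
    return sum(1 for tool, t in thresholds.items()
               if variant_scores[tool] >= t)


def weighted_score(variant_scores, max_scores):
    """Mean of each score relative to the tool's maximum possible score.
    Equal weighting across tools is an assumption, not the published scheme."""
    relative = [variant_scores[tool] / max_scores[tool] for tool in max_scores]
    return sum(relative) / len(relative)


# Hypothetical thresholds and maxima for three tools (the study used eight).
thresholds = {"spliceai": 0.2, "tool_b": 3.0, "tool_c": 0.5}
max_scores = {"spliceai": 1.0, "tool_b": 10.0, "tool_c": 1.0}

# Toy scores for a single variant.
variant = {"spliceai": 0.45, "tool_b": 2.0, "tool_c": 0.7}

n_supporting = consensus_score(variant, thresholds)
combined = weighted_score(variant, max_scores)
```

With eight tools, `consensus_score` would range from 0 to 8, matching the consensus range described above, and a cut-off of 4 would correspond to the 4/8 threshold.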
Next, we sought to examine the impact of these approaches on clinical variant analysis. We therefore integrated region-specific prioritization strategies (Table S4) into an accredited diagnostic service for 2783 individuals with rare diseases (Ellingford et al., 2016). All individuals included in this analysis have received genetic testing for rare disease within the UK national healthcare service through a clinically accredited laboratory. We calculated in silico scores for 20,617 variants (18,013 of which were rare), observed a total of 1,346,744 times in the cohort. We observed substantial variability in the number of rare variants prioritized by each in silico tool (Figure 3a; Table S5) and in the specific variants prioritized by the most correlated in silico splicing tools (Figure 3b). We observed that while variants which show the highest consensus between in silico splicing tools impact the canonical splice site (Figure 3c; Table S6), 99% (n = 17,871) of variants analysed impact exonic or intronic regions of genes outside of the canonical splice sites. Splicing variants are often considered as a single class of variants and canonical splice site variants are therefore highly susceptible to over-prioritization by in silico tools, as such variants represent the majority (~70%) of known splicing pathogenic variants (Krawczak et al., 2007; Stenson et al., 2014; Xiong et al., 2015). Our data further underline the need to develop effective and unbiased strategies for prioritizing variants impacting splicing outside of the canonical splice sites, and this will be especially important for VUS in known disease genes. Overall, these data demonstrate that different in silico strategies for splicing variant prioritization will alter the burden of variant analysis for clinical scientists. This is an important consideration for the analytical specificity and the throughput of diagnostic testing.
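The comparison of how many variants each tool prioritizes, and how far those sets overlap, can be sketched as below. The variant identifiers, the second tool and its threshold are hypothetical, and the Jaccard index is used here only as one convenient overlap measure; it is not stated to be the measure behind Figure 3b.

```python
def prioritized_sets(scores_by_variant, thresholds):
    """For each tool, the set of variants whose score reaches its threshold.
    Variants with no score for a tool are skipped for that tool."""
    out = {tool: set() for tool in thresholds}
    for variant, scores in scores_by_variant.items():
        for tool, t in thresholds.items():
            score = scores.get(tool)
            if score is not None and score >= t:
                out[tool].add(variant)
    return out


def jaccard(a, b):
    """Shared variants divided by all variants prioritized by either tool."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical thresholds; only the SpliceAI value comes from the text.
tool_thresholds = {"spliceai": 0.2, "tool_b": 3.0}

# Toy scores keyed by a placeholder variant identifier.
cohort_scores = {
    "v1": {"spliceai": 0.90, "tool_b": 5.0},
    "v2": {"spliceai": 0.30, "tool_b": 1.0},
    "v3": {"spliceai": 0.05, "tool_b": 4.0},
    "v4": {"spliceai": 0.01, "tool_b": 0.5},
}

sets_by_tool = prioritized_sets(cohort_scores, tool_thresholds)
counts = {tool: len(v) for tool, v in sets_by_tool.items()}
overlap = jaccard(sets_by_tool["spliceai"], sets_by_tool["tool_b"])
```

In this toy case both tools prioritize the same number of variants but agree on only one of them, which illustrates how the choice of tool changes the set of variants passed on for analysis even when the counts are similar.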
To assess the clinical impact of such strategies, we integrated a single prioritization strategy, SpliceAI, in parallel to outcomes from routine diagnostic testing. This analysis involved extensive curation of genomic findings for the 2783 referred individuals, all of which were classified in accordance with ACMG guidelines by clinically accredited scientists. We added SpliceAI predictions alongside these analyses and observed that this approach influenced analysis for 420 (15%) individuals receiving genomic testing for rare disease, and could result in new or refined molecular diagnoses in 81 (3%) cases. Overall, we prioritized 758 variants (528 unique variants) in 646 individuals (23% of cohort) with a range of predicted molecular consequences. Most (99.6%, 526/528) variants were prioritized by at least one other in silico tool (Table S7). The strength of the score from SpliceAI correlated highly with prioritization from other in silico tools (Figure 3d) and differed between the regions of the genome that were impacted (Table S8). We defined prioritized variants as:
In this regard, we identified 379 new variants in 337 individuals, 87 clarified variants in 83 individuals and 292 reported variants in 274 individuals. We found most (91%, 697/758) variants to be in genes known as a recessive cause of genetic disease. To understand if these variants impacted normal splicing, we interrogated the GTEx datasets (GTEx, 2013) for individuals carrying variants in a heterozygous state, identifying 40 carriers of variants prioritized by this analysis. Of these, 21 had suitable RNAseq datasets available for evaluation, and we were able to clearly observe significant alterations to splicing in four cases (Table 1). Whilst most variants will require bespoke functional investigations to establish precise effects on splicing and protein synthesis, leveraging the use of publicly available datasets for individuals carrying potentially pathogenic rare variants in the GTEx dataset can quickly increase certainty of variant impact and refine clinical variant analysis.
The incorporation of the prioritization and functional strategies described in this study for variants impacting splicing significantly improved molecular diagnostic services. However, we expect that the true impact of such analysis strategies will be more profound. Targeted next generation sequencing approaches employed within this large cohort ignore deeply intronic regions of genes, which, as shown here (Box 1, Case Example) and in other studies (den Hollander et al., 2006; Montalban et al., 2019; Sangermano et al., 2019), can harbor variants which result in aberrant splicing through the production of novel cryptic exons. The recent availability of genomic datasets within healthcare amplifies the current limitations in interpreting variation within the non-coding genome, particularly in large genome sequencing cohorts. Our findings demonstrate the opportunity to expand bioinformatics analysis to the pre-mRNA regions of known disease genes and provide immediate increases to diagnostic yield. Moreover, we demonstrate a requirement to functionally assess variant impact on pre-mRNA splicing as the delineation of the precise effects may be important in considerations for variant pathogenicity. The prioritization and identification of pathogenic variants impacting splicing is therefore an important consideration for diagnostic services and for the development of new targeted treatments.