Future use of mapped SREs in BRCA1 to improve SRE prediction
Solving the problem of over-prediction is an important step towards making SRE-dedicated bioinformatic tools useful in variant interpretation and clinical diagnostics. As shown in our detailed BRCA1 SRE map (Supplementary Figure 1), some negative control variants within the mapped SREs are predicted by HSF to alter these motifs. Some are even located at the same nucleotide position as positive control variants. For example, c.5007C>T, categorized as non-functional with an effect on mRNA depletion (Figure 2), is designated a true positive because HSF also predicts it to create an ESS (Supplementary Figure 1). By contrast, c.5007C>A and c.5007C>G have no functional impact and are designated false positives because HSF predicted them to create an ESS and break an ESE, respectively (Supplementary Figure 1). ΔtESRseq, ΔHZEI, and HOT-SKIP, which combine the scores of ESEs and ESSs disrupted or created by a variant, correctly predicted c.5007C>A and c.5007C>G to have no impact on an SRE. Similar results were observed for other co-located HSF-predicted false positive variants at c.5127, c.5130, c.5430, and c.5472 (Supplementary Figure 1), where at least two of the three tools (ΔtESRseq, ΔHZEI, and HOT-SKIP) made negative calls in agreement with the mRNA depletion scores. Although the quantitative combined ESS-ESE scoring approach of ΔtESRseq, ΔHZEI, and HOT-SKIP appears to substantially reduce the number of HSF-predicted false positives, some negative control variants within the mapped SREs are still predicted by these three tools to impact SREs. Other factors must therefore be considered to improve prediction of variant effect with mapped SREs.
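The combined scoring idea described above can be illustrated with a minimal sketch. It assumes a generic hexamer-based scheme in which every hexamer overlapping the substituted base contributes a signed score, and the variant effect is the difference between the variant and wild-type totals. The score table, sequence, and function names are hypothetical and invented for illustration; they are not the actual ΔtESRseq, ΔHZEI, or HOT-SKIP implementations, whose exact definitions differ.

```python
from typing import Dict

# Hypothetical hexamer scores, invented for illustration only.
# Positive values mimic enhancer-like (ESE) hexamers,
# negative values mimic silencer-like (ESS) hexamers.
TOY_SCORES: Dict[str, float] = {"GAAGAA": 1.0, "GATGAA": -0.5}


def total_score(seq: str, pos: int, scores: Dict[str, float], k: int = 6) -> float:
    """Sum the scores of every k-mer that overlaps 0-based position `pos`."""
    if not 0 <= pos < len(seq):
        raise ValueError("pos is outside the sequence")
    first = max(0, pos - k + 1)      # leftmost k-mer start that still covers pos
    last = min(pos, len(seq) - k)    # rightmost k-mer start that fits in seq
    # Hexamers absent from the table are treated as neutral (score 0).
    return sum((scores.get(seq[s:s + k], 0.0) for s in range(first, last + 1)), 0.0)


def delta_score(wt_seq: str, pos: int, alt: str,
                scores: Dict[str, float], k: int = 6) -> float:
    """Variant total minus wild-type total for a single-base substitution."""
    if len(alt) != 1 or wt_seq[pos] == alt:
        raise ValueError("alt must be a single base different from the reference")
    var_seq = wt_seq[:pos] + alt + wt_seq[pos + 1:]
    return total_score(var_seq, pos, scores, k) - total_score(wt_seq, pos, scores, k)
```

For example, `delta_score("TTTTTGAAGAATTTTT", 7, "T", TOY_SCORES)` sums the hexamers around the substituted base in both sequences, so a variant that breaks an enhancer-like hexamer and creates a silencer-like one receives a single net score instead of two separate motif calls.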
The false positive variants can be studied further to better understand the structural features that prevent the usage of SREs. For false positive variants outside of the mapped SREs, the location of predicted SREs with respect to local mRNA secondary structure could also play a role; for example, inclusion of an SRE in the stem of a stem-loop structure may reduce access by the corresponding RNA-binding protein (Buratti et al., 2004). In the same way, the positive control dataset of 33 variants could be assessed for structural features that enable these variants to alter mRNA expression. More information on structural patterns that influence exonic SRE activity, which can be obtained from bioinformatic analysis, may be useful in improving SRE prediction not only in BRCA1 but also in other genes.
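As one illustration of such a structural feature, the sketch below estimates how much of a predicted SRE lies in base-paired (stem) positions. It assumes the local secondary structure is already available as a dot-bracket string from an RNA folding tool. The function name and the example structure are hypothetical, and this is not an analysis performed in the present study.

```python
def paired_fraction(structure: str, start: int, length: int) -> float:
    """Fraction of motif positions that are base-paired in a dot-bracket string.

    `structure` uses '(' and ')' for paired bases and '.' for unpaired bases;
    `start` is the 0-based position of the motif and `length` its size.
    """
    if start < 0 or length <= 0 or start + length > len(structure):
        raise ValueError("motif window falls outside the structure")
    window = structure[start:start + length]
    paired = sum(ch in "()" for ch in window)
    return paired / length


# Hypothetical stem-loop: a 4-bp stem enclosing a 4-nt loop.
EXAMPLE_STRUCTURE = "..((((....)))).."
```

A motif lying entirely in the stem of `EXAMPLE_STRUCTURE` would give a fraction of 1, whereas one in the loop would give 0. Such a value could be tabulated for false positive and positive control variants to look for differences in accessibility.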