Best place for figure 2
Biological questions answered using this data
Variant pathogenicity prediction vs. curated datasets
To explore differences between RTT-causing and benign MECP2 genetic
variants, we analyzed six descriptive features from the annotated VEP
results (see Methods; Figure 3). We chose to visualize a conservation
score (i.e., PolyPhen), pathogenicity estimation scores (i.e., SIFT,
CADD, MetaLR, FATHMM-MKL), and the variant frequency in the normal
population from GnomAD (Lek et al., 2016) (i.e., GnomAD_AF).
We split the variants into three groups (benign, both, and RTT-causing)
because we identified a subset of 19 variants that appear in both
datasets. Overall, the results are as expected: the RTT-causing variants
lie in positions that are significantly more conserved than those of the
benign or both variants (Figure 3, PolyPhen; Wilcoxon test), and they are
less frequent than the benign variants, although none of the variants
presented here is abundant in the normal population (Figure 3,
GnomAD_AF).
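The three-way split described above can be sketched with plain set operations. This is a minimal illustration, not the procedure used in the study: the function name `assign_groups` and the variant identifiers are hypothetical, and it assumes that variants are matched between the two datasets by an exact identifier.

```python
def assign_groups(benign_ids, rtt_ids):
    """Label each variant as 'benign', 'RTT-causing', or 'both'.

    Assumption: a variant belongs to 'both' when the same identifier
    appears in the benign dataset and in the RTT-causing dataset.
    """
    benign_ids, rtt_ids = set(benign_ids), set(rtt_ids)
    groups = {}
    for variant in benign_ids | rtt_ids:
        if variant in benign_ids and variant in rtt_ids:
            groups[variant] = "both"
        elif variant in rtt_ids:
            groups[variant] = "RTT-causing"
        else:
            groups[variant] = "benign"
    return groups


# Hypothetical identifiers, for illustration only (not real data).
benign = ["var_A", "var_B", "var_C"]
rtt = ["var_C", "var_D"]
labels = assign_groups(benign, rtt)
```

Here `var_C` is present in both input lists, so it is the only variant placed in the both group.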
Analysis of the pathogenicity estimates from multiple scores (Figure 3,
panels SIFT, CADD, MetaLR, and FATHMM-MKL) shows that RTT-causing
variants are on average predicted to be more damaging than the benign and
both variants (p < 0.0001 in all cases, Wilcoxon test). Note that SIFT
assigns lower scores to more pathogenic variants, whereas CADD, MetaLR,
and FATHMM-MKL assign higher scores to more pathogenic variants. MetaLR
distinguishes benign from RTT-causing variants better than the other
three pathogenicity scores. This may be because this novel meta-score
integrates more features than the other three prediction tools,
including other pathogenicity scores and frequency information.
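As a rough illustration of the group comparison, the sketch below implements a two-sided Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation. It assumes the unpaired rank-sum variant, since the variant groups are independent; the text only says "Wilcoxon test", so this is one reading. The function name `rank_sum_test` and the example scores are hypothetical, and the approximation is coarse for very small groups.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Uses the normal approximation with a tie correction and no
    continuity correction. Returns (U statistic of x, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool the observations, remembering which group each came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values identical: no evidence of a shift
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, p


# Hypothetical scores where higher means more damaging (not real data).
rtt_scores = [0.9, 0.8, 0.95, 0.85, 0.7]
benign_scores = [0.1, 0.2, 0.3, 0.15, 0.25]
u_stat, p_value = rank_sum_test(rtt_scores, benign_scores)
```

Because the test is rank-based and two-sided, the p-value does not depend on the direction of the score, so SIFT (lower is more pathogenic) and the other tools (higher is more pathogenic) can be tested in the same way; only the interpretation of the shift differs.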
The both group falls between the benign and RTT-causing groups in three
of the five predictions and lies closer to the benign group in the other
two.
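One way to make this kind of positional statement concrete is to compare group medians. The sketch below is only an assumed criterion, since the text does not state how the position of the both group was judged; the function name `position_of_both` and the example values are hypothetical.

```python
from statistics import median


def position_of_both(benign, both, rtt):
    """Describe where the 'both' median lies relative to the other groups.

    Assumption: group position is summarized by the median score.
    """
    m_benign, m_both, m_rtt = median(benign), median(both), median(rtt)
    lo, hi = sorted((m_benign, m_rtt))
    between = lo <= m_both <= hi
    # Ties in distance are resolved in favor of the benign group.
    if abs(m_both - m_benign) <= abs(m_both - m_rtt):
        closer_to = "benign"
    else:
        closer_to = "RTT-causing"
    return {"between": between, "closer_to": closer_to}


# Hypothetical scores for one predictor (not real data).
summary = position_of_both(
    benign=[0.1, 0.2, 0.3],
    both=[0.25, 0.3, 0.35],
    rtt=[0.8, 0.9, 1.0],
)
```

The comparison uses only the ordering and distances of the medians, so it works unchanged for scores where lower values indicate pathogenicity, such as SIFT.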