Materials and Methods
Workflow of genetic variant data integration
Data selection and retrieval
In a recent study (Townend et al., 2018), we identified 13
genotype-phenotype databases containing RTT-specific MECP2 variation data.
We evaluated each of these for specific requirements for
data integration: 1) the data had to be available and permitted to be
re-used and redistributed, and 2) each genetic variant had to be
described unambiguously. The latter means that the exact
position (genome build and chromosomal location) as well as the variation of the
genetic variant are available or retrievable by conversion, so that the variant
can be described using the HGVS nomenclature. For this study, we
selected eight databases and downloaded all MECP2 genetic
variants with available linked phenotype information from each of these
databases: ClinVar (Landrum et al., 2016; https://www.ncbi.nlm.nih.gov/clinvar/),
DECIPHER (Firth et al., 2009; https://decipher.sanger.ac.uk/),
EVA (http://www.ebi.ac.uk), EVS
(http://evs.gs.washington.edu),
ExAC (Lek et al., 2016; http://exac.broadinstitute.org/),
KMD (https://kmd.nih.go.kr), LOVD
(Fokkema et al., 2011; MECP2 collection: https://databases.lovd.nl/shared/genes/MECP2),
and RettBASE (Krishnaraj, Ho, & Christodoulou, 2017; http://mecp2.chw.edu.au/).
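The second requirement above (an unambiguous variant description) can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the record layout (`VariantRecord`), the helper names, and the restriction to single-nucleotide substitutions are assumptions made for illustration only. The chromosome X RefSeq accessions for GRCh37 and GRCh38 are used because MECP2 is located on chromosome X.

```python
from dataclasses import dataclass
from typing import Optional

# RefSeq accessions for chromosome X in two common genome builds.
CHRX_ACCESSION = {"GRCh37": "NC_000023.10", "GRCh38": "NC_000023.11"}


@dataclass
class VariantRecord:
    """Hypothetical record layout; real database exports differ."""
    source: str              # name of the originating database
    build: Optional[str]     # genome build, e.g. "GRCh38"
    position: Optional[int]  # 1-based genomic position
    ref: Optional[str]       # reference allele
    alt: Optional[str]       # alternative allele
    reusable: bool           # requirement 1: re-use/redistribution permitted


def is_unambiguous(rec: VariantRecord) -> bool:
    """Requirement 2: build, position, and both alleles are all known."""
    return (
        rec.build in CHRX_ACCESSION
        and isinstance(rec.position, int)
        and rec.position > 0
        and bool(rec.ref)
        and bool(rec.alt)
        and rec.ref != rec.alt
    )


def is_eligible(rec: VariantRecord) -> bool:
    """Both requirements must hold for a record to be integrated."""
    return rec.reusable and is_unambiguous(rec)


def to_hgvs_g(rec: VariantRecord) -> Optional[str]:
    """Build a genomic HGVS-style string for a single-nucleotide substitution.

    Returns None for ambiguous records and for variant types this sketch
    does not cover (insertions, deletions, and so on).
    """
    if not is_unambiguous(rec) or len(rec.ref) != 1 or len(rec.alt) != 1:
        return None
    return f"{CHRX_ACCESSION[rec.build]}:g.{rec.position}{rec.ref}>{rec.alt}"


# Illustrative values only; the position is not a real MECP2 coordinate.
demo = VariantRecord("demo", "GRCh38", 12345, "C", "T", reusable=True)
incomplete = VariantRecord("demo", None, 12345, "C", "T", reusable=True)
```

Here `to_hgvs_g(demo)` yields a string of the form `accession:g.positionREF>ALT`, while `incomplete` is rejected because its genome build is missing and the position cannot be interpreted without it.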
Additionally, an anonymized dataset from local RTT patients was
included (Maastricht Rett dataset, permission granted by Niet-WMO
verklaring 2018-0597, Maastricht University METC approval). Either the
integrated download function was used to get the data, or the data were
extracted from the HTML pages (for the availability of download functions,
see Townend et al., 2018). Figure 1 shows the data processing (steps 1-3)
and analysis (step 4) workflow of this study.
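For databases without a download function, extraction from HTML could look like the following sketch. The tool actually used for this step is not specified here, so this is only one possible approach, using Python's standard-library `html.parser`. The class name `TableExtractor`, the function `extract_rows`, and the sample table with placeholder values are all hypothetical.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def extract_rows(html: str) -> list:
    """Return all table rows found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Placeholder table; not taken from any of the databases listed above.
SAMPLE_HTML = """
<table>
  <tr><th>Variant</th><th>Phenotype</th></tr>
  <tr><td>variant-1</td><td>phenotype-A</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
```

The first row of `rows` holds the column headers and the remaining rows hold one variant entry each, which can then be mapped onto a common record layout before integration.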