Distribution of the variants across databases
Table 2 shows the number of unique MECP2 variants for each investigated
database. The proportion of variants that are unique to a database
differs between databases. This proportion can also indicate whether a
database focuses on collecting pathogenic variants (RettBase, ClinVar,
Maastricht Rett dataset, DECIPHER; KMD is an exception) or on general
population sequencing results without disease annotation (EVA, LOVD,
ExAC; EVS is an exception). LOVD, for example, lists all distinct
variants and gives background information on how often a given variant
has been observed in that variant's information sheet. RettBase
additionally cites the source of each entry. Table 2 also shows that
every database contains unique MECP2 variants, i.e. variants found in no
other database. The number of such unique variants ranges from 3,329
(EVA) to 1 (EVS).
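The notion of a "unique" variant used here (present in one database and absent from all others) is a set difference. The sketch below illustrates it, assuming each database has already been reduced to a set of normalized variant identifiers. The identifiers and database contents are hypothetical toy values, not the data behind Table 2, and the function name `unique_variants` is illustrative.

```python
from typing import Dict, Set


def unique_variants(dbs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each database, return the variants that occur in no other database."""
    result: Dict[str, Set[str]] = {}
    for name, variants in dbs.items():
        # Union of all variants held by every *other* database
        others = set().union(*(v for n, v in dbs.items() if n != name))
        result[name] = variants - others
    return result


# Toy input: hypothetical variant identifiers, NOT real database contents
toy = {
    "EVA": {"v1", "v2", "v3"},
    "LOVD": {"v2", "v4"},
    "ClinVar": {"v4", "v5"},
}

for name, uniq in unique_variants(toy).items():
    share = len(uniq) / len(toy[name])  # proportion of the database that is unique
    print(f"{name}: {len(uniq)} unique of {len(toy[name])} ({share:.0%})")
```

With this toy input, EVA has 2 of 3 variants unique (67%), LOVD none, and ClinVar 1 of 2 (50%). The proportion, rather than the raw count, is what allows databases of very different sizes to be compared.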
Figure 2 shows the size of MECP2 variation collections in the
different databases, their shared and their unique variations. There are
databases that focus on collections of genome and/or exome sequencing
data of mostly healthy individuals (EVA, EVS, ExAC), curated collections
of disease-causing variants (LOVD, RettBase, ClinVar, DECIPHER), and
hospital-derived collections (KMD, Maastricht Rett dataset). Variants
shared between databases can be explained by the same variant occurring
in multiple patients, by data exchange between databases, or by
recruitment from the same sources. For instance, ExAC
and LOVD share 559 unique variants, LOVD and ClinVar 546, LOVD and
RettBase 512, RettBase and ClinVar 504.
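Pairwise overlaps such as those listed above correspond to set intersections between two databases. The following is a minimal sketch under the same assumption of pre-normalized variant identifiers. The toy sets are hypothetical and do not reproduce the counts reported for ExAC, LOVD, ClinVar or RettBase, and `pairwise_shared` is an illustrative name.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def pairwise_shared(dbs: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Count the variants shared by each pair of databases (set intersection)."""
    return {
        (a, b): len(dbs[a] & dbs[b])
        for a, b in combinations(sorted(dbs), 2)  # each unordered pair once
    }


# Toy input: hypothetical variant identifiers, NOT real database contents
toy = {
    "EVA": {"v1", "v2", "v3"},
    "LOVD": {"v2", "v4"},
    "ClinVar": {"v4", "v5"},
}

# Print pairs from largest to smallest overlap
for (a, b), n in sorted(pairwise_shared(toy).items(), key=lambda kv: -kv[1]):
    print(f"{a} & {b}: {n}")
```

A variant present in three or more databases is counted once for every pair that contains it, so pairwise overlap counts cannot be summed to obtain the total number of shared variants.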