Distribution of the variants across databases
Table 2 shows the number of unique MECP2 variations for each investigated database. The relative number of unique MECP2 variations differs between the databases. How many of a database's variations are unique can also indicate whether the database focuses on collecting pathogenic variations (RettBase, ClinVar, Maastricht Rett dataset, DECIPHER; KMD is an exception) or on general population sequencing results without disease annotation (EVA, LOVD, ExAC; EVS is an exception). LOVD, for example, lists all different variations and provides background information about the abundance of a variation in that variation's information sheet. RettBase also gives the reference from which each specific entry originates. Table 2 also makes clear that every database contains unique MECP2 variations that are found in no other database. The number of such unique variants ranges from 1 (EVS) to 3,329 (EVA).
Figure 2 shows the size of the MECP2 variation collections in the different databases, together with their shared and unique variations. The databases comprise collections of genome and/or exome sequencing data from mostly healthy individuals (EVA, EVS, ExAC), curated collections of disease-causing variants (LOVD, RettBase, ClinVar, DECIPHER), and hospital-derived collections (KMD, Maastricht Rett dataset). The overlap of MECP2 variations between databases can be explained by the occurrence of the same variation in multiple patients, by data exchange between databases, or by recruitment from the same resources. For instance, ExAC and LOVD share 559 unique variants, LOVD and ClinVar 546, LOVD and RettBase 512, and RettBase and ClinVar 504.
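The unique and shared counts discussed above amount to set operations on the variant collections. The following Python sketch illustrates the idea under the assumption that every variant has already been normalised to one comparable identifier per database. The function names, the database labels (DB_A, DB_B, DB_C) and the variant identifiers (v1 to v5) are invented for illustration and are not taken from the study or its data.

```python
from itertools import combinations


def unique_variants(collections):
    """Return, per database, the variants found in no other database."""
    unique = {}
    for name, variants in collections.items():
        # Union of the variants held by all other databases.
        others = set().union(
            *(v for n, v in collections.items() if n != name)
        )
        unique[name] = variants - others
    return unique


def pairwise_shared(collections):
    """Return the variants shared by each pair of databases."""
    return {
        (a, b): collections[a] & collections[b]
        for a, b in combinations(sorted(collections), 2)
    }


# Toy input with invented placeholder identifiers (not real MECP2 variants).
toy = {
    "DB_A": {"v1", "v2", "v3"},
    "DB_B": {"v2", "v3", "v4"},
    "DB_C": {"v3", "v5"},
}

for name, variants in unique_variants(toy).items():
    print(name, "unique:", len(variants))

for pair, variants in pairwise_shared(toy).items():
    print(pair, "shared:", len(variants))
```

A variant shared by three or more databases is counted once in every pair that contains it, so pairwise counts such as those quoted above cannot simply be summed to obtain a total overlap.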