Introduction
Since 2009, whole exome sequencing (WES) and whole genome sequencing
(WGS) have become the main tools for the discovery of novel disease
genes and variants related to rare Mendelian phenotypes (Chong et al.
2015). With these approaches, progress has accelerated: the number
of genes with known phenotype-causing variants has expanded from 2,346
in 2009 to 4,532 currently, or ~22% of the total
protein-encoding genes in the genome (OMIM). That leaves nearly 80% of
the predicted ~20,000 protein-encoding genes yet to be
connected to a disease phenotype. Similarly, 50-75% of clinical and
research WES analyses fail to identify a responsible variant, even in families
that present Mendelian segregation of human disease traits (Chong et al.
2015; Posey et al. 2019; Retterer et al. 2016; Yang et al. 2014).
Possible explanations for the modest diagnostic rate include:
unappreciated phenotypic and genetic heterogeneity; causative variants
in not yet recognized disease genes (Liu et al. 2019); high locus
heterogeneity; complex molecular mechanisms underlying incomplete
penetrance; technical limitations in the applied sequencing approach;
and limitations in variant analysis and classification. One
limitation, which we focus on here, is the lack of accurate
analytical tools to interpret and classify variants in known or novel
disease genes.
Variant classification in the research or clinical setting is a complex
process that takes into consideration many different features related to
the individual, the phenotype, the variant, the gene and the
environment. In 2015, Richards and colleagues (Richards et al. 2015)
published a guideline for variant interpretation and classification
based on criteria that draw on typical types of variant evidence (e.g.
population data, computational data, functional data, and segregation
data). To apply these criteria, research and clinical laboratories use
many different databases with different types of evidence, but very few
of them allow the laboratories to have access to detailed phenotypic
information related to the specific variants being investigated. Knowing
the phenotypic features of other individuals that carry the variant of
interest is a critical step in variant classification, but detailed
phenotypic information linked to putatively causal variants is rarely
available in public or even controlled-access databases because of the
difficulty in obtaining detailed phenotype data, rarity of the candidate
variants, and challenges and uncertainty due to potential regulatory
requirements to maintain the confidentiality and privacy of individuals
who carry these rare variants.
Here we describe several databases that have made variant-level
information, together with phenotypes or phenotypic features, available to
researchers, clinicians, health care providers and patients. We also
describe their plans to connect to one another, following in the
footsteps of the Matchmaker Exchange project, which connects gene-level
databases (Sobreira et al. 2017).