Discussion
Considering the substantial differences amongst diseases in terms of
inheritance pattern, disease mechanism, phenotype, genetic and allelic
heterogeneity, and prevalence, disease-specific guidelines are necessary
for accurate and reliable interpretations (Rehm, Berg, & Plon, 2018).
Following variant interpretation guidelines for genetic hearing loss
(Oza et al., 2018), we developed a new computational tool, named
VIP-HL, publicly available through a web interface
(http://hearing.genetics.bgi.com/).
To our knowledge, this is the first tool designed for automated variant
interpretation in genetic hearing loss. Considering the high prevalence
of hearing loss in the population, the availability of VIP-HL should
substantially reduce the interpretation burden for clinicians and
curators.
Compared with the rules activated by the ClinGen HL-EP, VIP-HL showed a
high concordance (96%), indicating that VIP-HL interprets hearing loss
variants reliably. Of note, all three discrepant activations
(variants #1-3, Table 1) were attributable to population-based rules
(BA1 and PM2), which depend on the adoption of popmax filtering allele
frequency in extensive population studies (Whiffin et al., 2017). The
ClinGen HL-EP used the ExAC database at the time of their research,
whereas we employed a larger dataset (gnomAD), as encouraged by the
ClinGen HL-EP (Oza et al., 2018). Using these stringent allele
frequencies empowers clinical genome interpretation without the removal
of true pathogenic variants (Whiffin et al., 2019).
VIP-HL activated several rules that were not activated by ClinGen HL-EP,
including PM1 and BS2. ClinGen HL-EP did not perform a systematic review
of mutational hot spots or functional domains for all genes associated
with hearing loss, and proposed that PM1 can be applied for the KCNQ4
pore-forming region (Oza et al., 2018). In this study, we used the
enrichment of pathogenic/likely pathogenic variants to construct a set
of important regions (Xiang, Peng, et al., 2020), which includes the
KCNQ4 pore-forming region. Additionally, although HL-EP did not
elaborate on the cutoff for BS2, we used a conservative cutoff to
automate this rule. Note that penetrance affects the application of BS2
but is not considered by VIP-HL. This led to activations of BS2 for
NM_004004.6:c.109G>A and NM_004004.6:c.101T>C in the GJB2 gene because
50 and 16 homozygotes, respectively, were identified in the gnomAD
control dataset. Both are well-known pathogenic variants with low
penetrance (Shen et al., 2019). Nevertheless, VIP-HL is a
semi-automatic tool and our user interface enables curators to manually
adjust codes to avoid such possible misclassifications.
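The interplay between the automated BS2 call and manual adjustment can be
illustrated with a minimal sketch. It is not VIP-HL's implementation: the
cutoff value, the class, and the function names are hypothetical, since the
text states only that a conservative cutoff was used. The homozygote counts
are those reported above for the two GJB2 variants.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the text only states that a conservative cutoff was used.
BS2_MIN_CONTROL_HOMOZYGOTES = 3


@dataclass
class VariantEvidence:
    hgvs: str
    control_homozygotes: int           # homozygotes in the gnomAD control dataset
    curator_rejects_bs2: bool = False  # hypothetical manual override flag


def bs2_automated(ev: VariantEvidence,
                  cutoff: int = BS2_MIN_CONTROL_HOMOZYGOTES) -> bool:
    """Count-based activation; penetrance is not considered here."""
    return ev.control_homozygotes >= cutoff


def bs2_final(ev: VariantEvidence,
              cutoff: int = BS2_MIN_CONTROL_HOMOZYGOTES) -> bool:
    """Keep the automated call unless a curator has removed the code."""
    return bs2_automated(ev, cutoff) and not ev.curator_rejects_bs2


c109 = VariantEvidence("NM_004004.6:c.109G>A", control_homozygotes=50)
c101 = VariantEvidence("NM_004004.6:c.101T>C", control_homozygotes=16)
```

In this sketch both variants trigger the automated rule on counts alone, and
setting `curator_rejects_bs2` for a known low-penetrance variant removes the
code from the final evidence set.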
A further comparison between VIP-HL and ClinVar showed an overall
interpretation concordance of 88.0%. For pathogenic/likely pathogenic
variants, the concordance was lower (57.1%). This can be explained by
the fact that VIP-HL automates only 13 of the 24 ACMG/AMP rules. The
remaining rules, comprising nine case-level and segregation evidence
codes and two functional evidence codes, require manual curation from
the scientific literature. Among them, pathogenic
rules are more frequently activated than benign rules (Oza et al.,
2018). Prospectively, text-mining and machine learning techniques might
serve as potential solutions. For example, Birgmeier and co-authors
developed an end-to-end machine learning tool, named AVADA, for the
automatic retrieval of variant evidence directly from full-text
literature (Birgmeier et al., 2020). If large datasets of
evidence-related sentences or figures can be accumulated,
machine-learning approaches could be applied to evidence retrieval and
used to automate the remaining ACMG/AMP rules in a future version of
VIP-HL. In the meantime, our interface enables curators to manually
activate the relevant codes after literature curation.
VIP-HL classified three variants as P/LP that were classified as B/LB in
ClinVar. All three discrepancies were related to the consideration of
splicing impact. The discrepancy for NM_153676.3:c.2547-1G>T was
attributable to a lack of consideration of exon expression data, which
led to an inappropriate classification. A splicing variant affecting a
non-expressed exon is expected to have less functional effect
(DiStefano et al., 2018). Recently, transcript-level information from
the GTEx project (Consortium, 2017) was used to show that incorporating
exon expression data can improve the interpretation of putative
loss-of-function variants (Cummings et al., 2020). The second and third
variants (NM_206933.3:c.949C>A and NM_022124.6:c.7362G>A) were
synonymous variants, and their splicing impact should be curated from
the published literature if available. Nevertheless, these results
indicate the importance of expression data in variant interpretation.
To improve user experience and further facilitate variant
interpretation via VIP-HL, we developed a user-friendly web interface,
which we continue to extend with useful features. For example, PM3, one
of the most frequently activated rules in genetic hearing loss
(Oza et al., 2018), relies on the pathogenicity of the variant on the
second allele. If this second variant is entered (in HGVS nomenclature)
during the curation of PM3, VIP-HL now provides its pathogenicity as a
reference for users. We expect such features and ongoing improvements to
save curators time and relieve the burden of variant interpretation.
VIP-HL has limitations. First, it is currently not applicable to
exon-level copy number variations. Second, the allele frequency cutoffs
differ between dominant and recessive hearing loss disorders. For
variants in a gene with both dominant and recessive inheritance, we
first applied the cutoffs for the inheritance pattern curated by
ClinGen HL-EP. If both patterns were curated, we conservatively chose
the cutoffs for recessive disorders. To help users avoid this pitfall,
we highlighted the selected inheritance in the web interface of VIP-HL.
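The inheritance-dependent choice of cutoffs can be sketched as follows. This
is an illustration under stated assumptions rather than the VIP-HL code: the
numeric cutoffs are placeholders (the ClinGen HL-EP specifications in Oza et
al., 2018 should be consulted for actual values), and the function and
dictionary names are hypothetical. Only BA1 and PM2 are modeled.

```python
from typing import Dict, List, Set, Tuple

# Placeholder allele frequency cutoffs, for illustration only.
CUTOFFS: Dict[str, Dict[str, float]] = {
    "recessive": {"BA1": 0.005, "PM2": 0.00007},
    "dominant": {"BA1": 0.001, "PM2": 0.00002},
}


def select_inheritance(curated: Set[str]) -> str:
    """Prefer recessive cutoffs when both patterns are curated for a gene."""
    if "recessive" in curated:
        return "recessive"
    if "dominant" in curated:
        return "dominant"
    raise ValueError("no curated inheritance pattern for this gene")


def population_codes(faf: float, curated: Set[str]) -> Tuple[str, List[str]]:
    """Return the selected inheritance and the population codes it activates.

    `faf` is the popmax filtering allele frequency of the variant.
    """
    inheritance = select_inheritance(curated)
    cut = CUTOFFS[inheritance]
    codes: List[str] = []
    if faf >= cut["BA1"]:
        codes.append("BA1")
    elif faf <= cut["PM2"]:
        codes.append("PM2")
    return inheritance, codes
```

Returning the selected inheritance alongside the codes mirrors the point made
above: the same allele frequency can activate BA1 under dominant cutoffs and
nothing under recessive cutoffs, so the choice needs to be visible to the
curator.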
In conclusion, VIP-HL is an integrated online tool and search engine for
variants in genetic hearing loss genes. It is also the first tool, to
our knowledge, to consider the specifications proposed by ClinGen HL-EP
for variants related to genetic hearing loss. Providing reliable and
reproducible annotations, VIP-HL not only facilitates variant
interpretation but also provides a platform for users to share
classifications with others.
Data Availability Statement: Data sharing is not applicable to
this article as no new data were created or analyzed in this study.
Conflict of Interest: Jiguang Peng, Jiale Xiang, Xiangqian Jin,
Junhua Meng, Lisha Chen, Nana Song, and Zhiyu Peng were employed at BGI
Genomics at the time of submission. No other conflicts relevant to this
study should be reported.