Rule selection, optimization and implementation
Twenty-four ACMG/AMP rules for genetic hearing loss were grouped into
four categories by evidence type: population (PM2, BA1, BS1, and BS2),
computational (PVS1, PS1, PM1, PM4, PM5, PP3, BP3, BP4, and BP7),
case/segregation (PM3, PS2/PM6, PS4, PP1, PP4, BS4, BP2, and BP5), and
functional (BS3 and PS3) data. VIP-HL automates all 13 population and
computational rules, while the 11 case/segregation and functional rules
still require manual curation.
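As a bookkeeping check, the grouping can be written as a small lookup table. The Python sketch below is illustrative only: the names `RULE_CATEGORIES`, `AUTOMATED`, and `count_rules` are hypothetical and not part of VIP-HL. It shows that the 13 automated and 11 manual rules sum to 24 when PS2 and PM6 are counted as separate rules.

```python
# Rule grouping transcribed from the text; PS2 and PM6 are listed as two
# separate rules, which is what makes the totals come to 13 + 11 = 24.
RULE_CATEGORIES = {
    "population": ["PM2", "BA1", "BS1", "BS2"],
    "computational": ["PVS1", "PS1", "PM1", "PM4", "PM5",
                      "PP3", "BP3", "BP4", "BP7"],
    "case/segregation": ["PM3", "PS2", "PM6", "PS4", "PP1",
                         "PP4", "BS4", "BP2", "BP5"],
    "functional": ["BS3", "PS3"],
}

# Categories evaluated automatically; the rest need manual curation.
AUTOMATED = {"population", "computational"}


def count_rules(automated: bool) -> int:
    """Count the rules in automated (True) or manual (False) categories."""
    return sum(len(rules) for category, rules in RULE_CATEGORIES.items()
               if (category in AUTOMATED) == automated)


print(count_rules(True), count_rules(False))  # 13 11
```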
For the population rules (PM2, BA1, and BS1), frequency cutoffs were
applied to the combined gnomAD v2.1 exome and genome dataset
(Karczewski et al., 2020) and followed the hearing loss-specific
guidelines (Oza et al., 2018). The highest filtering allele frequency
across all gnomAD populations (“popmax”) was used as the allele frequency
of each variant (Whiffin et al., 2017). For the BS2 criterion, we
retrieved the number of homozygotes from the gnomAD control dataset. BS2
was applied when this number was greater than three for autosomal
recessive disease or greater than one for autosomal dominant disease.
Note that the ClinGen Hearing Loss Expert Panel (HL-EP) did not specify
homozygote count cutoffs for BS2; users who disagree with the default
setting can adjust it in VIP-HL.
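The population rules can be summarized as a single decision function. In this minimal sketch, the BS2 defaults (more than three homozygotes for AR disease, more than one for AD disease) come from the text, whereas the BA1, BS1, and PM2 frequency cutoffs are our assumption of the Oza et al. (2018) values and should be checked against that guideline. `population_rules` and its arguments are hypothetical names.

```python
from typing import Dict, Optional, Set

# ASSUMED allele-frequency cutoffs, our reading of the hearing loss
# specifications (Oza et al., 2018); verify against the guideline.
FREQ_CUTOFFS = {
    "AR": {"BA1": 0.005, "BS1": 0.003, "PM2": 0.0007},
    "AD": {"BA1": 0.001, "BS1": 0.0002, "PM2": 0.00002},
}

# Default BS2 settings from the text: BS2 applies when the gnomAD control
# homozygote count is greater than 3 (AR) or greater than 1 (AD).
DEFAULT_BS2_THRESHOLDS = {"AR": 3, "AD": 1}


def population_rules(popmax_faf: Optional[float], homozygotes: int,
                     inheritance: str,
                     bs2_thresholds: Optional[Dict[str, int]] = None) -> Set[str]:
    """Return the population rules met by one variant.

    popmax_faf is the highest filtering allele frequency across gnomAD
    populations, or None when the variant is absent from gnomAD.
    inheritance is "AR" or "AD".
    """
    cutoffs = FREQ_CUTOFFS[inheritance]
    thresholds = bs2_thresholds or DEFAULT_BS2_THRESHOLDS
    af = 0.0 if popmax_faf is None else popmax_faf
    rules: Set[str] = set()
    # BA1 (stand-alone) takes precedence over BS1; PM2 needs a low frequency.
    if af >= cutoffs["BA1"]:
        rules.add("BA1")
    elif af >= cutoffs["BS1"]:
        rules.add("BS1")
    elif af <= cutoffs["PM2"]:
        rules.add("PM2")
    # The BS2 threshold is a parameter, analogous to the adjustable setting.
    if homozygotes > thresholds[inheritance]:
        rules.add("BS2")
    return rules


print(population_rules(None, 0, "AR"))  # {'PM2'}
print(population_rules(0.0005, 2, "AD", {"AR": 3, "AD": 5}))  # {'BS1'}
```

Passing a different `bs2_thresholds` dictionary plays the role of the user-adjustable BS2 setting.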
The nine computational criteria (PVS1, PS1, PM1, PM4, PM5, PP3, BP3,
BP4, and BP7) were parameterized as follows. PVS1 followed previously
published specifications (Abou Tayoun et al., 2018; Xiang, Peng, Baxter,
& Peng, 2020). PM1 can be applied when a variant is located in a
mutational hotspot or a well-studied functional domain without benign
variation (Oza et al., 2018; Richards et al., 2015). Hotspot regions were
defined by the enrichment of pathogenic variants, as described in our
recent work (Xiang, Peng, et al., 2020).
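The PM1 condition combines two checks: the residue falls inside a hotspot or functional domain, and that region contains no benign variation. The sketch below assumes the regions are already available as protein-coordinate intervals; the intervals in `HOTSPOTS` and the function `meets_pm1` are hypothetical and do not reproduce the enrichment analysis of Xiang, Peng, et al. (2020).

```python
from typing import Iterable, List, Tuple

# HYPOTHETICAL hotspot/domain intervals (inclusive protein positions) for a
# single gene; the text derives real hotspots from pathogenic-variant enrichment.
HOTSPOTS: List[Tuple[int, int]] = [(30, 45), (120, 140)]


def meets_pm1(protein_pos: int, hotspots: Iterable[Tuple[int, int]],
              benign_positions: Iterable[int]) -> bool:
    """PM1 sketch: the residue lies in a hotspot/domain with no benign variant."""
    benign = list(benign_positions)
    for start, end in hotspots:
        if start <= protein_pos <= end:
            # Any benign variant inside the same region disqualifies it.
            return not any(start <= pos <= end for pos in benign)
    return False


print(meets_pm1(37, HOTSPOTS, [130]))  # True
```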
Pathogenic/likely pathogenic (P/LP) variants from the ClinVar 20200629
release were retrieved to determine whether a P/LP variant with the same
amino acid change (PS1) or a different amino acid change at the same
residue (PM5) had been reported. For example, because
NM_004004.6(GJB2):c.109G>A (p.Val37Ile) is a well-established pathogenic
variant in ClinVar, PM5 can be applied to NM_004004.6(GJB2):c.109G>T
(p.Val37Phe).
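The PS1/PM5 comparison amounts to matching a query missense change against known P/LP missense changes at the same residue. In this minimal sketch, variants are keyed by gene symbol and protein position only (transcript handling and ClinVar file parsing are omitted), and `Missense`, `parse`, and `ps1_pm5` are hypothetical names. The usage line reproduces the GJB2 p.Val37Ile/p.Val37Phe example.

```python
import re
from typing import Iterable, NamedTuple, Set

# Simple protein-level HGVS missense change, e.g. "p.Val37Ile".
MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


class Missense(NamedTuple):
    gene: str
    cdna: str  # coding change, e.g. "c.109G>A"
    ref: str   # reference amino acid (three-letter code)
    pos: int   # protein position
    alt: str   # alternate amino acid


def parse(gene: str, cdna: str, protein: str) -> Missense:
    match = MISSENSE.match(protein)
    if match is None:
        raise ValueError(f"not a simple missense change: {protein}")
    return Missense(gene, cdna, match.group(1), int(match.group(2)), match.group(3))


def ps1_pm5(query: Missense, known_plp: Iterable[Missense]) -> Set[str]:
    """Compare a missense query with known P/LP missense variants."""
    rules: Set[str] = set()
    for known in known_plp:
        same_residue = (known.gene, known.pos, known.ref) == (query.gene, query.pos, query.ref)
        if not same_residue or known.cdna == query.cdna:
            continue  # other residue, or the query variant itself
        # Same amino acid change -> PS1; different change at the residue -> PM5.
        rules.add("PS1" if known.alt == query.alt else "PM5")
    return rules


known = [parse("GJB2", "c.109G>A", "p.Val37Ile")]
print(ps1_pm5(parse("GJB2", "c.109G>T", "p.Val37Phe"), known))  # {'PM5'}
```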
PM4 was applied when an in-frame deletion or insertion was located in
a non-repetitive, evolutionarily well-conserved region (Oza et al.,
2018). Repetitive regions were identified with RepeatMasker (Chen, 2004)
and Tandem Repeats Finder (Benson, 1999). Regions with GERP scores
greater than 2 were considered evolutionarily conserved (Cooper et al.,
2005). BP3 was applied when an in-frame indel was located in a
repetitive region without a known function.
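The PM4/BP3 logic can be expressed as a rule over three inputs: overlap with repeat annotations, conservation, and whether the region has a known function. This is a sketch under stated assumptions: any overlap with a RepeatMasker or Tandem Repeats Finder interval counts as repetitive, and the mean GERP score across the affected positions is compared with the cutoff of 2, since the text does not say how per-base scores are aggregated. `inframe_indel_rule` is a hypothetical name.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

Interval = Tuple[int, int]  # inclusive coordinates on one chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def inframe_indel_rule(indel: Interval, repeats: Sequence[Interval],
                       gerp_scores: Sequence[float], in_functional_region: bool,
                       gerp_cutoff: float = 2.0) -> Optional[str]:
    """Return "PM4", "BP3" or None for an in-frame deletion or insertion.

    repeats: intervals from RepeatMasker / Tandem Repeats Finder.
    gerp_scores: GERP scores at the positions the indel affects.
    """
    if not gerp_scores:
        raise ValueError("need at least one GERP score")
    in_repeat = any(overlaps(indel, r) for r in repeats)
    # ASSUMPTION: the region counts as conserved if the mean GERP exceeds 2.
    conserved = mean(gerp_scores) > gerp_cutoff
    if not in_repeat and conserved:
        return "PM4"
    if in_repeat and not in_functional_region:
        return "BP3"
    return None


print(inframe_indel_rule((100, 102), [(200, 250)], [3.1, 4.0, 2.5], False))  # PM4
```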
The in silico tools selected for implementing PP3, BP4, and BP7 were
REVEL (Ioannidis et al., 2016) and MaxEntScan (Yeo & Burge, 2004). For
missense variants, PP3 was applied with a REVEL score ≥ 0.7 and BP4 with
a REVEL score ≤ 0.15 (Oza et al., 2018). PP3 was also applied to
non-canonical splice variants predicted by MaxEntScan to affect splicing
(Yeo & Burge, 2004). BP7 was applied to synonymous variants that were
predicted by MaxEntScan to have no impact on splicing and were located
at nucleotides that are not highly conserved (GERP < 2; Davydov et al.,
2010).
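The REVEL and GERP thresholds stated above translate directly into code; the MaxEntScan splicing call does not, because the text gives no cutoff. In the sketch below, `MES_DELTA` is a hypothetical placeholder for that cutoff, and the consequence labels `"missense"`, `"splice_region"`, and `"synonymous"` as well as `in_silico_rules` are illustrative names rather than VIP-HL identifiers.

```python
from typing import Optional, Set

REVEL_PP3 = 0.7    # PP3 if REVEL >= 0.7 (from the text)
REVEL_BP4 = 0.15   # BP4 if REVEL <= 0.15 (from the text)
GERP_BP7 = 2.0     # BP7 needs GERP < 2 (from the text)
MES_DELTA = 3.0    # HYPOTHETICAL MaxEntScan change treated as a splicing impact


def in_silico_rules(consequence: str, revel: Optional[float] = None,
                    mes_ref: Optional[float] = None, mes_alt: Optional[float] = None,
                    gerp: Optional[float] = None) -> Set[str]:
    """Assign PP3/BP4/BP7 for "missense", "splice_region" or "synonymous" variants."""
    rules: Set[str] = set()
    splice_impact: Optional[bool] = None
    if mes_ref is not None and mes_alt is not None:
        # ASSUMPTION: a score change of at least MES_DELTA in either direction.
        splice_impact = abs(mes_ref - mes_alt) >= MES_DELTA
    if consequence == "missense" and revel is not None:
        if revel >= REVEL_PP3:
            rules.add("PP3")
        elif revel <= REVEL_BP4:
            rules.add("BP4")
    elif consequence == "splice_region" and splice_impact:
        rules.add("PP3")
    elif (consequence == "synonymous" and splice_impact is False
          and gerp is not None and gerp < GERP_BP7):
        rules.add("BP7")
    return rules


print(in_silico_rules("missense", revel=0.85))  # {'PP3'}
```

A missense variant with a REVEL score between 0.15 and 0.7 receives neither PP3 nor BP4, matching the gap between the two thresholds in the text.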
The final pathogenicity is reported using the five-tier system proposed
by the ACMG/AMP guideline: “Pathogenic (P)”, “Likely Pathogenic (LP)”,
“Variant of Uncertain Significance (VUS)”, “Likely Benign (LB)”, and
“Benign (B)” (Richards et al., 2015).
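Assigning the final tier requires combining the criteria that were met. The sketch below encodes the original ACMG/AMP combining rules (Richards et al., 2015) and returns VUS when pathogenic and benign evidence conflict. It is not VIP-HL's code: `classify` and `classify_codes` are hypothetical names, criteria applied at a modified strength (for example, a `_Supporting` suffix) are not handled, and the result may differ from any hearing loss-specific adjustments in Oza et al. (2018).

```python
import re
from collections import Counter
from typing import Iterable


def classify(pvs: int = 0, ps: int = 0, pm: int = 0, pp: int = 0,
             ba: int = 0, bs: int = 0, bp: int = 0) -> str:
    """Combine criterion counts into one of the five ACMG/AMP tiers."""
    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    path_call = ("Pathogenic" if pathogenic
                 else "Likely Pathogenic" if likely_pathogenic else None)
    benign_call = ("Benign" if benign
                   else "Likely Benign" if likely_benign else None)
    # Contradictory pathogenic and benign evidence, or too little evidence: VUS.
    if path_call and benign_call:
        return "Variant of Uncertain Significance"
    return path_call or benign_call or "Variant of Uncertain Significance"


def classify_codes(codes: Iterable[str]) -> str:
    """Classify a list of met criteria such as ["PVS1", "PM2", "PP3"].

    Strength is read from the code prefix only (PVS, PS, PM, PP, BA, BS, BP).
    """
    counts = Counter(re.match(r"[A-Z]+", code).group(0).lower() for code in codes)
    return classify(**counts)


print(classify_codes(["PVS1", "PM2", "PP3"]))  # Pathogenic
```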