Rule selection, optimization and implementation
Twenty-four ACMG/AMP rules for genetic hearing loss were grouped into four categories of evidence: population (PM2, BA1, BS1, and BS2), computational (PVS1, PS1, PM1, PM4, PM5, PP3, BP3, BP4, and BP7), case/segregation (PM3, PS2/PM6, PS4, PP1, PP4, BS4, BP2, and BP5), and functional (BS3 and PS3) data. VIP-HL automates all 13 population and computational rules, whereas the 11 case/segregation and functional rules still require manual curation.
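The grouping above can be captured as a simple lookup table. The Python sketch below is illustrative only: the names `RULE_CATEGORIES`, `AUTOMATED_CATEGORIES`, and `split_rules` are hypothetical and not part of VIP-HL. It lists PS2 and PM6 as two separate rules and checks that the automated and manual subsets add up to the 13 and 11 rules stated in the text.

```python
# Evidence categories as listed in the text; PS2 and PM6 are counted separately.
RULE_CATEGORIES: dict[str, list[str]] = {
    "population": ["PM2", "BA1", "BS1", "BS2"],
    "computational": ["PVS1", "PS1", "PM1", "PM4", "PM5", "PP3", "BP3", "BP4", "BP7"],
    "case_segregation": ["PM3", "PS2", "PM6", "PS4", "PP1", "PP4", "BS4", "BP2", "BP5"],
    "functional": ["BS3", "PS3"],
}

# Categories evaluated automatically; all other categories need manual curation.
AUTOMATED_CATEGORIES = {"population", "computational"}


def split_rules(categories: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Return (automated, manual) rule lists according to each rule's category."""
    automated: list[str] = []
    manual: list[str] = []
    for category, rules in categories.items():
        (automated if category in AUTOMATED_CATEGORIES else manual).extend(rules)
    return automated, manual


automated, manual = split_rules(RULE_CATEGORIES)
print(len(automated), len(manual), len(automated) + len(manual))  # 13 11 24
```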
For the population rules (PM2, BA1, and BS1), allele frequency cutoffs were applied to the combined exome and genome dataset of gnomAD v2.1 (Karczewski et al., 2020), following the cutoffs specified for genetic hearing loss (Oza et al., 2018). The highest filtering allele frequency across all gnomAD populations (“popmax”) was used as the allele frequency of each variant (Whiffin et al., 2017). For the BS2 criterion, we retrieved the number of homozygotes from the gnomAD control dataset. BS2 was applied when the homozygote count was greater than three for autosomal recessive disease and greater than one for autosomal dominant disease. Note that the ClinGen HL-EP did not specify homozygote-count cutoffs for BS2; users who disagree with the default setting can adjust it in VIP-HL.
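As a rough illustration, the population rules can be expressed as threshold checks on the popmax filtering allele frequency and the control homozygote count. In this sketch, the BS2 thresholds (more than three homozygotes for autosomal recessive, more than one for autosomal dominant) come from the text, while the BA1, BS1, and PM2 frequency values are placeholders based on our reading of Oza et al. (2018) and should be checked against the guideline. The class `PopulationCutoffs` and function `population_rules` are hypothetical names, and reporting only BA1 (not BS1 as well) above the BA1 cutoff is a design choice of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PopulationCutoffs:
    ba1: float            # BA1 if popmax FAF >= ba1
    bs1: float            # BS1 if popmax FAF >= bs1 (and below ba1)
    pm2: float            # PM2 if popmax FAF <= pm2, or if the variant is absent
    bs2_homozygotes: int  # BS2 if the control homozygote count exceeds this value


# Placeholder frequency values based on our reading of Oza et al. (2018); verify before use.
# The BS2 thresholds (3 for AR, 1 for AD) are the defaults described in the text.
CUTOFFS = {
    "AR": PopulationCutoffs(ba1=0.005, bs1=0.003, pm2=0.0007, bs2_homozygotes=3),
    "AD": PopulationCutoffs(ba1=0.001, bs1=0.0002, pm2=0.00002, bs2_homozygotes=1),
}


def population_rules(popmax_faf: Optional[float], control_homozygotes: int,
                     inheritance: str) -> set[str]:
    """Return the population criteria met by one variant.

    popmax_faf is None when the variant is absent from gnomAD.
    """
    c = CUTOFFS[inheritance]
    met: set[str] = set()
    if popmax_faf is None or popmax_faf <= c.pm2:
        met.add("PM2")
    elif popmax_faf >= c.ba1:
        met.add("BA1")  # stand-alone benign evidence, so BS1 is not added as well
    elif popmax_faf >= c.bs1:
        met.add("BS1")
    if control_homozygotes > c.bs2_homozygotes:
        met.add("BS2")
    return met


print(population_rules(None, 0, "AD"))   # {'PM2'}
print(population_rules(0.004, 5, "AR"))  # BS1 and BS2 (set order may vary)
```

Keeping the cutoffs in a per-inheritance table mirrors the adjustable defaults described above: changing a value in `CUTOFFS` changes the behavior without touching the decision logic.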
The parameterization of the nine computational criteria (PVS1, PS1, PM1, PM4, PM5, PP3, BP3, BP4, and BP7) is described below. The specifications of PVS1 have been described previously (Abou Tayoun et al., 2018; Xiang, Peng, Baxter, & Peng, 2020). PM1 can be applied when a variant is located in a mutational hotspot or a well-studied functional domain without benign variation (Oza et al., 2018; Richards et al., 2015). Hotspot regions were defined by the enrichment of pathogenic variants, as described in our recent work (Xiang, Peng, et al., 2020).
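The exact enrichment procedure for hotspots is given in Xiang, Peng, et al. (2020) and is not reproduced here. The sketch below substitutes a simple sliding-window rule purely for illustration: a position counts as a hotspot if at least `min_pathogenic` pathogenic variants and no benign variants fall within `window` positions of it. The function names and both parameter values are hypothetical, not VIP-HL defaults.

```python
from bisect import bisect_left, bisect_right


def count_in_window(sorted_positions: list[int], lo: int, hi: int) -> int:
    """Number of positions p with lo <= p <= hi in an ascending list."""
    return bisect_right(sorted_positions, hi) - bisect_left(sorted_positions, lo)


def pm1_hotspot(position: int, pathogenic: list[int], benign: list[int],
                window: int = 25, min_pathogenic: int = 4) -> bool:
    """Hypothetical PM1 check: enough P/LP variants and no benign variants nearby.

    `window` and `min_pathogenic` are illustrative parameters, not VIP-HL defaults.
    """
    lo, hi = position - window, position + window
    n_path = count_in_window(sorted(pathogenic), lo, hi)
    n_benign = count_in_window(sorted(benign), lo, hi)
    return n_path >= min_pathogenic and n_benign == 0


print(pm1_hotspot(100, [90, 95, 105, 110, 300], [500]))  # True
print(pm1_hotspot(100, [90, 95, 105, 110], [120]))       # False
```

Positions here can be any consistent coordinate system (for example, protein residues); binary search keeps each window count logarithmic once the variant lists are sorted.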
Pathogenic/likely pathogenic variants from the ClinVar 20200629 release were retrieved to identify previously reported variants causing the same amino acid change (PS1) or a different amino acid change at the same residue (PM5). For example, because NM_004004.6(GJB2):c.109G>A (p.Val37Ile) is a well-established pathogenic variant in ClinVar, PM5 can be applied to NM_004004.6(GJB2):c.109G>T (p.Val37Phe).
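A minimal sketch of the PS1/PM5 lookup follows, assuming the ClinVar P/LP variants have already been reduced to (transcript, c. change, p. change) tuples in three-letter HGVS protein notation. The names `parse_missense`, `ps1_pm5`, and `CLINVAR_PLP` are hypothetical, and only simple missense changes are handled. An entry with the same c. change as the query is skipped so that a variant never supports itself; the GJB2 example from the text is used as a check.

```python
import re
from typing import Optional

# Three-letter HGVS protein notation for a missense change, e.g. "p.Val37Ile".
MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_missense(hgvs_p: str) -> Optional[tuple[str, int, str]]:
    """Return (ref, position, alt), or None if hgvs_p is not a simple missense change."""
    m = MISSENSE.match(hgvs_p)
    if m is None or m.group(3) == "Ter":  # "Ter" would be a stop gain, not missense
        return None
    return m.group(1), int(m.group(2)), m.group(3)


def ps1_pm5(transcript: str, hgvs_c: str, hgvs_p: str,
            known_plp: set[tuple[str, str, str]]) -> set[str]:
    """known_plp holds (transcript, c. change, p. change) for ClinVar P/LP variants."""
    query = parse_missense(hgvs_p)
    if query is None:
        return set()
    met: set[str] = set()
    for k_tx, k_c, k_p in known_plp:
        known = parse_missense(k_p)
        if k_tx != transcript or known is None or k_c == hgvs_c:
            continue  # other transcript, non-missense entry, or the query variant itself
        if known[:2] != query[:2]:
            continue  # different residue
        met.add("PS1" if known[2] == query[2] else "PM5")
    return met


CLINVAR_PLP = {("NM_004004.6", "c.109G>A", "p.Val37Ile")}
print(ps1_pm5("NM_004004.6", "c.109G>T", "p.Val37Phe", CLINVAR_PLP))  # {'PM5'}
```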
PM4 was applied when an in-frame deletion or insertion was located outside a repetitive region and within an evolutionarily well-conserved region (Oza et al., 2018). Repetitive regions were identified with RepeatMasker (Chen, 2004) and Tandem Repeats Finder (Benson, 1999). Regions with GERP scores higher than two were considered evolutionarily conserved (Cooper et al., 2005). BP3 was applied when an in-frame indel fell within a repetitive region without a known function.
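The PM4/BP3 decision can be sketched as an interval-overlap test against repeat annotations combined with a conservation check. This is a sketch under stated assumptions: repeat intervals are assumed to be pre-loaded from RepeatMasker and Tandem Repeats Finder as half-open coordinates, the GERP score of a multi-base indel is assumed to be the mean over affected positions, and "known function" is assumed to be available as a boolean. The names `overlaps` and `pm4_bp3` are hypothetical; the GERP cutoff of 2 comes from the text.

```python
from statistics import mean
from typing import Optional


def overlaps(start: int, end: int, intervals: list[tuple[int, int]]) -> bool:
    """True if [start, end) overlaps any half-open repeat interval [s, e)."""
    return any(s < end and start < e for s, e in intervals)


def pm4_bp3(gerp_scores: list[float], in_repeat: bool, has_known_function: bool,
            gerp_cutoff: float = 2.0) -> Optional[str]:
    """Return "PM4", "BP3", or None for an in-frame indel."""
    conserved = mean(gerp_scores) > gerp_cutoff  # averaging over the indel is an assumption
    if not in_repeat and conserved:
        return "PM4"
    if in_repeat and not has_known_function:
        return "BP3"
    return None


repeats = [(1000, 1030)]  # hypothetical repeat interval
print(pm4_bp3([3.1, 4.2, 2.8], overlaps(500, 503, repeats), True))      # PM4
print(pm4_bp3([0.4, -1.2, 0.9], overlaps(1010, 1013, repeats), False))  # BP3
```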
The in silico tools selected for implementing PP3, BP4, and BP7 were REVEL (Ioannidis et al., 2016) and MaxEntScan (Yeo & Burge, 2004). PP3 was applied for a REVEL score of ≥0.7, and BP4 for a REVEL score of ≤0.15 (Oza et al., 2018). PP3 could also be applied when a non-canonical splice variant was predicted by MaxEntScan to affect splicing (Yeo & Burge, 2004). BP7 was applied when a synonymous variant was predicted by MaxEntScan to have no impact on splicing and the nucleotide was not highly conserved (GERP < 2) (Davydov et al., 2010).
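The thresholds above translate directly into conditional checks. In the sketch below, the REVEL cutoffs (0.7 and 0.15) and the GERP cutoff (2) come from the text; REVEL is applied only to missense variants, since that is what it scores. The text does not state how a MaxEntScan prediction is turned into "impact" versus "no impact", so `splice_altered` uses a hypothetical absolute score change of at least 3.0 as a stand-in. The consequence labels and function names are also hypothetical.

```python
from typing import Optional

REVEL_PP3 = 0.7        # PP3 if REVEL >= 0.7 (from the text)
REVEL_BP4 = 0.15       # BP4 if REVEL <= 0.15 (from the text)
GERP_CONSERVED = 2.0   # BP7 requires GERP < 2 (from the text)


def splice_altered(mes_ref: float, mes_alt: float, delta: float = 3.0) -> bool:
    """Hypothetical MaxEntScan rule: absolute score change of at least `delta`."""
    return abs(mes_alt - mes_ref) >= delta


def in_silico_rules(consequence: str, revel: Optional[float] = None,
                    mes_ref: Optional[float] = None, mes_alt: Optional[float] = None,
                    gerp: Optional[float] = None) -> set[str]:
    """Return PP3/BP4/BP7 for one variant; consequence labels are hypothetical."""
    met: set[str] = set()
    altered: Optional[bool] = None  # None means no MaxEntScan prediction available
    if mes_ref is not None and mes_alt is not None:
        altered = splice_altered(mes_ref, mes_alt)

    if consequence == "missense" and revel is not None:
        if revel >= REVEL_PP3:
            met.add("PP3")
        elif revel <= REVEL_BP4:
            met.add("BP4")
    elif consequence == "non_canonical_splice" and altered is True:
        met.add("PP3")
    elif (consequence == "synonymous" and altered is False
          and gerp is not None and gerp < GERP_CONSERVED):
        met.add("BP7")
    return met


print(in_silico_rules("missense", revel=0.82))                              # {'PP3'}
print(in_silico_rules("synonymous", mes_ref=8.5, mes_alt=8.1, gerp=0.5))  # {'BP7'}
```

Scores between 0.15 and 0.7 deliberately trigger neither PP3 nor BP4, reflecting the gap between the two cutoffs.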
The final pathogenicity classification is reported using the five-tier system proposed in the ACMG/AMP guideline, namely “Pathogenic (P)”, “Likely Pathogenic (LP)”, “Variant of Uncertain Significance (VUS)”, “Likely Benign (LB)”, and “Benign (B)” (Richards et al., 2015).
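To show how met criteria roll up into the five tiers, the sketch below encodes the combining rules of Richards et al. (2015), treating contradictory pathogenic and benign evidence as VUS. It is an illustration, not VIP-HL's code: VIP-HL's actual combining logic, including any adjustments specific to hearing loss, may differ. Strength modifiers written as suffixes (for example `PM2_Supporting`) are an assumed convention, and `tally` and `classify` are hypothetical names.

```python
from collections import Counter

# Default strength of each criterion prefix (Richards et al., 2015).
PATHOGENIC = {"PVS": "very_strong", "PS": "strong", "PM": "moderate", "PP": "supporting"}
BENIGN = {"BA": "stand_alone", "BS": "strong", "BP": "supporting"}
# Assumed suffix convention for modified strengths, e.g. "PM2_Supporting".
MODIFIERS = {"VeryStrong": "very_strong", "Strong": "strong",
             "Moderate": "moderate", "Supporting": "supporting"}


def tally(criteria: list[str]) -> tuple[Counter, Counter]:
    """Count pathogenic and benign evidence by strength."""
    path, ben = Counter(), Counter()
    for criterion in criteria:
        code, _, modifier = criterion.partition("_")
        prefix = code.rstrip("0123456789")
        if prefix in PATHOGENIC:
            path[MODIFIERS.get(modifier, PATHOGENIC[prefix])] += 1
        elif prefix in BENIGN:
            ben[MODIFIERS.get(modifier, BENIGN[prefix])] += 1
        else:
            raise ValueError(f"unknown criterion: {criterion}")
    return path, ben


def classify(criteria: list[str]) -> str:
    """Return "P", "LP", "VUS", "LB", or "B" for a list of met criteria."""
    path, ben = tally(criteria)
    vs, s = path["very_strong"], path["strong"]
    m, p = path["moderate"], path["supporting"]
    ba, bs, bp = ben["stand_alone"], ben["strong"], ben["supporting"]

    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
        or s >= 2
        or (s >= 1 and (m >= 3 or (m >= 2 and p >= 2) or (m >= 1 and p >= 4)))
    )
    likely_pathogenic = (
        (vs >= 1 and m >= 1)
        or (s >= 1 and m >= 1)
        or (s >= 1 and p >= 2)
        or m >= 3
        or (m >= 2 and p >= 2)
        or (m >= 1 and p >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "VUS"  # contradictory evidence
    if pathogenic:
        return "P"
    if likely_pathogenic:
        return "LP"
    if benign:
        return "B"
    if likely_benign:
        return "LB"
    return "VUS"


print(classify(["PVS1", "PM2"]))                    # LP
print(classify(["PVS1", "PM2_Supporting", "PM3"]))  # P
print(classify(["BA1"]))                            # B
print(classify(["PM2", "PP3", "BS1"]))              # VUS
```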