Spectrum of known pathogenic mutations in Clindb and other
genomic databases
We have created a resource, “Clindb”, that houses the frequency spectrum of
9853 known pathogenic variants (out of 19,538 mapped variants in
ClinVar) in diverse Indian populations. The frequency distribution of these 9853
pathogenic variants in Clindb was compared with the SAS groups in 1000
Genomes, gnomAD, ExAC and GAsP. Figure 4a shows that Clindb
has the largest number of unique pathogenic variants with a reported
frequency (1128) in comparison to the other databases. This number remains higher even if
we include only variants with number of carriers >1. This
necessitates the use of large and diverse cohorts from Indian
populations in further genomic studies.
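The count of unique variants per database can be illustrated with simple set operations. The sketch below is only an illustration, not the pipeline used in this study: it assumes that a variant is "unique" to a database when it has a reported frequency there and in none of the other databases, and the variant identifiers and the function name `unique_variants` are hypothetical.

```python
from typing import Dict, Set


def unique_variants(observed: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each database, return the variants observed there and nowhere else.

    Assumption: `observed` maps a database name to the set of pathogenic
    variant identifiers that have a reported frequency in that database.
    """
    unique: Dict[str, Set[str]] = {}
    for name, variants in observed.items():
        # Union of the variants seen in every other database.
        others = set().union(*(v for n, v in observed.items() if n != name))
        unique[name] = variants - others
    return unique


# Toy input with made-up variant identifiers (not real data).
toy = {
    "Clindb": {"var1", "var2", "var3"},
    "gnomAD": {"var2", "var4"},
    "GAsP": {"var2", "var3"},
}
counts = {name: len(v) for name, v in unique_variants(toy).items()}
```

In this toy input, only `var1` is specific to the hypothetical Clindb set and only `var4` to the gnomAD set. Restricting the input sets to variants with more than one carrier before calling the function would give the filtered comparison mentioned above.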
To evaluate the reliability of the frequency estimates in Clindb, we
compared the average frequency difference between Clindb and the other
databases under study, and also compared the average frequency
difference between GAsP and the other databases. Our analysis revealed that
the overall frequency difference for Clindb is lower than that for GAsP
(Figure 4b).
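One way such a comparison could be computed is sketched below. The exact definition used for Figure 4b is not stated here, so the sketch assumes the mean absolute difference in allele frequency over the variants shared by two databases; the function name `mean_abs_freq_diff` and the frequency values are hypothetical.

```python
from statistics import mean
from typing import Dict


def mean_abs_freq_diff(ref: Dict[str, float], other: Dict[str, float]) -> float:
    """Mean absolute allele-frequency difference over shared variants.

    Assumption: each dict maps a variant identifier to its allele frequency
    in one database; variants missing from either database are ignored.
    """
    shared = ref.keys() & other.keys()
    if not shared:
        raise ValueError("the two databases share no variants")
    return mean(abs(ref[v] - other[v]) for v in shared)


# Toy frequencies with made-up identifiers (not real data).
db_a = {"var1": 0.5, "var2": 0.25}
db_b = {"var1": 0.25, "var2": 0.25, "var3": 0.125}
diff = mean_abs_freq_diff(db_a, db_b)
```

Averaging this quantity over all comparison databases, once with Clindb as the reference and once with GAsP, would give two summary values that can be compared directly.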
We found 12 genes with carrier frequency ≥ 1% (Table S8).
MBL2, CBS and ZGRF1 are the genes with the highest frequencies. Apart from
cystathionine beta-synthase (CBS gene, category: Inborn errors of amino
acid metabolism), there are a few other genes with high carrier frequency
that are related to different IEM classes. The distribution of variants
in different IEM categories is discussed below.