Spectrum of known pathogenic mutations in Clindb and other genomic databases
We have created a resource, “Clindb”, that houses the frequency spectrum of 9853 known pathogenic variants (out of 19,538 mapped variants in ClinVar) in diverse Indian populations. The frequency distribution of these 9853 pathogenic variants in Clindb was compared with that in the South Asian (SAS) groups of 1000 Genomes, gnomAD, ExAC and GAsP. Figure 4a shows that Clindb has the largest number of unique pathogenic variants with a reported frequency (1128) in comparison to the other databases. This number remains higher even when only variants with more than one carrier are included. This underscores the need for large and diverse cohorts from Indian populations in further genomic studies.
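The count of database-unique variants described above can be illustrated with a minimal sketch. It assumes that "unique" means observed in one database and absent from all the others, and that each variant has a carrier count; the function name `unique_variants`, the `min_carriers` parameter and the toy variant identifiers are hypothetical and are not taken from the Clindb pipeline.

```python
def unique_variants(target, others, min_carriers=1):
    """Return variants seen in `target` but in none of the `others`.

    target: dict mapping variant ID -> number of carriers (assumed layout).
    others: list of sets of variant IDs, one set per comparison database.
    min_carriers: keep only variants with at least this many carriers.
    """
    # Pool every variant reported by any comparison database.
    seen_elsewhere = set().union(*others)
    return {
        variant
        for variant, carriers in target.items()
        if carriers >= min_carriers and variant not in seen_elsewhere
    }


# Toy data, invented purely for illustration.
clindb_carriers = {"varA": 1, "varB": 3, "varC": 2}
other_databases = [{"varC"}, {"varD"}]

all_unique = unique_variants(clindb_carriers, other_databases)
# Restricting to variants with more than one carrier means min_carriers=2.
multi_carrier_unique = unique_variants(clindb_carriers, other_databases, min_carriers=2)
```

Raising `min_carriers` to 2 mirrors the restriction to variants with more than one carrier, which filters out singletons that are more likely to reflect sampling noise.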
To evaluate the reliability of the frequency estimates in Clindb, we compared the average frequency difference between Clindb and the other databases under study with the average frequency difference between GAsP and the other databases. Our analysis revealed that the overall frequency difference is lower for Clindb than for GAsP (Figure 4b).
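One way such a comparison could be computed is sketched below. The text does not specify the exact metric, so this sketch assumes the mean absolute difference in frequency over the variants shared by two databases; the function name `mean_abs_freq_diff` and the toy frequencies are hypothetical.

```python
from statistics import mean


def mean_abs_freq_diff(ref, other):
    """Mean absolute frequency difference over variants present in both inputs.

    ref, other: dicts mapping variant ID -> frequency (assumed layout).
    Returns None when the two databases share no variants.
    """
    shared = ref.keys() & other.keys()
    if not shared:
        return None
    return mean(abs(ref[v] - other[v]) for v in shared)


# Toy frequencies, invented purely for illustration.
clindb_freq = {"var1": 0.010, "var2": 0.002, "var3": 0.030}
comparison_freq = {"var1": 0.012, "var2": 0.001, "var4": 0.005}

diff = mean_abs_freq_diff(clindb_freq, comparison_freq)
```

Under this assumption, repeating the calculation for each pair of databases and averaging the results would give one summary value per reference database, which is the kind of quantity compared between Clindb and GAsP.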
We found 12 genes with carrier frequency ≥ 1% (Table S8). MBL2, CBS and ZGRF1 are the genes with the highest carrier frequencies. Apart from cystathionine beta-synthase (CBS gene; category: inborn errors of amino acid metabolism), a few other genes with high carrier frequency are related to different classes of inborn errors of metabolism (IEM). The distribution of variants across the IEM categories is discussed below.