INTRODUCTION
Advancements over the last decade in genetic tools and high-throughput
detection methods have accelerated the pace of discovery of novel genes and
variants associated with monogenic Mendelian diseases. Currently, 7000 OMIM
phenotypes with distinct genetic etiologies have been delineated
(Hamosh, Scott, Amberger, Bocchini, &
McKusick, 2005). These global efforts have significantly advanced our
understanding of rare genetic disorders and monogenic diseases. Although
population genomics research has contributed significantly to the study
of Indian populations, very few studies have provided a
comprehensive genome-level understanding of monogenic disorders in
these populations. In the present era of precision medicine, there is
an urgent need for nationwide genomics efforts to establish a
framework for genomic medicine-guided healthcare delivery and to provide
extensive coverage of genomic biomarkers across populations, facilitating
rapid diagnosis and affordable genomic healthcare solutions.
India comprises 1.3 billion people from diverse ethnic, cultural and
linguistic lineages who share ancestry with many global populations.
Further, the genetic diversity of these populations has been shaped
by socio-cultural factors such as endogamy and consanguinity, by
geographical clines, and by a long history of migration during
intercontinental exchange of trade and art, as well as admixture with local
populations (Basu et al., 2003;
Basu, Sarkar-Roy, & Majumder, 2016;
I. G. V. Consortium, 2008;
Reich, Thangaraj, Patterson, Price, &
Singh, 2009). This provides a unique gene-variant pool and a reservoir
for founder events. In the recent past, extensive nationwide genomic efforts
have been undertaken to understand this genetic diversity. For instance,
the consortium-level IGVdb effort provided a catalogue of single
nucleotide polymorphisms in 900 genes that map to disease-associated
regions across 55 diverse Indian populations
(I. G. V. Consortium, 2008;
Narang et al., 2010). Genetic analysis
revealed that ethnicity and language are stronger determinants of genetic
structure than geography. These studies highlighted that Indian populations can be
divided broadly into four genetic clusters, Austro-Asiatic (AA),
Dravidian (DR), Indo-European (IE) and Tibeto-Burman (TB), based on
ethno-linguistic classification. Large DR and IE populations are known to exhibit a
high degree of admixture and form multiple sub-clusters; however,
isolated populations, specifically from the DR and AA groups, are distinct and
unique (I. G. V. Consortium, 2008). In
addition, mitochondrial and Y-chromosome haplogroup-based studies have
helped characterize the gene pool of diverse Indian
populations (Bamshad et al., 2001;
Borkar, Ahmad, Khan, & Agrawal, 2011;
Kivisild et al., 2003;
Majumder et al., 1999;
Thanseem et al., 2006). The utility of an
India-specific baseline of genetic variability was demonstrated in the pre-NGS
era in infectious diseases (for example, malaria and HIV),
pharmacogenomics studies, disease associations and the identification of
at-risk populations for various neurological, cutaneous and high-altitude
adaptation-related disorders
(Aggarwal et al., 2015;
Aggarwal et al., 2010;
Bhattacharjee et al., 2008;
A Biswas et al., 2007;
Arindam Biswas et al., 2010;
Chaki et al., 2011;
Giri et al., 2014;
Grover et al., 2010;
Gupta et al., 2007;
P. Jha et al., 2012;
Kanchan et al., 2015;
Kumar et al., 2009;
Sinha, Arya, Agarwal, & Habib, 2009;
Sinha et al., 2008;
Talwar et al., 2017).
Owing to the limited availability of high-throughput platforms, systematic
efforts to understand the spectrum of Mendelian and monogenic variants
have not been carried out across the diverse Indian populations. With the
advent of NGS, Indian and other global research groups have made
additional efforts to provide variant information at the genome-wide
scale, including SAGE (South Asian Genome and Exome)
(Hariprakash et al., 2018), South Asian
genomes from the 1000 Genomes Project (G. P.
Consortium, 2015), south Indian individuals (INDEX-db)
(Ahmed P et al., 2019) and a few others.
The Indian Genetic Disease Database v1.0 provides information on 1000
genetic diseases in over 3500 Indian patients (IGDD). Other noteworthy
contributions have been made in the genetics of hemoglobinopathies
(thalassemia and sickle cell anemia), Duchenne muscular dystrophy (DMD),
cystic fibrosis (CF), spinocerebellar ataxias, mitochondrial disorders and
cardiomyopathies (Pradhan et al., 2010).
There is now also a representative knowledgebase of Indian genetic
disorders that aggregates information from NGS and single sequencing-based
studies and multiple case reports on lysosomal storage disorders,
skeletal dysplasias, primary immunodeficiencies,
genodermatoses and neurogenetic ailments
(http://guardian.meragenome.com/). The recently published GenomeAsia
100K Project (GAsP) provided comprehensive genome-level
data for over 1700 individuals from different Asian countries,
highlighting the need for adequate representation of Asian genome-level
information in public databases (GenomeAsia100K Consortium, 2019).
Multiple countrywide efforts are ongoing in government-funded basic
and translational genomic research laboratories, genetics units of
tertiary hospitals and commercial enterprises to meet the needs of the
clinical genetics segment of the healthcare system in India. Despite these
efforts, several challenges remain for the implementation of genomic medicine
in Indian populations. These stem primarily from either a lack of
representation of India's different ethnic populations or low sample
sizes in earlier studies. As a result, there is
1) a paucity of knowledge of the mutation spectrum and mutation
frequencies, 2) a lack of systematic characterization of known pathogenic
mutations linked to various monogenic disorders, 3) a scarcity of
knowledge of the genetic spectrum of the 7000 OMIM phenotypes and other
prevalent genetic disorders, and 4) limited characterization of novel mutations.
To address these issues, our study provides a comprehensive
catalogue of monogenic disease-linked variants in diverse Indian
populations (n=2795). Our study used a high-throughput and
affordable genomics tool, the Global Screening Array (GSA) from
Illumina, which provides information on over 19,538 global
clinically annotated variants. In brief, our study is novel in that i)
it covers diverse multiethnic Indian cohorts with a large sample size of
2795 healthy subjects, ii) it provides the frequency distribution of known
pathogenic variants for inborn errors of metabolism, hematological
disorders and other Mendelian disorders in Indian populations, and iii)
the representation of South Asian (SAS) pathogenic variants is higher in our
study than in other global repositories such as the 1000 Genomes populations
(G. P. Consortium, 2015), the Genome
Aggregation Database (gnomAD) (K.
Karczewski & Francioli, 2017), the Exome Aggregation Consortium
(ExAC) (K. J. Karczewski et al., 2016)
and GenomeAsia100K (GenomeAsia100K Consortium, 2019). We have created a
unique database to catalogue and register information on clinically
relevant variants for the Indian population. Further, we
demonstrate that our cohort is genetically much more diverse than the
representative South Asian populations in the 1000 Genomes dataset, which
highlights opportunities and gaps for future research.