Assessment of genetic structure/architecture of subjects
Principal component analysis (PCA) was performed using the smartpca module of the EIGENSOFT package (Price et al., 2006) to characterize the genetic structure of our study cohorts and to compare it with that of populations in the 1000 Genomes Project. To place our study subjects relative to diverse Indian ethnic and linguistic groups, we used a genome-wide reference dataset of 471 healthy individuals genotyped on the Omni array (Illumina Inc.; unpublished data) as a representation of Indian genomic diversity. These samples were collected as part of the Indian Genome Variation (IGV) Consortium study. We compared this reference panel with our study samples as well as with the 1000 Genomes data (n=2,504).
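As an illustration of what PCA on genotype data involves, the following minimal Python sketch standardizes a toy genotype matrix (allele counts coded 0/1/2), builds the sample-by-sample covariance matrix, and extracts the leading component by power iteration. It is not the smartpca implementation and was not used in this study; the toy genotypes, the function names, the allele-frequency scaling, and the power-iteration solver are all assumptions chosen for brevity.

```python
import math

def standardize(genotypes):
    """Center each marker and scale by sqrt(p * (1 - p)), with p = mean / 2.

    genotypes: list of samples, each a list of allele counts (0, 1 or 2).
    Monomorphic markers are dropped because their scale factor is zero.
    """
    n_samples = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n_samples
        p = mean / 2.0
        if p <= 0.0 or p >= 1.0:
            continue
        scale = math.sqrt(p * (1.0 - p))
        columns.append([(g - mean) / scale for g in col])
    # Transpose back to samples x markers.
    return [[c[i] for c in columns] for i in range(n_samples)]

def sample_covariance(x):
    """Sample-by-sample covariance matrix X X^T / (number of markers)."""
    n, m = len(x), len(x[0])
    return [[sum(a * b for a, b in zip(x[i], x[k])) / m for k in range(n)]
            for i in range(n)]

def top_component(cov, iterations=200):
    """Leading eigenvector of a symmetric matrix by power iteration."""
    n = len(cov)
    v = [float(i + 1) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        w = [sum(cov[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(val * val for val in w))
        v = [val / norm for val in w]
    return v

# Hypothetical toy data: two samples from group A, two from group B.
toy_genotypes = [
    [0, 0, 2],  # A1
    [0, 1, 2],  # A2
    [2, 2, 0],  # B1
    [2, 1, 0],  # B2
]
pc = top_component(sample_covariance(standardize(toy_genotypes)))
print(pc)
```

In this toy example the two groups receive coordinates of opposite sign on the leading component, which is the kind of separation a PCA plot visualizes.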
For PCA, we used the 161,484 markers common to all three datasets, i.e., the samples in this study, the 1000 Genomes data, and the IGV reference data. PLINK was used to merge the datasets on these common markers. The ggplot package in R was used to create customized PCA plots. FST values reported by smartpca were also used to compare genetic differentiation among populations.
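The selection of markers shared across datasets can be sketched as a set intersection over variant identifiers. The example below is illustrative only and is not the procedure used in this study: it assumes PLINK .bim-style input (whitespace-delimited, variant ID in the second column), and the variant IDs, in-memory files, and function names are hypothetical.

```python
import io

def read_bim_ids(handle):
    """Return the set of variant IDs (second column) from .bim-style lines."""
    ids = set()
    for line in handle:
        fields = line.split()
        if len(fields) >= 2:
            ids.add(fields[1])
    return ids

def common_markers(*handles):
    """Sorted list of variant IDs present in every input."""
    id_sets = [read_bim_ids(h) for h in handles]
    return sorted(set.intersection(*id_sets))

# Hypothetical in-memory stand-ins for the three .bim files.
study_bim = io.StringIO(
    "1\trs1\t0\t100\tA\tG\n1\trs2\t0\t200\tC\tT\n1\trs3\t0\t300\tA\tC\n")
kg_bim = io.StringIO(
    "1\trs2\t0\t200\tC\tT\n1\trs3\t0\t300\tA\tC\n1\trs4\t0\t400\tG\tT\n")
igv_bim = io.StringIO(
    "1\trs3\t0\t300\tA\tC\n1\trs2\t0\t200\tC\tT\n1\trs5\t0\t500\tA\tT\n")

shared = common_markers(study_bim, kg_bim, igv_bim)
print(shared)
```

A list of this kind could be written to a text file and passed to PLINK's --extract option before merging. Matching on identifiers alone does not detect position, allele, or strand discrepancies between datasets, which would need to be checked separately.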