Genetic diversity analysis of the study cohorts
In this analysis, PCA was used to compare and assess the genetic
diversity of our study cohorts with respect to the 1000 Genomes and IGV
populations (Figure S2). As expected, the 1000 Genomes European (EUR),
American (AMR) and African (AFR) super populations are distant, while
SAS is proximal to the majority of the IGV large populations. The TB
group, however, is closer to the EAS super population than to any other
1000 Genomes super population or to the IGV populations (Figure 2).
OG-W-IP (an outgroup population of African descent), which was earlier
demonstrated to be an admixed Indo-African population from the western
part of India, lies on a cline between the Indian and African
populations (Narang et al., 2011; Shah et al., 2011). Further,
we excluded the 1000 Genomes AFR, AMR and EUR super populations, as well
as the Indian outgroup population (OG-W-IP), to resolve finer-scale
genetic structure. We clearly observed that the majority of the IE and
DR large populations are proximal to the 1000 Genomes SAS group (1kg_SAS).
However, the AA and DR isolated populations, as well as the TB genetic
cluster, are under-represented in the 1000 Genomes data. The 1000
Genomes EAS group (1kg_EAS) is genetically distinct from the populations
in the TB cluster (FST = 0.01-0.02) (Figure S3).
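
The PCA comparison described above can be illustrated with a minimal sketch. It assumes a samples-by-variants matrix of 0/1/2 alternate-allele counts. The function name `genotype_pca`, the per-variant binomial scaling, and the simulated populations are illustrative assumptions, not the software or settings used in this study.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """Project samples onto the top principal components.

    genotypes: (n_samples, n_variants) array of 0/1/2 allele counts.
    Returns (coords, explained_ratio) for the first n_components.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0            # per-variant allele frequency
    keep = (p > 0.0) & (p < 1.0)        # drop monomorphic variants
    g, p = g[:, keep], p[keep]
    # Centre each variant and scale by its binomial standard deviation
    # (an assumed normalisation, common for genotype data).
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return coords, explained[:n_components]

# Simulated data only: two hypothetical populations whose allele
# frequencies differ by random drift.
rng = np.random.default_rng(0)
n_variants = 500
freq_a = rng.uniform(0.1, 0.9, n_variants)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_variants), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(30, n_variants))
pop_b = rng.binomial(2, freq_b, size=(30, n_variants))

coords, explained = genotype_pca(np.vstack([pop_a, pop_b]))
pc1_a, pc1_b = coords[:30, 0], coords[30:, 0]
```

In such a projection, populations that are genetically similar fall close together on the leading components, which is the sense in which cohorts are described as proximal or distant here.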
The under-representation of Indian genomic diversity in the 1000 Genomes
data was reported earlier, which substantiates our findings
(Sengupta, Choudhury, Basu, & Ramsay, 2016). In addition, the recently
published GAsP project lacks representation from the TB group and has
comparatively few samples in its SAS group (n=724), which might bias
frequency estimates for that group.
Lastly, we compared the genetic diversity of our study cohorts with the
IGV populations as well as the 1000 Genomes SAS and EAS groups. Figure 3
shows that our cohorts include representation from the IE and DR large
populations as well as from the TB group (high-altitude populations).
Representation of the AA and DR isolated groups in our study samples is
also lacking.
However, FST analysis suggests that our study cohorts are more proximal
to the IGV populations than 1kg_SAS is. More specifically, the AA and DR
isolated groups, as well as the TB low-altitude populations, are
genetically closer to our study cohorts than to 1kg_SAS.
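
The pairwise FST comparisons above can be sketched in the same way. The text does not state which FST estimator was used, so the example below swaps in Hudson's estimator in its ratio-of-averages form as an assumption. The function name `hudson_fst` and the simulated "reference", "near" and "far" populations are hypothetical.

```python
import numpy as np

def hudson_fst(geno_a, geno_b):
    """Hudson's FST between two populations (ratio of averages).

    geno_a, geno_b: (n_samples, n_variants) arrays of 0/1/2 counts
    over the same variants. Returns a single genome-wide estimate.
    """
    a = np.asarray(geno_a, dtype=float)
    b = np.asarray(geno_b, dtype=float)
    n1, n2 = 2 * a.shape[0], 2 * b.shape[0]   # chromosomes sampled
    p1 = a.sum(axis=0) / n1
    p2 = b.sum(axis=0) / n2
    # Squared frequency difference, corrected for finite sample size.
    num = ((p1 - p2) ** 2
           - p1 * (1.0 - p1) / (n1 - 1)
           - p2 * (1.0 - p2) / (n2 - 1))
    den = p1 * (1.0 - p2) + p2 * (1.0 - p1)
    total = den.sum()
    return num.sum() / total if total > 0 else float("nan")

# Simulated data only: a reference population, one that has drifted
# slightly from it, and one that has drifted much further.
rng = np.random.default_rng(1)
n_variants = 2000
f_ref = rng.uniform(0.1, 0.9, n_variants)
f_near = np.clip(f_ref + rng.normal(0.0, 0.02, n_variants), 0.01, 0.99)
f_far = np.clip(f_ref + rng.normal(0.0, 0.20, n_variants), 0.01, 0.99)
ref = rng.binomial(2, f_ref, size=(50, n_variants))
near = rng.binomial(2, f_near, size=(50, n_variants))
far = rng.binomial(2, f_far, size=(50, n_variants))

fst_near = hudson_fst(ref, near)
fst_far = hudson_fst(ref, far)
```

A lower pairwise value indicates that two groups are genetically closer, which is how statements such as "closer to our study cohorts than to 1kg_SAS" are read here.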