Co-abundance of Gene Groups and Metagenome Species/Strain with a
consistent definition and efficient algorithm
De Filippis et al.48 reported that, within
the genus Prevotella, distinct oligotypes were
differentially associated with non-vegetarian and vegetarian
diets48. In the present study, the
representative sequences of OTUs assigned to Prevotella were mapped onto
the known oligotypes to check the species association. About 90% of the
Prevotella OTU sequences identified matched the P11 and P12 oligotypes,
which are representative of the non-vegetarian type, indicating their
prevalence in the North Indian population (Supplementary Table ST1).
To quantify species- and strain-level genomic variation accurately and
broadly at four different heights, co-abundance gene groups (CAGs) were
identified by binning a SOAPdenovo assembly of all shotgun
metagenome samples. A total of 57 bins were obtained with the Metabat2
software. The 12 bins that contained more than 150 genes were
selected for further processing, to avoid erroneous, inconsistent and
incomplete annotations that would affect some taxonomies. Genes that
were differentially abundant (p < 0.05) were considered
for visualization (Supplementary Figure S5).
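The two thresholds above (bins with more than 150 genes, genes with p < 0.05) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `Gene` record, the function names and the in-memory dictionary of bins are hypothetical, and the p-values are assumed to have been computed beforehand by the differential abundance test.

```python
from dataclasses import dataclass

# Thresholds taken from the text; everything else is illustrative.
MIN_GENES = 150   # a bin must contain MORE than this many genes
P_CUTOFF = 0.05   # a gene must have p strictly below this value


@dataclass
class Gene:
    gene_id: str     # hypothetical identifier
    p_value: float   # assumed to be precomputed


def select_bins(bins):
    """Keep only bins that contain more than MIN_GENES genes."""
    return {name: genes for name, genes in bins.items()
            if len(genes) > MIN_GENES}


def significant_genes(genes):
    """Keep only genes whose p-value is below P_CUTOFF."""
    return [g for g in genes if g.p_value < P_CUTOFF]
```

Applied to the 57 bins described above, `select_bins` would correspond to the step that retains 12 bins, and `significant_genes` to the selection of genes passed on for visualization.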
We used an integrated pipeline for profiling both species- and strain-level
abundance and genomic variation from metagenomes. The MIDAS analysis
pipeline detected a few more bacteria in addition to those obtained
from Metabat2. MIDAS was able to capture the majority of microbial
species abundance across the subjects, making it well suited for
uncovering strain-level variation associated with various heights. For
species with maximum coverage, reads were aligned to a pan-genome database
of genes to estimate gene coverage, copy number and presence or absence,
and finally to detect SNPs. The pangenome-reconstructed bacterial profile,
recovered at all time points, was filtered for a minimum pangenome coverage
of 70%, yielding 37 species and strains (Supplementary Figure S6).
The relative abundance profile generated was analysed with a t-test to
identify significant taxa at an FDR cut-off of 0.05. There was
a significant correlation between the Metabat2 and MIDAS results, which
supports the presence of Roseburia, Prevotella, Faecalibacterium,
Eubacterium and Bacteroides, significantly enriched among the 37 species by
both methods of analysis.
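As an illustration of the filtering logic described above, the following Python sketch applies a 70% pangenome coverage cut-off and then an FDR threshold of 0.05. Several assumptions are made that the text does not confirm: coverage is treated as the fraction of pangenome genes detected per taxon, the per-taxon t-test p-values are taken as given, and the Benjamini-Hochberg procedure is used for FDR control. All function and variable names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_taxa(coverage, p_values, min_coverage=0.70, fdr=0.05):
    """Filter taxa by pangenome coverage, then by adjusted p-value.

    coverage: taxon -> fraction of pangenome covered (assumed definition)
    p_values: taxon -> raw t-test p-value (assumed precomputed)
    """
    kept = [t for t in coverage if coverage[t] >= min_coverage]
    adjusted = benjamini_hochberg([p_values[t] for t in kept])
    return [t for t, q in zip(kept, adjusted) if q < fdr]
```

Note that the multiple-testing correction is applied only to taxa that pass the coverage filter, so the number of tests matches the number of retained species and strains.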