Metagenome species and strain-level analysis
Human health conditions are linked to microbial communities, and phenotypes are often associated with only a subset of strains within the causal microbial groups. Therefore, for the WGS metagenome data we used Metagenome Binning with Abundance and Tetranucleotide frequencies v.2 (Metabat2)42 and the Metagenomic Intra-Species Diversity Analysis System (MIDAS)43 to identify metagenome species and to perform strain-level metagenomic classification, both at default parameters. De novo assembly of all 12 samples was performed using the Short Oligonucleotide Analysis Package (SOAP)44 with a k-mer size of 65, followed by binning with Metabat2. Bins containing more than 150 genes were selected for further analysis. Differentially abundant genes with p < 0.05 were considered for visualization.
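The two selection steps above (bins with more than 150 genes, genes with p < 0.05) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the dictionary inputs, and the toy values are hypothetical, and the p-values are assumed to come from a differential-abundance test computed beforehand, which the text does not specify.

```python
from typing import Dict, List

MIN_GENES_PER_BIN = 150  # bins must contain MORE than this many genes
P_VALUE_CUTOFF = 0.05    # genes must have p strictly below this value


def select_bins(genes_per_bin: Dict[str, int],
                min_genes: int = MIN_GENES_PER_BIN) -> List[str]:
    """Return the names of bins that contain more than `min_genes` genes."""
    return sorted(b for b, n in genes_per_bin.items() if n > min_genes)


def select_genes(p_values: Dict[str, float],
                 cutoff: float = P_VALUE_CUTOFF) -> List[str]:
    """Return the genes whose precomputed p-value is below `cutoff`."""
    return sorted(g for g, p in p_values.items() if p < cutoff)


# Toy inputs (hypothetical values, for illustration only).
genes_per_bin = {"bin.1": 1200, "bin.2": 150, "bin.3": 151}
p_values = {"geneA": 0.001, "geneB": 0.05, "geneC": 0.2}

selected_bins = select_bins(genes_per_bin)
selected_genes = select_genes(p_values)
print(selected_bins)
print(selected_genes)
```

Both comparisons are strict, so a bin with exactly 150 genes or a gene with p exactly equal to 0.05 is excluded, matching the wording "greater than 150" and "p < 0.05".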
Species-level coverage was obtained using the MIDAS database across samples from sojourners visiting different heights. For species with sufficient coverage, reads were aligned to a pan-genome database of genes to estimate gene coverage45, copy number, and presence or absence. The core genome was defined directly from the data by identifying high-coverage regions (>70% coverage of the pan-genome genes) across multiple metagenomic samples, providing a comprehensive strain-level genetic overview of gut microbial diversity.
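One way to read the core-genome definition above is sketched below. It rests on several assumptions that the text does not state: copy number is taken as gene coverage divided by the median coverage of single-copy marker genes, a gene counts as present when its copy number reaches a placeholder cutoff (`min_copy`), and the 70% threshold is interpreted as the fraction of samples in which a gene must be present. All names and values are hypothetical.

```python
from statistics import median
from typing import Dict, List


def copy_number(gene_cov: Dict[str, float],
                marker_cov: List[float]) -> Dict[str, float]:
    """Normalise gene coverage by the median marker-gene coverage."""
    m = median(marker_cov)
    if m == 0:
        return {g: 0.0 for g in gene_cov}
    return {g: c / m for g, c in gene_cov.items()}


def core_genes(copy_by_sample: Dict[str, Dict[str, float]],
               min_copy: float = 0.35,        # placeholder presence cutoff
               min_fraction: float = 0.70) -> List[str]:
    """Genes present in more than `min_fraction` of the samples."""
    samples = list(copy_by_sample)
    genes = set().union(*(copy_by_sample[s].keys() for s in samples))
    core = []
    for g in sorted(genes):
        n_present = sum(
            1 for s in samples if copy_by_sample[s].get(g, 0.0) >= min_copy
        )
        if n_present / len(samples) > min_fraction:
            core.append(g)
    return core


# Toy data: per-sample gene coverage and marker-gene coverage (hypothetical).
marker = [10.0, 12.0, 8.0]  # median coverage is 10
coverage = {
    "s1": {"geneA": 10.0, "geneB": 9.0, "geneC": 0.0},
    "s2": {"geneA": 11.0, "geneB": 5.0, "geneC": 8.0},
    "s3": {"geneA": 9.0, "geneB": 4.0, "geneC": 1.0},
    "s4": {"geneA": 12.0, "geneB": 1.0, "geneC": 6.0},
}
copy_by_sample = {s: copy_number(cov, marker) for s, cov in coverage.items()}
core = core_genes(copy_by_sample)
print(core)
```

In this toy example geneA is present in all four samples and geneB in three of four (75%), so both pass the 70% threshold, while geneC is present in only two samples and is excluded.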