Sequencing and Analysis
16S rRNA gene sequencing methods were adapted from the methods developed for the National Institutes of Health-Human Microbiome Project [31]. Briefly, the 16S rRNA V4 region was amplified and sequenced on Illumina Iseq 100 platform using manufacturer’s instructions. Raw 16S amplicon sequence (forward reads only) and metadata were demultiplexed using split_libraries_fastq.py script implemented in QIIME1.9.1 [32]. The demultiplexed fastq file was split into sample-specific fastq files using split_sequence_file_on_sample_ids.py script from Qiime1.9.1 [32]. Individual fastq files without nonbiological nucleotides were processed using Divisive Amplicon Denoising Algorithm pipeline [33]. The output of the dada2 pipeline (feature table of amplicon sequence variants) was processed for alpha and beta diversity analysis using phyloseq and microbiomeSeq (http://www.github.com/umerijaz/microbiomeSeq) packages in R [34]. Alpha diversity estimates were measured within group categories using estimate_richness function of the phyloseq package [34]. Nonmultidimensional scaling (NMDS) was performed using Bray-Curtis dissimilarity matrix [35] between groups and visualized by using ggplot2 package [36]. We performed an ANOVA among sample categories while measuring α-diversity using the plot_anova_diversity function in microbiomeSeq package (http://www.github.com/umerijaz/microbiomeSeq). We then performed permutational multivariate ANOVA with 999 permutations to test the statistical significance of the non-multidimensional scaling patterns with the ordination function of the microbiomeSeq package. Pair wise two group analysis was performed using White’s non-parametric t-test [37]. We assessed the statistical significance (P<0.05) throughout and whenever necessary adjusted P values for multiple comparisons according to the Benjamini and Hochberg method to control false discovery rate [38], while performing multiple testing on taxa abundance according to sample categories.