2.5. Microbial Diversity Analysis
Alpha and beta diversity reflect the richness within a sample and the difference in bacterial composition among sites, respectively (Morris et al., 2014). To compute the alpha and beta diversity metrics, a rooted phylogenetic tree was generated, after which alpha rarefaction was performed and full-length sequences were taxonomically classified using SILVA (Quast et al., 2013). To standardize the data, only operational taxonomic units (OTUs) with seven or more counts in at least one sample were retained prior to ordination in R using the phyloseq package (McMurdie and Holmes, 2013). This filtering criterion retained approximately 75% of the original taxa across all samples. The total number of OTUs following filtering and standardization was 5,776.
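The filtering rule described above can be illustrated with a minimal sketch. It is written in Python for illustration only; the analysis itself was carried out in R. The count table, the OTU identifiers, and the function name `filter_otus` are hypothetical and are not taken from the study's data.

```python
# Hypothetical toy count table: OTU id -> counts per sample (not the study's data).
MIN_COUNT = 7  # threshold stated in the text: seven or more counts


def filter_otus(counts, min_count=MIN_COUNT):
    """Keep OTUs whose count reaches min_count in at least one sample."""
    return {otu: row for otu, row in counts.items() if max(row) >= min_count}


toy_counts = {
    "OTU_1": [0, 12, 3],  # kept: 12 counts in the second sample
    "OTU_2": [2, 1, 6],   # dropped: never reaches 7 in any single sample
    "OTU_3": [7, 0, 0],   # kept: exactly 7 counts in the first sample
    "OTU_4": [0, 0, 0],   # dropped: absent everywhere
}

kept = filter_otus(toy_counts)
retained_fraction = len(kept) / len(toy_counts)
print(sorted(kept), retained_fraction)
```

Note that the criterion is applied per sample rather than to the row total: the hypothetical OTU_2 has nine counts overall but is removed because no single sample holds seven or more.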
Principal coordinates analysis (PCoA) was chosen as the ordination method because it captured the largest share of the total variance in the first two principal coordinates. Ordination was based mainly on weighted UniFrac, a distance metric used to compare microbial communities, with unweighted UniFrac included for comparison (Lozupone and Knight, 2005). The UniFrac measure takes the phylogenetic relationships among taxa into account and is widely used in microbial ecology (Lozupone et al., 2011). Unweighted UniFrac is a qualitative measure of diversity that considers only the presence or absence of taxa and therefore mainly reflects rare taxa, whereas weighted UniFrac also considers taxon abundance, making it a quantitative measure (Lozupone et al., 2007). Although all OTUs were used to calculate the diversity metrics and PCoA, microbial communities were examined at the phylum level to identify potential linkages between anthropogenic activities and the abundance of specific bacteria within each community.
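The contrast between the qualitative and quantitative forms of UniFrac can be made concrete with a small sketch. It assumes a hypothetical four-branch tree over three taxa (A, B, C) and invented abundances. The weighted version shown is the raw, non-normalized form; whether the study's pipeline used the raw or the normalized variant is not stated in the text. Each branch is represented as a pair of its length and the set of taxa descending from it.

```python
def unweighted_unifrac(branches, a, b):
    """Fraction of observed branch length that leads to taxa from only one sample."""
    unique = observed = 0.0
    for length, leaves in branches:
        in_a = any(a.get(otu, 0) > 0 for otu in leaves)
        in_b = any(b.get(otu, 0) > 0 for otu in leaves)
        if in_a or in_b:
            observed += length
            if in_a != in_b:  # branch is present in exactly one sample
                unique += length
    return unique / observed


def weighted_unifrac(branches, a, b):
    """Raw weighted UniFrac: each branch length is weighted by the difference
    in relative abundance of the taxa below it."""
    total_a, total_b = sum(a.values()), sum(b.values())
    dist = 0.0
    for length, leaves in branches:
        frac_a = sum(a.get(otu, 0) for otu in leaves) / total_a
        frac_b = sum(b.get(otu, 0) for otu in leaves) / total_b
        dist += length * abs(frac_a - frac_b)
    return dist


# Hypothetical tree: root -> (A, B) clade and root -> C.
toy_branches = [
    (0.5, {"A"}),
    (0.5, {"B"}),
    (1.0, {"A", "B"}),  # internal branch leading to the A/B clade
    (2.0, {"C"}),
]
sample_1 = {"A": 9, "B": 1}
sample_2 = {"A": 9, "C": 1}  # swaps one rare taxon for another
sample_3 = {"A": 1, "B": 9}  # same taxa as sample_1, abundances reversed

for other in (sample_2, sample_3):
    print(unweighted_unifrac(toy_branches, sample_1, other),
          weighted_unifrac(toy_branches, sample_1, other))
```

In this toy case, exchanging a single rare taxon (sample_1 versus sample_2) gives a large unweighted distance, whereas reversing the abundances of shared taxa (sample_1 versus sample_3) gives an unweighted distance of zero and is detected only by the weighted measure.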
Classical hierarchical clustering using the unweighted pair group method with arithmetic mean (UPGMA), single linkage, and Ward’s method, together with K-means clustering, was conducted at the phylum level in the PAST package (Hammer et al., 2001). For K-means, three clusters were chosen on the basis of preliminary runs and to correspond to channel samples, bay samples, and samples affected by anthropogenic activities. The significance of the clustering was tested using one-way permutational multivariate analysis of variance (PERMANOVA). Stations were grouped according to the output of the clustering analysis, and permutations (N = 100,000, owing to the high number of variables) were used to assess significance.
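The logic of the permutation test can be sketched as follows. This is a minimal illustration of one-way PERMANOVA based on the pseudo-F statistic computed from a distance matrix; it is not the PAST implementation, and the station positions, group labels, and function names are hypothetical. Only 999 permutations are used here to keep the example fast, whereas the analysis described above used 100,000.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """Pseudo-F: among-group versus within-group sums of squared distances."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        members = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2
                         for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    a = len(groups)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, n_perm=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign stations to groups at random
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed labelling counts as one permutation.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical stations placed on a line so that three groups are well separated.
positions = [0, 1, 2, 10, 11, 12, 20, 21, 22]
toy_labels = ["channel"] * 3 + ["bay"] * 3 + ["impacted"] * 3
toy_dist = [[abs(p - q) for q in positions] for p in positions]

f_obs, p_value = permanova(toy_dist, toy_labels)
print(f_obs, p_value)
```

Because the p-value is the proportion of permuted labellings whose pseudo-F is at least as large as the observed value, its smallest attainable value is 1 / (n_perm + 1), which is one reason to use a large number of permutations.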