2.3. Statistical analyses
Raw sequence data were analyzed in R 4.0.1 with the “dada2” package for 16S rRNA gene amplicon data (https://benjjneb.github.io/dada2/tutorial.html) (Callahan et al. 2016). Briefly, adapter and primer sequences were first removed from the raw reads using “cutadapt”. The clean reads were then trimmed and merged. Amplicon sequence variants (ASVs) were obtained after chimeric sequences were removed, and taxonomy was assigned against the SILVA database release 138 (Quast et al. 2013; Yilmaz et al. 2014). For subsequent statistical analyses, the ASV table was rarefied to the lowest sequence count among samples. α-Diversity indices (Shannon and Chao1) were calculated with the “microeco” and “vegan” packages (Liu et al. 2021; Oksanen et al. 2022). α-Diversity and community composition were visualized with Origin 2020 and the “ggplot2” package in R (Pingram et al. 2019). Non-metric multidimensional scaling (NMDS) based on Bray-Curtis distances was performed with the “microeco” package to visualize the similarity in community composition among samples.
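The rarefaction, diversity, and distance steps above can be illustrated with a minimal sketch. It is written in plain Python for illustration only and is not the authors' R pipeline: the function names and the toy count vectors are hypothetical, and the Chao1 estimator shown is the bias-corrected form, which may differ in detail from the variant implemented in “microeco” or “vegan”.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a vector of ASV read counts to `depth` reads without replacement."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the sample's total read count")
    rng = random.Random(seed)
    # Expand counts into one entry per read (the ASV index), then draw `depth` reads.
    reads = [i for i, c in enumerate(counts) for _ in range(c)]
    drawn = Counter(rng.sample(reads, depth))
    return [drawn.get(i, 0) for i in range(len(counts))]


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over ASVs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


if __name__ == "__main__":
    # Hypothetical ASV count vectors for two samples (not real data).
    s1 = [10, 5, 1, 1, 2, 0]
    s2 = [3, 8, 0, 2, 1, 6]
    depth = min(sum(s1), sum(s2))  # rarefy both samples to the smallest library size
    r1, r2 = rarefy(s1, depth), rarefy(s2, depth)
    print(shannon(r1), chao1(r1), bray_curtis(r1, r2))
```

A matrix of pairwise Bray-Curtis values computed this way is the kind of input an NMDS ordination takes; rarefying first keeps both the indices and the distances from being driven by unequal sequencing depth.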
To analyze the community composition of Bathyarchaeia, a phylogenetic tree was constructed with reference sequences from a previous study to classify Bathyarchaeial ASVs into subgroups (Zhou et al. 2018). These reference sequences encompassed 15 Bathyarchaeial subgroups (Zhou et al. 2018), and the outgroup comprised sequences of Cenarchaeum symbiosum and Nitrosoarchaeum koreensis. ASVs classified as Bathyarchaeia against the SILVA 138 database were added to this reference set. All sequences were aligned with ClustalW, and a maximum-likelihood tree was constructed in MEGA11 (Tamura et al. 2021), with 1,000 bootstrap replicates to evaluate the tree topology (Zhou et al. 2018). The subgroup affiliation of each Bathyarchaeial ASV was inferred from the tree and used in downstream statistical analyses. Predictive maps of the large-scale distribution of Bathy-6 across paddy soils in eastern China were generated in ArcMap. After the site information (geographical coordinates and the relative abundance of Bathy-6 at each site) was entered, kriging interpolation was used to estimate the relative abundance of Bathy-6 across the entire mapped area, and the interpolated surface was then clipped with a provincial boundary mask. The heatmap of Bathyarchaeial ASVs was constructed with Evolview (Subramanian et al. 2019).
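To make the interpolation step concrete, the following is a minimal sketch of ordinary kriging at a single target location. It is an illustration under stated assumptions, not a reproduction of ArcMap's implementation: the exponential semivariogram and its `sill` and `rng` (range) values are hypothetical and fixed rather than fitted to data, coordinates are assumed to be planar (for example, projected kilometres rather than raw longitude/latitude), and the site values are invented relative abundances.

```python
import math


def exp_variogram(h, sill=1.0, rng=100.0, nugget=0.0):
    """Exponential semivariogram with practical range `rng` (hypothetical parameters)."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / rng))


def _dist(p, q):
    """Euclidean distance between two planar (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(sites, values, target, variogram=exp_variogram):
    """Estimate the value at `target` from (x, y) `sites` and their observed `values`."""
    n = len(sites)
    # Kriging system: site-to-site semivariances, bordered by the constraint sum(w) = 1.
    a = [[variogram(_dist(sites[i], sites[j])) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(_dist(s, target)) for s in sites] + [1.0]
    sol = solve(a, b)
    weights = sol[:n]  # sol[n] is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values)), weights


if __name__ == "__main__":
    # Hypothetical site coordinates (km) and Bathy-6 relative abundances.
    sites = [(0.0, 0.0), (50.0, 0.0), (0.0, 50.0), (60.0, 60.0)]
    values = [0.12, 0.30, 0.08, 0.22]
    estimate, weights = ordinary_kriging(sites, values, (25.0, 25.0))
    print(round(estimate, 4), [round(w, 3) for w in weights])
```

Repeating this estimate over every cell of a grid yields an interpolated surface, which can then be clipped to a boundary mask. With a zero nugget, ordinary kriging is an exact interpolator, so predicting at a sampled site returns that site's observed value.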
To assess the relative contributions of deterministic and stochastic processes to archaeal and Bathyarchaeial community assembly, the Sloan neutral community model (NCM) was applied with the “Hmisc” package to quantify the influence of stochasticity (Sloan et al. 2006; Harrell & Dupont 2019). Niche breadth and niche overlap were calculated with the “spaa” package (Zhang 2016). Mantel tests were performed with the “linkET” package to examine the associations between environmental factors and microbial communities. Structural equation modeling (SEM) was performed in SPSS and AMOS to quantify the direct and indirect effects of environmental factors on the archaeal and Bathyarchaeial communities. Pearson’s correlation analysis was conducted with the “microeco” package to evaluate the associations between environmental factors and the relative abundances of Bathyarchaeial subgroups. The corresponding figures were generated in Origin 2020.
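The neutral-model fit can be sketched as follows, assuming the commonly used formulation in which the expected occurrence frequency of a taxon with mean relative abundance p is 1 − I_d(Nmp, Nm(1 − p)), where I is the regularized incomplete beta function, N is the mean number of reads per sample, d = 1/N, and m is the migration rate. This is a plain-Python illustration rather than the authors' R code: the grid search over m stands in for the nonlinear least-squares fitting typically used, and the count table is hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=1000, eps=3e-14, fpmin=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > fpmin else fpmin)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > fpmin else fpmin)
        c = 1.0 + aa / c
        c = c if abs(c) > fpmin else fpmin
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > fpmin else fpmin)
        c = 1.0 + aa / c
        c = c if abs(c) > fpmin else fpmin
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta I_x(a, b), i.e. the Beta(a, b) CDF at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b  # symmetry relation for faster convergence


def ncm_predicted_freq(p, n_reads, m):
    """Expected occurrence frequency of a taxon with mean relative abundance p."""
    d = 1.0 / n_reads  # detection limit: one read
    return 1.0 - beta_cdf(d, n_reads * m * p, n_reads * m * (1.0 - p))


def fit_ncm(table):
    """Fit m to a samples x taxa count table by least squares over a log-spaced grid."""
    n_samples, n_taxa = len(table), len(table[0])
    n_reads = sum(sum(row) for row in table) / n_samples  # mean reads per sample
    p = [sum(row[j] for row in table) / n_samples / n_reads for j in range(n_taxa)]
    freq = [sum(1 for row in table if row[j] > 0) / n_samples for j in range(n_taxa)]
    keep = [j for j in range(n_taxa) if 0.0 < p[j] < 1.0]
    best_m, best_sse = None, float("inf")
    for k in range(401):
        m = 10 ** (-4 + 4 * k / 400)  # grid from 1e-4 to 1
        sse = sum((freq[j] - ncm_predicted_freq(p[j], n_reads, m)) ** 2 for j in keep)
        if sse < best_sse:
            best_m, best_sse = m, sse
    mean_f = sum(freq[j] for j in keep) / len(keep)
    sst = sum((freq[j] - mean_f) ** 2 for j in keep)
    r2 = 1.0 - best_sse / sst if sst > 0 else float("nan")  # goodness of fit
    return best_m, r2


if __name__ == "__main__":
    # Hypothetical counts: 3 samples x 4 taxa.
    print(fit_ncm([[10, 5, 0, 1], [8, 6, 1, 0], [12, 3, 0, 2]]))
```

In this formulation, a higher R² indicates that occurrence frequencies are well explained by abundance and dispersal alone, which is usually read as a larger stochastic contribution to assembly; taxa falling well above or below the fitted curve are candidates for deterministic selection.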
For the co-occurrence network analysis, pairwise Spearman’s correlation coefficients between ASVs were first calculated with the “microeco” package in R. Only correlations with a coefficient > 0.7 or < -0.7 and a significance level of p < 0.01 were retained. The resulting networks were visualized in Gephi.
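The edge-filtering rule above can be sketched in a few lines. This plain-Python version is illustrative only and may not match how “microeco” computes p-values or whether it adjusts them for multiple testing. In particular, the p-value here comes from a swapped-in Fisher z approximation for Spearman's ρ (variance 1.06/(n − 3)), and the ASV names and abundance profiles are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Ranks 1..n, with tied values receiving the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def approx_p(rho, n):
    """Approximate two-sided p-value via Fisher z with variance 1.06 / (n - 3)."""
    if abs(rho) >= 1.0 - 1e-12:
        return 0.0
    z = math.atanh(rho) * math.sqrt((n - 3) / 1.06)
    return math.erfc(abs(z) / math.sqrt(2.0))


def build_edges(abundance, r_cut=0.7, p_cut=0.01):
    """Return (asv_a, asv_b, rho) for pairs with |rho| > r_cut and p < p_cut."""
    n = len(next(iter(abundance.values())))  # number of samples (must exceed 3)
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if abs(rho) > r_cut and approx_p(rho, n) < p_cut:
            edges.append((a, b, rho))
    return edges


if __name__ == "__main__":
    # Hypothetical abundance profiles of four ASVs across ten samples.
    abundance = {
        "ASV1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "ASV2": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
        "ASV3": [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
        "ASV4": [5, 1, 8, 3, 10, 2, 7, 4, 9, 6],
    }
    for edge in build_edges(abundance):
        print(edge)
```

The resulting edge list, with ρ as a signed weight, is the type of table that network-visualization software such as Gephi can import. Here ASV4 (ρ ≈ 0.30 with ASV1) falls below the 0.7 threshold and stays unconnected.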