Introduction
Commonly occurring taxa within a particular habitat are thought to be critical to the functioning of that habitat and its ecosystem (Hamady & Knight, 2009; Shade & Handelsman, 2012; Turnbaugh & Gordon, 2009; Turnbaugh et al., 2009; Umaña, Zhang, Cao, Lin, & Swenson, 2017). A core set of taxa has been defined as the consistent assemblage of organisms associated with a certain niche space (Hamady & Knight, 2009), and cataloging core taxa has been the focus of many recent microbiome studies, owing to the complex nature of high-throughput sequence data. Separating taxa into core and non-core (or transient) members simplifies multidimensional data and is thought to be advantageous for statistical analyses. Support for this simplification stems from the empirical consistency between patterns observed with only the core taxa and patterns observed with all taxa present (Delgado-Baquerizo et al., 2018), from the idea that commonly occurring core taxa are responsible for community function (Saunders, Albertsen, Vollertsen, & Nielsen, 2016), and from the conservative practice of testing for treatment effects by examining only the most commonly occurring taxa (Wirth et al., 2018). Examining the dynamics and patterns of variation of the core assemblage is often seen as an important step in analyzing and understanding complex community interactions.
The concept of a core community has been operationalized in various sets of criteria that can be applied to identify taxa that could belong to the core assemblage (Delgado-Baquerizo et al., 2018; Gray, Amjad, & Gray, 1983; Lundberg et al., 2012; Shade & Handelsman, 2012; Shade & Stopnisek, 2019; Soliveres et al., 2016; Turnbaugh & Gordon, 2009; Turnbaugh et al., 2009). However, all of these methods rest on the assumption that a core set of taxa can be accurately identified, and it is unclear to what extent the concept of a core assemblage is supported by data. Shade and Handelsman (2012) reviewed different criteria for defining the core microbiome, including abundance, phylogeny, and function, but they did not evaluate evidence or support for the concept of a core community. More recently, studies have indicated that some habitats are not occupied by a consistent core set of taxa and instead host transients (Hamady & Knight, 2009; Hammer, Janzen, Hallwachs, Jaffe, & Fierer, 2017).
Beyond methodological considerations, focusing on a core subset of taxa might overlook consequential effects of rare taxa. With attention shifting from “who is there?” (i.e. taxonomic composition) to “what are they doing?” (i.e. functionality), the contribution of rare taxa, especially those that serve as hub taxa in complex microbial networks, should not be disregarded simply because of their lower abundances (Banerjee, Schlaeppi, & van der Heijden, 2018; Shi et al., 2020). Certain narrowly distributed microbial functions such as nitrification, denitrification, methanogenesis, or sulfate reduction are performed by relatively rare microbes (Jousset et al., 2017; Lynch & Neufeld, 2015). Community analyses that examine only abundant or commonly occurring microbes (i.e. core assignments) have the potential to overlook the taxa responsible for important ecosystem functions such as those listed above. When the focus is solely on core taxa, the contributions of transient or rare taxa are discounted and attributed to commonly occurring ones, potentially overemphasizing the importance of common taxa while underestimating the contribution of rare taxa.
Given the considerable and growing interest in using molecular data to characterize diverse communities across many samples and conditions (e.g. Ahrendt et al., 2018; Delgado-Baquerizo et al., 2018; Desnues et al., 2008; Geisen, Laros, Vizcaíno, Bonkowski, & de Groot, 2015; Porazinska et al., 2010; Stat et al., 2017; Tedersoo et al., 2014) and the frequent use of core community analyses, we evaluated the definition of a core community and its consequences in three ways. First, we compared different methods for defining core membership. Next, we used the core assignments and the full datasets to determine whether the interpretation of differences in community diversity (beta-diversity) would be the same. Finally, we examined to what extent core assignment methods could identify significant hub taxa as determined by co-occurrence network analysis. Our study used microbial datasets from the Human Microbiome Project (Turnbaugh et al., 2007) and soil rhizosphere samples from Arabidopsis thaliana (Lundberg et al., 2012), as well as simulations, to examine the validity of splitting taxon count data into two sets (core and non-core), while also assessing the effects of varying criteria on core membership.