Introduction
Species delimitation has never been a simple or straightforward task, due, in part, to how species are conceptualized (Mayden, 1997; Wilkins, 2011). However, conceptualizing species as “separately evolving metapopulation lineages” – the unified species concept (USC) – shifts the debate away from species concepts to how species boundaries can be empirically delimited (de Queiroz, 2007). Under the USC, rather than weighing which property, such as morphology, monophyly, or reproductive isolation is indicative of a population representing an independent lineage, these properties simply become lines of evidence that can be used within an empirical framework for delimiting species (Sites and Marshall, 2004). Different properties may arise at different times or in different orders during the process of speciation (de Queiroz, 1998; de Queiroz, 2005; de Queiroz, 2007). Thus, early in the process of speciation, there is often a grey area where species boundaries are difficult to ascertain and potential for conflicting signals among independent lines of evidence (Carstens et al., 2013; Huang and Knowles, 2015; Mallet, 2008). Hence, one of the biggest challenges of species delimitation is in determining when differences in intra-specific population structure becomes species-level divergence.
With closely related or recently diverged species, delimitation of species boundaries using traditional phenotype-based approaches, e.g., morphology, ecological, or chemical characteristics, becomes challenging (Poelstra et al., 2020; Printzen, 2010; Wagner et al., 2013). The inclusion of molecular sequence data can be a critical tool for inferring robust species boundaries, especially within an integrative framework (Dayrat, 2005; Fujita et al., 2012; Will et al., 2005). Particularly, high-throughput sequencing technologies allow for genome-scale data to be generated using a wide range of time- and cost-effective strategies, generating hundreds to thousands of loci (Hale et al., 2020; McKain et al., 2018). However, even when using genome-scale data and empirical species delimitation analyses, the species boundary can remain blurred, especially in recently diverged species, hybrid zones, cases of introgression, and when species boundaries remain semipermeable (Harrison and Larson, 2014; Hey et al., 2003; Leaché et al., 2018). Furthermore, harnessing genome-scale data creates computational challenges and divergent approaches for selecting appropriate genomic loci (Faircloth et al., 2012; Kapli et al., 2020). While additional loci allows for the detection of subtle differences in population structure, potentially resolving recent species-level divergence (Huang, 2020), these data may also lead to over-splitting populations and inappropriately recognizing populations as species (Rannala, 2015; Sukumaran and Knowles, 2017).
Similar to other non-model organismal groups, interpreting species-level diversity in lichen-forming fungi has been revolutionized by incorporating DNA sequence data (Lumbsch and Leavitt, 2011; Printzen, 2010; Schneider et al., 2016). Interpreting anatomical, chemical, and morphological characters of lichens within a phylogenetic framework has transformed our understanding of the role of phenotypic characters used for lichen-forming fungal taxonomy (Printzen, 2010). While portions of the ribosomal cistron, including the internal transcribed spacer (Schoch et al., 2012) – the standard barcoding marker for fungi, and regions of the mitochondrial genome, have been the standard genetic loci for lichen fungi systematics, genomic data have become increasingly useful when studying closely related species or recently diverged populations (Allen et al., 2018; Alonso-García et al., 2021; Grewe et al., 2017; Leavitt et al., 2016; Widhelm et al., 2021).
Here we investigate species boundaries in a lineage of fruticose lichen-forming fungi endemic to coastal fog deserts along the Pacific coasts of the New World – Niebla Rundel & Bowler in the family Ramalinaceae. Members of this genus form conspicuous, fruticose lichens, expressing an impressive range of morphological and chemical diversity (Fig. 1). This phenotypic variation has resulted in two widely different interpretations of species-level diversity, either a few, morphologically polymorphic species (Bowler and Marsh, 2004) or a much finer subdivision, two genera, Niebla with 42 species and one variety, and Vermilacinia Spjut & Hale with 28 species (Spjut, 1996). These genera were recently investigated within a multi-locus, phylogenetic framework (Spjut et al., 2020). Niebla diversified more recently, estimated at ca. 13 Mya, with speciation largely impacted by the climatic oscillations since the Miocene (Spjut et al., 2020). While evidence from Spjut et al. (2020) support high levels of diversity in Niebla , most phenotypically circumscribed Nieblaspecies were not recovered as monophyletic using a multi-locus dataset. These results have been explained by model of micro-endemism of allopatric species in Niebla , with convergent forms occurring in separate populations, especially different geographic regions within the coastal fog deserts of California, Baja California, and Baja California Sur (Spjut et al., 2020 – e.g., N. homalea , N. undulata , Fig. 7). A – a Furthermore, despite generating a robust, multi-locus dataset and implementing multiple DNA-based empirical species delimitation analyses, species boundaries remain uncertain due to discrepancies among different analytical approaches (Carstens et al., 2013; Spjut et al., 2020).
Given the incongruence between what has been considered taxonomically diagnostic phenotypic characters and results from multi-locus phylogenetic inference and species delimitation analyses, here we aim to explore the utility of genome-scale data to resolve species boundaries in this fascinating lineage of lichen-forming fungi. In addition, we test the hypothesis of micro-endemism in this clade of symbiotic fungi. Specifically, we used restriction site-associated DNA sequencing (RADseq) to generate tens of thousands of short loci distributed across the genome to infer evolutionary relationships. Several species delimitation analyses, including BPP and SNAPP+Bayes factor delimitation (BFD*), were used to infer species boundaries. BPP is a Bayesian MCMC program for analyzing genomic sequence data under the multispecies coalescent model and has been shown to be able to delimit species boundaries even among closely related lineages (Flouri et al., 2018; Yang and Rannala, 2010, 2014; Zhang et al., 2011). Given that genomic data be can effective in delimiting fine-scale population structure, rather than species, we use the recently heuristic genealogical divergence index (gdi ) to distinguish between population structure and species boundaries (Jackson et al., 2017; Leaché et al., 2018). With BPP’s estimates of coalescent node heights (τ ) and ancestral effective population size (θ ),gdi scores can be calculated to estimate the divergence of two taxa (Leaché et al., 2018; Poelstra et al., 2020). Complementarily, we used SNAPP+BFD* to rank different species delimitation models. SNAPP attempts to decrease computation time for large amounts of phylogenetic datasets by bypassing the need for gene trees, and species tree estimates are directly inferred from biallelic markers, such as SNP data (Bryant et al., 2012). For each model, Bayes factor scores are calculated based on their marginal likelihood values using a path-sampling analysis and subsequently compared (Leaché et al., 2014). While this comparison is useful in identifying the best-fitting species delimitation models, it computationally limited to only smaller datasets with fewer than 10 species and under 100 individual samples (Leaché et al., 2014).
Our results provide crucial insights into the diversification history ofNiebla and highlight the power and limitations of using genome-scale data to infer species boundaries in recently diverged lineages.