4.1 Species delimitation in the “messy middle”
Differences in species delimitation studies, ranging from the number and type of loci selected, how candidate species are initially identified, e.g., morphology-based vs. sequenced-based, to empirical analytical approaches, can result in discrepancies in the inferred species boundaries (Carstens et al., 2013; Luo et al., 2018). Furthermore, differences in diversification histories likely require different types of data and different analytical approaches in order to accurately delimit species boundaries (Camargo and Sites, 2013; Leavitt et al., 2015). With an increasing number of species inferred from phylogenomic datasets, the problem of inferring population structure versus actual species boundaries becomes acute (Leaché et al., 2018). Therefore, how researchers select the appropriate number of loci and analytical approaches becomes increasingly important as analyses continue to increase in scale. Recognizing that any species delimitation approach infers hypotheses of species boundaries, rather than providing the ultimate answer/solution, is imperative (Matute and Sepúlveda, 2019). Hence, integrating independent lines of evidence with phylogenomic datasets can provide increasingly robust hypotheses of species boundaries (Fujita et al., 2012; Yang et al., 2019).
The results of our study highlight the problem of delimiting species with complex, recent phylogeographic histories, particularly in groups such as Niebla. Despite the informed perspective from genome-scale data and long-term investigations of phenotype-based circumscriptions, species boundaries in Niebla lichens are far from being resolved. While differences in the number of delimitated candidate species in Niebla , depending on the analyzed loci and species delimitation analysis, resulted in no clear favored species delimitation model, our analyses supported high species-level diversity in Niebla . Both BPP species delimitation analyses and BFD* analyses implemented in SNAPP provided strong evidence for multiple species-level lineages in ASAP species partitions inferred from the standard fungal barcoding marker, the ITS region (Figs. 2 & 3; Dryad file S54). The ‘tip-down’ approach using BPP+gdi scores (Jackson et al., 2017; Leaché et al., 2018), supported the highest number of candidate species from our dataset, a 29-species model (Fig. 3). The BFD* analyses also generally favored the most divisive species delimitation models (Dryad file S54), although independent runs did not always converge with the more divisive species models. While the BFD* provides a direct, quantitative method for comparing species models (Leaché et al., 2014), with increasing numbers of sampled loci and candidate species, computational limitations may constrain the utility of this methodological approach (Leaché et al., 2014). In this study, we attempted to circumvent these limitations by subdividing the topology in order to analyze computationally feasible data subsets. Despite those efforts, convergence among independent runs remained an issue in models with even a modest number of species (Dryad file S54).
Although the heuristic gdi criterion for species delimitation provides a metric to guide the distinction of population structure from species boundaries (Jackson et al., 2017; Leaché et al., 2018), we show here that locus selection can have a substantial impact on gdi scores. In the empirical scenario of Niebla, selecting limited, but highly variable, RADseq loci resulted in the lowest gdi scores, specifically in the 10- and 163-marker datasets (Fig. 5). In contrast, randomly selected RADseq loci, resulted in largely consistent, higher gdi scores; and these gdi scores were more consistent with the 500-most variable RADseq loci dataset. These results suggest that when using short RADseq loci, it is important to maximize the number of analyzed loci if possible, but that lower number of randomly sampled loci can result in consistent inferences. The heuristic gdi criterion may also be biased in cases where a population was established by a few founder individuals and NAand θ may be very small. Because the criterion depends on the population divergence time relative to population size (2τ /θA ), the use of gdi may lead to claims of species status even if the populations are recently diverged (Leaché et al., 2018). We speculate that this may be the case for Niebla populations in Baja California. Here we inferred recent divergence among Niebla lineages, and it would not be unreasonable to suspect that independent populations were founded by few individuals. If this is the case, it may be necessary to consider both the absolute population divergence, as well as the population divergence relative to population size (Leaché et al., 2018; Yang and Rannala, 2010). Finally, given the large range of indecision using the gdi, it may not be possible to infer with confidence species boundaries given the nature of the speciation process in these lichen-forming fungi.