3.2.2 Species delimitation using RADseq loci under the
multi-species coalescent using BPP
Estimates of θ values under the multispecies coalescent model –
analysis ‘A00’ in BPP – were on the same order of magnitude regardless
of different RADseq loci subsamples, although standard deviation was
much greater in the 163- and 500-most variable loci data subsets.
Greater variation in τ values was observed depending on the subsampled
RADseq loci (Table 1). Results using the same subset of RADseq loci
generally converged on similar gdi estimations. Species
delimitation using a fixed guide tree – analysis ‘A10’ in BPP – and
the candidate species shown in Fig. 3 inferred up to 17 species, the
maximum subdivision specified in the guide tree. Aggregate probabilities
across all runs resulted in the highest probabilities across the 13–17
species range, with a 15 species model being most likely across all
analyses (Fig. 4). However, the results of the ‘A10’species delimitation
analyses in BPP varied across both subsets of data and independent runs,
with the randomly sampled subsets showing less proclivity of an
unrealistic one-species model. (Table 1; Dryad file S56).
While the candidate species collapsed under each analysis varied widely,
the most frequent candidate species groups that were collapsed included:
‘O’, ‘P’ and ‘Q’ collapsed into 1 or 2 species; ‘J’, ‘K’, ‘L’, ‘M’ and
‘N’ were frequently collapsed into 1, 2 or 3 different species groups;
and ‘A’, ‘B’, ‘C’ and ‘D’ which were frequently collapsed into 1 or 2
species groups, sometimes including ‘E’ (Fig. 3).
3.2.3 Species delimitation validation using gdi scores
The gdi scores inferred from parameters under a multispecies
coalescent model – analysis ‘A00’ in BPP – generally did not provide
justification for collapsing any nodes in the 17-species model – based
on 17 well-supported clades identified in the ML topology, the maximum
subdivision specified in the guide tree (Fig. 3; Dryad file S56),
suggesting that our sampling includes at least 17 species-level
lineages. However, in the analyses of the 10 and 163 most informative
RADseq loci, a number of candidate species were collapsed based ongdi . In the 10-loci subset, candidate species ‘A’, ‘B’, ‘C’, and
‘D’ were combined (gdi scores <0.2), as were candidate
species ‘F’ and ‘G’, candidate species ‘J’ and ‘K’, candidate species
‘M’ and ‘N’, and finally candidate species ‘P’ and ‘Q’ (Fig. 3). In the
163-loci subset analysis, candidate species ‘A’, ‘B’, and ‘C’ were
collapsed together, as well as ‘F’ and ‘G’. The gdi scores
calculated from all other analyses suggested maintaining the 17-species
model. In the “tip-down”, iterative approach, gdi scores
supported collapsing a number of tips resulting in a 29-species model
from the original 35 tips after combining morphologically similar
specimens from the same location (Fig. 3; Dryad file S57). Collapsed
specimens included two of the three specimens identified as N.
cornea from the Channel Islands with two specimens from Baja California
(in clade ‘O’), specimens identified as N. fimbriata , N.
palmeri , and N. suffnessii (in clade ‘M’), N.
flagelliforma with N. juncosa var. juncosa (in clade
‘H’), and N. flagelliforma with N. rugosa (in clade ‘B’)
(Fig. 3).
To further characterize the impact of data subsampling on gdi scores, we report average gdi scores across all nodes for each of
the RADseq data subsets. The 10- and 163-locus datasets comprising the
most variable sites had lower gdi scores when compared with the
remaining subsets (Fig. 5). Average gdi values generally
increased with an increased coverage of genetic markers, although using
the most informative markers rather than a random subset seems to limit
this trend. This trend is primarily caused because τ
is much higher when estimated using only the most variable loci, with
the exception of a very small dataset of 10 loci. Estimates for θ of each individual candidate species were also higher when using the
most informative subsets, but so is the standard deviation of the
estimated and θ values.