Phylogenetic inference
We used both coalescent and concatenation-based methods to reconstruct
the phylogenetic trees. We first estimated individual gene trees using
IQ-TREE v1.6.12 (Minh et al., 2020 ); ModelFinder
(Kalyaanamoorthy et al., 2017 ) implemented in IQ-TREE enables a
free-rate variation model for each partition alignment. The individual
gene trees inferred by IQ-TREE were used as input in ASTRAL-III
(Zhang et al., 2018 ) with default parameters to show the local
posterior probabilities (LPPs).
A concatenated phylogeny was inferred
using maximum likelihood (ML) and Bayesian inference (BI) analyses. ML
analysis was performed using the CIPRES Science Gateway v3.3
(https://www.phylo.org/portal2) (Miller et al., 2010 ) and RAxML
v8.1.11 (Stamatakis et al., 2008 ). One thousand rapid bootstrap
iterations were used, and the default settings were used for the other
parameters. BI analysis was carried out using MrBayes v3.2.3
(Ronquist & Huelsenbeck, 2003 ), and the posterior probability
was estimated using four chains running 5,000,000 generations sampling
every 1,000 generations. Convergence of the MCMC chains was assumed when
the average standard deviation of split frequencies reached 0.01 or
less, and the first 25% of the sampled trees were considered burn-in
trees.
To investigate the gene tree expectations under coalescence, which can
be used to gain insight into whether topological features are
attributable to ILS (incomplete lineage sorting) or hybridization
(reviewed in Folk et al., 2018 ), we used a previously described
simulation pipeline (Folk et al., 2017 ; GarcĂa et al.
2017 ; https://github.com/ryanafolk/tree_utilities). Briefly, assuming
a species tree with estimated branch length measured in coalescent
units, 1000 gene tree histories were simulated and clade probabilities
were calculated for all observed relationships in the species tree,
where p ~ 0 indicates a relationship not expected
under ILS alone. Where the ILS is high, many relationships could be low
in probability, so two further probabilistic tests were implemented.
First, a significance test was conducted based on comparing the complete
set of all pairwise Robinson-Foulds distances (1) among simulated gene
trees and (2) between simulated and empirical gene trees, where a
significantly higher empirical distance would suggest discord not
predicted by the ILS, indicative of potential hybridization. Second,
clade probabilities enumerated from the simulated gene tree set were
compared to clade probabilities in the empirical gene set. Similarly,
significantly lower clade probabilities in the empirical gene set are
suggestive of potential hybridization. Both analyses were implemented as
one-tailed t -tests.