Phylogenomics
Total genomic DNA was extracted from herbarium specimens using a
modified CTAB protocol (Doyle and Doyle, 1987). Nuclear target
enrichment was used to capture 260 low- to single-copy nuclear (LSCN)
genes using RNA baits design for Dioscorea (Soto Gomez et
al., 2019). Genomic libraries were prepared using NEBNext® Ultra™ II
DNA Library Prep Kit for Illumina® (New England Biolabs, Ipswich,
Massachusetts, USA) with AMPure XP magnetic beads and NEBNext® Multiplex
Oligos for Illumina® (Dual Index Primer Sets I and II) as tags for
simultaneous sequencing. Subsequently, the enriched libraries were
multiplexed and sequenced on a HiSeq X platform (Illumina, Inc.) lane.
We filtered raw paired-end reads by removing adapter sequences and
low-quality reads using Trimmomatic v.0.36 (Bolger et al., 2014).
We used HybPiper v.1.3.1 (Johnson et al., 2016) to recover 260
nuclear genes and associated introns (Soto Gomez et al., 2019)
using sequence data from the transcriptome of D. communis (SRA
SAMN11290810) as a reference file following Viruel et al. (2019).
We used nQuire (Weiß et al ., 2018) to calculate the number of
read counts for each allele per single nucleotide polymorphism (SNP),
and estimated the median value of allelic ratios per sample to classify
each individual as diploid (<2) or polyploid (>2)
as described and optimized for Dioscorea in Viruel et al.(2019). The percentage of polymorphic sites was calculated as the
percentage of SNP positions compared to the total number of base pairs
retrieved for each sample. Plastome data were recovered using HybPiper
and the plastome of D. elephantipes as reference (GenBank
NC_009601).
Sequences were aligned with MAFFT v.7 (Katoh, et al., 2002) using
the –auto parameter, and debugged with trimAl v.1.4.1
(Capella-Gutiérrez et al., 2009) using the -automated1command. Phylogenomic trees were reconstructed using the concatenated
nuclear DNA (nDNA) and plastid DNA (pDNA) datasets independently, and
for each nuclear gene independently, using RAxML-NG (Katoh et
al., 2019) and IQ-TREE (Nguyen et al., 2015), with a GTR+GAMMA
substitution model and 1,000 bootstrap replicates. We used ASTRAL‐III
(Zhang et al., 2018) to construct a species tree based on the
independent nuclear gene trees, and SVDquartets to evaluate 10,000,000
random quartets -or all possible quartets if lower- and 10,000 bootstrap
replicates, as implemented in PAUP* 4.0a146 (Swofford, 2002). Haplotype
networks were reconstructed with plastid data using the TCS method as
implemented in Popart v1.7 (Clement et al., 2002; Leigh and
Bryant, 2015).
We used Structure (Pritchard et al., 2000) to further investigate
the genetic clusters within and between taxa based on filtered SNP data
from the concatenated nDNA dataset. We tested one to six genetic groups
(K =1–6) allowing admixture at individual level, and correlated
allele frequencies, by running five replicates with a 100,000 burn-in
and a chain length of 1,000,000 simulations each. Structure Harvester
(Earl and vonHoldt, 2012) was used to obtain likelihood values for the
multiple values of K and to apply the delta K criterion to
select the optimal K . We plotted Structure results for eachK value using StructuRly (Criscuolo and Angelini, 2020).
Divergence times were estimated using a Bayesian relaxed-clock approach
implemented in BEAST 1.10.4 (Drummond and Rambaut, 2007) and a penalized
likelihood approach as implemented in treePL (Smith and O’Meara, 2012)
using the concatenated nDNA dataset containing one representative per
taxon. In both analyses, the crown node of the African/Mediterranean
clade was used as calibration by applying a minimum age of 24 MY and a
maximum age of 40 MY, based on the age estimates from Viruel et
al. (2016; 95% HPD of 24.3469–39.2223 MY). We applied the GTR+I+G
substitution model, Yule tree prior, and an uncorrelated lognormal
molecular clock and ran the analysis for 1 billion generations, sampling
every 100,000 generations. The treePL analysis was conducted in two
consecutive runs: i) applying the “prime” option to select the most
optimal parameter values, and ii) a “thorough” analysis by settingopt = 1, optad = 2 and optcvad = 5.