Material and Methods
Plant material and HTS
procedure
Figure 1 shows our experimental design and workflow. Thirty beech
individuals from three species (four taxa) were collected in the wild,
covering 14 total populations. Four samples collect five individuals
each of Fagus sylvatica s.l. from single populations in Germany,
Italy (F. sylvatica s.str.), Greece, and Iran (F.
orientalis ); in addition, individual DNA extracts from five populations
of F. crenata from Japan (‘subgenus Fagus’) and F.
japonica (‘subgenus Engleriana’; used as outgroup) were pooled as
additional samples (details and map provided in Supplementary file S1,
sections 1.2, 2.3).
Species and taxa were morphologically identified (Shen, 1992; Denk
1999a, b). DNA extractions were performed from silica-gel dried leaves
with the DNeasy plant minikit (QIAGEN) and quantified with a NanoDrop
spectrophotometer (TermoFisher Scientific). We prepared six artificial
samples, consisting of pure species/taxa and/or populations, by pooling
equal amounts of DNA of every individual to make up a total of 20 ng per
sample. As in Piredda et al. (2020), the nuclear ribosomal 5S intergenic
spacer (5S-IGS) was amplified with the plant-specific primer pair
CGTGTTTGGGCGAGAGTAGT (forward) and CTGCGGAGTTCTGATGG (reverse).
Paired-end Illumina sequencing (2×300 bp sequencing) was performed by
LGC Genomics GmbH. Raw sequences were deposited in the Sequence Read
Archive under BioProject PRJNA681175.
Bioinformatics tools and
workflow
Illumina paired-end reads (raw sequence data with adapters and primers
clipped) were processed using mothur v.1.33.0 (Schloss et al.,
2009). Contigs between read pairs were assembled using the ΔQ parameter
(Kozich et al., 2013) for solving the differences in base calls in the
overlapping region and no ambiguous bases were allowed. Reads were
dereplicated and screened for chimeras using UCHIME in de-novomode (Edgar et al., 2011) within mothur and the unique
(non-redundant) sequences (5S-IGS variants) with abundance ≤ 4 were
excluded from further analyses.
Given the absence of reference beech 5S rDNA sequences available in
public repositories, the sequencing success of the target amplicon
(5S-IGS) relied on blast searches
(https://www.ncbi.nlm.nih.gov/; accessed on July 15th, 2020) within
Fagales, with randomly chosen sequence reads, and reference to known
literature (cf. Supplementary file S1, sections 2.2, 5).
Unique 5S-IGS sequence variants, selected based on abundance thresholds
or per sample, were aligned using mafft v.7 (Katoh & Standley,
2013); mafft-generated multiple sequence alignments (MSAs) were
manually checked and sequence ends trimmed using SeaView v.4.0
(Gouy et al., 2010). Sequence length and GC content percentage were
calculated with jemboss 1.5 (Carver & Bleasby, 2003) and
plotted with ggplot2 R package (Wickham, 2016). Occurrence and
relative abundance of sequence reads in each sample were used for a
preliminary identification of the obtained sequences (Supplementary File
S1, section 2.3; Supplementary file S2). ”Specific” classes included
sequence reads (near-)exclusive (>99.95%) to one
taxon/sample; sequences shared among taxonomically different samples
were defined ”Ambiguous”. A graphical visualisation of the links between
the 5S-IGS variants in each sample and class assignation was performed
by Circos plot (Krzywinski et al., 2009).
Variants with abundance ≥ 25 (686-tip set) were used in the phylogenetic
analyses (trees and networks). Maximum likelihood (ML) analyses relied
on RAxML v.8.2.11 (Stamatakis, 2014). Trees were inferred under
a GTR + Γ substitution model using the ‘extended majority-rule
consensus’ criterion as bootstopping option (with up to 1000 bootstrap
[BS] pseudoreplicates; Pattengale et al., 2009). Trees’
visualization was performed in iTOL
(www.itol.embl.de; Letunic &
Bork, 2019) or Dendroscope 3 (Huson & Scornavacca, 2013). The
Neighbour-Net (NNet) algorithm implemented in SplitsTree4
(Bryant & Moulton, 2004; Huson & Bryant, 2006) was used to generate a
planar (equal angle, parameters set to default) phylogenetic network to
explore incompatible phylogenetic signals and visualize general
diversification patterns (cf. Hipp et al., 2020). The NNet used simple
(Hamming) uncorrected pairwise (p- ) distances computed with
SplitsTree4. In addition, we assessed intra- and inter-lineage
diversity, between and within main 5S-IGS types (see Results ),
with MEGA X (Kumar et al., 2018).
The phylogenetic position of variants with low abundance (<25)
was evaluated with the ‘Evolutionary Placement Algorithm’ (EPA; Berger
et al., 2011; Supplementary files S2, S3) implemented in RAxML. EPA was
done by compiling MSAs for each sequence class and using the
phylogenetic tree inferred from the 686-tip dataset; its outputs showed
the multiple possible placement positions of each query sequence with
different likelihood weights (probabilities) in all branches of the
tree. Placement results (tree.jplace standard placement format) were
visualized using iTOL. All phylogenetic and EPA analysis files,
including used MSAs and a MSA collecting all 4693 sequences, are
included in the Online data archive
(http://dx.doi.org/10.6084/m9.figshare.13793744).
Following all phylogenetic analyses (per-sample trees, 686-tip tree and
network, EPA assignation), we selected a set of 38 variants including
the most abundant variants belonging to each identified lineage and the
most abundant (taxon-)“specific” and “ambiguous” variants found in
each sample, and additional strongly divergent and/or rare variants, for
a further interpretation of the results. The autogenerated
mafft MSA was checked and curated for all length-polymorphic
regions using Mesquite v. 2.75 (Maddison and Maddison 2011).
Phylogenetic tree inference for the selected data used RAxML, included
or excluded two generally length-polymorphic regions (seeResults ), and 10,000 BS pseudo-replicates to establish support
for competing splits. Supplementary file S4 documents the selection and
includes a fully annotated, tabulated version of the 38-tip alignment.