Material and Methods

Plant material and HTS procedure

Figure 1 shows our experimental design and workflow. Thirty beech individuals from three species (four taxa) were collected in the wild, covering 14 total populations. Four samples collect five individuals each of Fagus sylvatica s.l. from single populations in Germany, Italy (F. sylvatica s.str.), Greece, and Iran (F. orientalis ); in addition, individual DNA extracts from five populations of F. crenata from Japan (‘subgenus Fagus’) and F. japonica (‘subgenus Engleriana’; used as outgroup) were pooled as additional samples (details and map provided in Supplementary file S1, sections 1.2, 2.3).
Species and taxa were morphologically identified (Shen, 1992; Denk 1999a, b). DNA extractions were performed from silica-gel dried leaves with the DNeasy plant minikit (QIAGEN) and quantified with a NanoDrop spectrophotometer (TermoFisher Scientific). We prepared six artificial samples, consisting of pure species/taxa and/or populations, by pooling equal amounts of DNA of every individual to make up a total of 20 ng per sample. As in Piredda et al. (2020), the nuclear ribosomal 5S intergenic spacer (5S-IGS) was amplified with the plant-specific primer pair CGTGTTTGGGCGAGAGTAGT (forward) and CTGCGGAGTTCTGATGG (reverse). Paired-end Illumina sequencing (2×300 bp sequencing) was performed by LGC Genomics GmbH. Raw sequences were deposited in the Sequence Read Archive under BioProject PRJNA681175.

Bioinformatics tools and workflow

Illumina paired-end reads (raw sequence data with adapters and primers clipped) were processed using mothur v.1.33.0 (Schloss et al., 2009). Contigs between read pairs were assembled using the ΔQ parameter (Kozich et al., 2013) for solving the differences in base calls in the overlapping region and no ambiguous bases were allowed. Reads were dereplicated and screened for chimeras using UCHIME in de-novomode (Edgar et al., 2011) within mothur and the unique (non-redundant) sequences (5S-IGS variants) with abundance ≤ 4 were excluded from further analyses.
Given the absence of reference beech 5S rDNA sequences available in public repositories, the sequencing success of the target amplicon (5S-IGS) relied on blast searches (https://www.ncbi.nlm.nih.gov/; accessed on July 15th, 2020) within Fagales, with randomly chosen sequence reads, and reference to known literature (cf. Supplementary file S1, sections 2.2, 5).
Unique 5S-IGS sequence variants, selected based on abundance thresholds or per sample, were aligned using mafft v.7 (Katoh & Standley, 2013); mafft-generated multiple sequence alignments (MSAs) were manually checked and sequence ends trimmed using SeaView v.4.0 (Gouy et al., 2010). Sequence length and GC content percentage were calculated with jemboss 1.5 (Carver & Bleasby, 2003) and plotted with ggplot2 R package (Wickham, 2016). Occurrence and relative abundance of sequence reads in each sample were used for a preliminary identification of the obtained sequences (Supplementary File S1, section 2.3; Supplementary file S2). ”Specific” classes included sequence reads (near-)exclusive (>99.95%) to one taxon/sample; sequences shared among taxonomically different samples were defined ”Ambiguous”. A graphical visualisation of the links between the 5S-IGS variants in each sample and class assignation was performed by Circos plot (Krzywinski et al., 2009).
Variants with abundance ≥ 25 (686-tip set) were used in the phylogenetic analyses (trees and networks). Maximum likelihood (ML) analyses relied on RAxML v.8.2.11 (Stamatakis, 2014). Trees were inferred under a GTR + Γ substitution model using the ‘extended majority-rule consensus’ criterion as bootstopping option (with up to 1000 bootstrap [BS] pseudoreplicates; Pattengale et al., 2009). Trees’ visualization was performed in iTOL (www.itol.embl.de; Letunic & Bork, 2019) or Dendroscope 3 (Huson & Scornavacca, 2013). The Neighbour-Net (NNet) algorithm implemented in SplitsTree4 (Bryant & Moulton, 2004; Huson & Bryant, 2006) was used to generate a planar (equal angle, parameters set to default) phylogenetic network to explore incompatible phylogenetic signals and visualize general diversification patterns (cf. Hipp et al., 2020). The NNet used simple (Hamming) uncorrected pairwise (p- ) distances computed with SplitsTree4. In addition, we assessed intra- and inter-lineage diversity, between and within main 5S-IGS types (see Results ), with MEGA X (Kumar et al., 2018).
The phylogenetic position of variants with low abundance (<25) was evaluated with the ‘Evolutionary Placement Algorithm’ (EPA; Berger et al., 2011; Supplementary files S2, S3) implemented in RAxML. EPA was done by compiling MSAs for each sequence class and using the phylogenetic tree inferred from the 686-tip dataset; its outputs showed the multiple possible placement positions of each query sequence with different likelihood weights (probabilities) in all branches of the tree. Placement results (tree.jplace standard placement format) were visualized using iTOL. All phylogenetic and EPA analysis files, including used MSAs and a MSA collecting all 4693 sequences, are included in the Online data archive (http://dx.doi.org/10.6084/m9.figshare.13793744).
Following all phylogenetic analyses (per-sample trees, 686-tip tree and network, EPA assignation), we selected a set of 38 variants including the most abundant variants belonging to each identified lineage and the most abundant (taxon-)“specific” and “ambiguous” variants found in each sample, and additional strongly divergent and/or rare variants, for a further interpretation of the results. The autogenerated mafft MSA was checked and curated for all length-polymorphic regions using Mesquite v. 2.75 (Maddison and Maddison 2011). Phylogenetic tree inference for the selected data used RAxML, included or excluded two generally length-polymorphic regions (seeResults ), and 10,000 BS pseudo-replicates to establish support for competing splits. Supplementary file S4 documents the selection and includes a fully annotated, tabulated version of the 38-tip alignment.