Results

Read pre-processing and removal of unique variants with abundance < 4 yielded 145,643 sequences, corresponding to 4,693 representative 5S-IGS variants (work details, statistics and sequence structural features are reported in Supplementary files S1, S2). Of these, 686 had an abundance of ≥ 25 and were included in the phylogenetic analyses; this left 4,007 variants to be placed using EPA (Supplementary files S2, S3). Analyses of 38 selected sequences further clarified the evolutionary relationships among the detected 5S-IGS lineages (Supplementary file S4).
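The two abundance cut-offs stated above (≥ 4 to be retained, ≥ 25 to enter the backbone tree) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name `partition_by_abundance` and the toy reads are hypothetical, and the sketch only assumes that each variant comes with a read count.

```python
from collections import Counter


def partition_by_abundance(counts, min_keep=4, min_backbone=25):
    """Split variants into a backbone set and a placement set.

    counts: mapping of variant sequence -> read abundance.
    Variants below min_keep are discarded; variants at or above
    min_backbone go into the backbone set (tree inference); the
    remainder would be placed afterwards (e.g. with EPA).
    """
    backbone, placement = {}, {}
    for variant, abundance in counts.items():
        if abundance < min_keep:
            continue  # too rare: removed during pre-processing
        if abundance >= min_backbone:
            backbone[variant] = abundance
        else:
            placement[variant] = abundance
    return backbone, placement


# Toy data (hypothetical reads, not from the study)
reads = ["ACGT"] * 30 + ["ACGA"] * 5 + ["ACGG"] * 3
backbone, placement = partition_by_abundance(Counter(reads))
```

With the counts reported above, the two retained sets add up as expected: 686 backbone variants plus 4,007 placed variants give the 4,693 representative variants.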

686-tip 5S-IGS backbone phylogeny

We first (Fig. 1) categorized the obtained 4,693 5S-IGS sequences based on their sample distribution (Fig. 2). Variants were labelled as “specific” when exclusively found in one (or two, in case of F. sylvatica s.str.) of the samples representing the same taxon: “japonica”, “crenata”, “Iranian orientalis”, “Greek orientalis”, and “sylvatica”. In addition, we identified four “ambiguous” classes, i.e. sequences shared among different species or taxa. Five main 5S-IGS lineages were defined based on the phylogenetic analysis of 686 sequences with abundance ≥ 25 (Fig. 3). The two ingroup 5S-IGS main types, labelled A- and B-type, form distinct clades (BS = 95/47); they were established prior to speciation within the F. crenata – F. sylvatica s.l. lineage (in short: crenata-sylvatica lineage). The ML tree and NNet network (Fig. 4) highlight the bimodality of the five “specific” sequence classes (‘japonica’, ‘crenata’, ‘sylvatica’, ‘Greek orientalis’ and ‘Iranian orientalis’). Most shared variants classified as “ambiguous” are part of the ‘European A’-type lineage (BS = 34), while Iranian A-type variants are exclusively found in the sister clade (‘Original A’; BS = 53, BS = 29 when including the ‘Crenata A’ variant). The lower support for the B-root (Fig. 3) relates to the higher diversity in the B-types (Fig. 4; Table 1), which comprise two genetically coherent clades (‘European B’ and ‘Crenata B2’) in addition to a poorly sorted, ambiguously supported clade (‘Original B’). The ancestor of the crenata-sylvatica lineage must have been polymorphic, and the modern pattern is the result of incomplete lineage sorting. The Iranian F. orientalis individuals represent a now genetically isolated sub-sample of the original variation found in the western range of the crenata-sylvatica lineage; in contrast, the Greek F. orientalis + F. sylvatica s.str. and F. crenata are better sorted.
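The sample-distribution labelling described at the start of this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used for Fig. 2: the sample identifiers in `SAMPLE_TO_TAXON` and the function `classify_variant` are hypothetical, and the four “ambiguous” classes are not reconstructed here; the sketch only reports which taxa share a variant.

```python
# Hypothetical sample identifiers mapped to the five taxon labels
# used in the text; F. sylvatica s.str. is covered by two samples.
SAMPLE_TO_TAXON = {
    "japonica": "japonica",
    "crenata": "crenata",
    "orientalis_IR": "Iranian orientalis",
    "orientalis_GR": "Greek orientalis",
    "sylvatica_IT": "sylvatica",
    "sylvatica_DE": "sylvatica",
}


def classify_variant(samples_with_variant):
    """Label a variant by the taxa of the samples it occurs in.

    A variant is "specific" when every sample containing it belongs
    to the same taxon, and "ambiguous" when it is shared among taxa.
    """
    if not samples_with_variant:
        raise ValueError("variant must occur in at least one sample")
    taxa = {SAMPLE_TO_TAXON[s] for s in samples_with_variant}
    if len(taxa) == 1:
        return "specific: " + next(iter(taxa))
    return "ambiguous: " + " + ".join(sorted(taxa))


# A variant found in both F. sylvatica s.str. samples stays "specific"
label_a = classify_variant({"sylvatica_IT", "sylvatica_DE"})
# A variant shared by Greek F. orientalis and F. sylvatica is "ambiguous"
label_b = classify_variant({"orientalis_GR", "sylvatica_DE"})
```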
There is no evidence for ongoing lineage mixing in Japan between F. japonica (‘subgenus Engleriana’) and F. crenata (‘subgenus Fagus’). One of the F. japonica types, ‘Japonica I’, is substantially more similar to both ingroup main types (A- and B-type) than is the other dominant type (‘Japonica O’; Fig. 4). One “ambiguous” variant shared by western Eurasian beeches (‘European O’) is part of the O-type lineage; a small clade of F. japonica-exclusive variants (‘Japonica X’) nests between the A-B-I clade and the O-clade.
Fagus crenata shares a relatively recent common origin with the western Eurasian beeches, represented by the ‘Original B’-type (BS = 71) with two subgroups. The ‘Western Eurasian B’ subclade (BS = 54; including one F. crenata B-variant: ‘Crenata B0’) is poorly sorted; potentially ancient variants closest to ‘Crenata B1’ sequences (grade in Fig. 3; neighbourhood in Fig. 4) have persisted in Italian, Greek and Iranian populations. In contrast to its western relatives, Fagus crenata lacks non-degraded type A variants (Table 2; Supplementary file S1, section 4.2), with most variants falling within a highly supported, F. crenata-exclusive B-type clade (‘Crenata B2’; BS = 98); this F. crenata-specific lineage also includes the ‘Cross-Asia’ variant found as a singleton in Iranian F. orientalis. The remainder (‘Crenata B3’) is placed between the I-lineage and the core group of the B-lineage, together with sequentially highly derived Iranian B-type variants (‘Iranian B1’; Fig. 2). One sequence (‘Crenata A’) is placed within the ‘Original A’ clade, as sister to all other variants (Fig. 3), which may be a tree-branching artefact (Fig. 4).
The only other subtype showing the same level of genetic coherence (Fig. 4) is the ‘European B’-type (BS = 93), shared exclusively by F. sylvatica s.str. and Greek F. orientalis (Table 2). Given its distinctness, the ‘European B’-type likely reflects the most recent sorting and speciation events, which involved F. sylvatica and western (Greek) F. orientalis but not their eastern (Iranian) relatives. In general, the western Eurasian beeches form a genetic continuum characterized by several, partially incomplete sorting events (‘Iranian A’ vs. ‘European A’; ‘Original B’ vs. ‘European B’). Within the continuum, Iranian F. orientalis appears to be most isolated and ancient/ancestral with respect to F. crenata and F. japonica.
Very short ‘Orientalis’ variants are deeply embedded within the ‘European A’ and ‘European B’ clades, while all but one of the short ‘japonica’ variants are O-type (Supplementary file S1, section 4.1). The distance-based NNet placed all short variants next to the centre of the graph. Thus, they are sequentially undiagnostic, lacking more than 100 bp from the 5’ or central part of the spacer, but also inconspicuous within the larger ingroup (‘Japonica I’, A- and B-types).

Framework phylogeny

The ML trees for the 38 selected sequences (representing the most abundant variants within each sample, strongly deviating variants, and variants shared across species) highlight the deep split between outgroup (type O) and ingroup 5S-IGS variants (I-, A- and B-types; Fig. 5). The mid-point root corresponds to the split between type O and the A-B-I clade. Tip-pruning as well as the elimination of the indiscriminative ‘Japonica X’ lineage led to a substantial increase in backbone branch support: the divergence between F. japonica ingroup variants (type I) and B-type(s) is placed after the isolation of the A-type lineage (BS = 99/84). A group of sequentially unique, rare shared variants showing few to substantial signs of sequence degradation forms a distinct clade. This “relict lineage” is placed between the outgroup subtree (comprising the ‘Japonica O’-type and sequence-degraded “ambiguous” western Eurasian variants) and the A-B-I subtree. In-depth sequence structure analysis shows that this placement is only partly due to potential ingroup-outgroup long-branch attraction. While both subtrees (outgroup and ‘relict’ type) include variants most different from the ingroup consensus, and branch support is higher when length-polymorphic regions are included, the rare shared types also have an increased number of mutational patterns, which appear to be primitive within the entire A-B-I lineage. Thus, they may represent relict variants: ancestral copies still found in 5S rDNA arrays that have been subsequently replaced and eliminated within the crenata-sylvatica lineage (Table 2). Another important observation is that the long-branching A- and B-type variants found exclusively in the Iranian sample constitute strongly divergent, genetically coherent lineages, as do two of the low-abundance, highly divergent F. crenata-unique variants.
While some rare, shared ingroup variants (‘relict lineage’, ‘Crenata A’, some ‘European O’) show clear signs of sequence degradation, others are inconspicuous (no pseudogenous mutations in flanking 5S rDNA) and can be highly similar to the most abundant variants.

Bimodality and differences between 5S-IGS populations in each sample

The unrooted, single-sample ML trees resolve two 5S rDNA clusters (5S main types) with high bootstrap values (71–100) in all studied samples (Fig. 6). The main splits reflect the two most common main types: I- and O-type in the outgroup F. japonica; and A- and B-type in F. sylvatica s.l. The difference between the two clusters is most pronounced in F. japonica; the least intra-species (intra-sample) divergence is exhibited by F. crenata, which largely lacks A-type 5S-IGS. Phylogenetically intermediate variants characterize the western Eurasian samples (long-branched in Iranian F. orientalis); strongly modified variants with little affinity to either 5S rDNA cluster, hence connected to the centre of the graph, are abundant in F. crenata (two lineages, one representing a relict ‘Crenata A’-type; see Fig. 4) and F. sylvatica s.str. (a single lineage in each population). These intermediate sequences do not show any structural peculiarity, except for the ‘Crenata A’-type variant, which shows a reduced GC content (34.3%; Supplementary file S1, section 4.2 and appendix B). The outgroup-type ‘European O’ variants represent the longest-branched sequences in the Greek F. orientalis and both F. sylvatica s.str. samples. As a trend, the variants within the B-type and I-type 5S rDNA subtrees show a higher divergence and phylogenetic structure than found within the A- and O-type subtrees. For instance, in Greek F. orientalis and F. sylvatica s.str., the ‘B type’ subtree includes a distinct, highly supported clade.
Using EPA (Evolutionary Placement Algorithm), we assessed the phylogenetic affinity of all variants with an abundance ≥ 4 not included in the 686-tip matrix (Supplementary files S2, S3). In general, 5S-IGS arrays have a high capacity to conserve signal from past reticulations and deep divergences: types placed by EPA on the branch of the ‘European O’-type variant, representing a distinct but degrading and rare sister lineage of the F. japonica type O (cf. Table 2; Figs 3, 4), can be found in all samples of the crenata-sylvatica lineage (Table 3; Supplementary file S1, appendix A). In contrast, type X variants are exclusive to F. japonica. The single, low-abundance type I variant identified by EPA in the F. sylvatica s.str. sample from Germany represents a relict variant from the initial radiation within the (A-)I-B lineage. The ‘relict lineage’, phylogenetically in-between O- and A-/B-/I-types (cf. Fig. 5), is represented in all samples of the crenata-sylvatica lineage as well. Despite showing signs of sequence degradation in the flanking 5S rRNA gene regions, its GC contents (35.1–40.0%) largely range between the median values of type O and type B/I. Hence, they match the range of A-types, and the variants are also of the same length as most A- and B-types (Supplementary file S1, appendix B, includes cloud and violin plots for all types per sample).
Compilation of length diversity and GC content per main (most frequent) type and sample (Fig. 7) shows that the generally longer A-types have lower GC content than the B-type in each species; the I-B clade corresponds to a similar GC content and sequence length in the F. japonica type I and crenata-sylvatica lineage type B. Highest GC contents are found in F. japonica type O, which also represents the longest 5S-IGS type. While GC-richer B-type variants are more frequent, the GC-poorer A-type variants make up a higher portion of the HTS reads corresponding to rare sequence variants. In F. crenata, A-type 5S-IGS variants are nearly absent, while in Iranian F. orientalis they show the highest diversity: 386 species-unique variants (40% of all A-type variants) with a total abundance of 8,466 (34%). The opposite can be observed for the type B variants: while they are ± equally abundant in the Iranian F. orientalis sample, they are 1.5 to 2 times more frequent in Greek F. orientalis and Italian F. sylvatica samples, and approach the largest majority in the German F. sylvatica sample. Apparently, the GC-richer B-types subsequently replaced A-types in the genomes of the crenata-sylvatica lineage, while in F. japonica their sister lineage, type I variants, outnumbered the GC-richer, sequentially more complex O-types.
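A per-type compilation of sequence length and GC content of the kind summarized above can be sketched in Python. This is a minimal illustration, not the analysis behind Fig. 7: the functions `gc_content` and `summarize_by_type` and the toy sequences are hypothetical, medians are taken over variants without weighting by read abundance (an assumption), and GC content is computed over unambiguous A/C/G/T bases only.

```python
from statistics import median


def gc_content(seq):
    """Return GC content in percent, ignoring gaps and ambiguity codes."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / acgt


def summarize_by_type(variants):
    """Compute (median length, median GC %) per 5S-IGS type.

    variants: iterable of (type_label, sequence) pairs.
    """
    by_type = {}
    for label, seq in variants:
        by_type.setdefault(label, []).append(seq)
    return {
        label: (
            median([len(s) for s in seqs]),
            median([gc_content(s) for s in seqs]),
        )
        for label, seqs in by_type.items()
    }


# Toy sequences (hypothetical, far shorter than real 5S-IGS variants)
summary = summarize_by_type([("A", "ATAT"), ("A", "ATGC"), ("B", "GGCC")])
```

An abundance-weighted variant of the same summary would repeat each sequence according to its read count before taking the median.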