Results
Read pre-processing and removal of unique variants with abundance
< 4 yielded 145,643 sequences, corresponding to 4,693
representative 5S-IGS variants (work details, statistics and sequence
structural features reported in Supplementary files S1, S2). Of these,
686 had an abundance of ≥ 25 and were included in the phylogenetic
analyses; this left 4,007 variants to be placed using EPA (Supplementary
files S2, S3). Analyses of 38 selected sequences further clarified the
evolutionary relationships among the detected 5S-IGS lineages
(Supplementary file S4).
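The abundance thresholds above amount to a simple partition of the variant table. The following is a minimal Python sketch, assuming a hypothetical mapping from variant identifier to total read abundance; the function name, variable names and toy counts are illustrative and not taken from the study's pipeline.

```python
def partition_by_abundance(abundance, min_keep=4, min_backbone=25):
    """Split variants into a backbone set and an EPA-placement set.

    Variants below min_keep are dropped (the '< 4' filter); variants at or
    above min_backbone go to the backbone; the rest are left for placement.
    Defaults follow the thresholds stated in the text.
    """
    backbone, placement = [], []
    for variant, count in abundance.items():
        if count < min_keep:
            continue  # removed during pre-processing
        if count >= min_backbone:
            backbone.append(variant)
        else:
            placement.append(variant)
    return backbone, placement


# Toy input: hypothetical identifiers and counts, not data from the study.
toy = {"v1": 120, "v2": 25, "v3": 24, "v4": 4, "v5": 3}
backbone, placement = partition_by_abundance(toy)
print(backbone, placement)
```

Applied to the counts reported above, the same bookkeeping gives 4,693 − 686 = 4,007 variants left for EPA placement.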
686-tip 5S-IGS backbone phylogeny
We first categorized the 4,693 5S-IGS sequences obtained (Fig. 1) according
to their sample distribution (Fig. 2). Variants were labelled as
“specific” when found exclusively in one (or two, in the case of F.
sylvatica s.str.) of the samples representing the same taxon:
“japonica”, “crenata”, “Iranian orientalis”, “Greek orientalis”,
and “sylvatica”. In addition, we identified four “ambiguous”
classes, i.e. sequences shared among different species or taxa. Five
main 5S-IGS lineages were defined based on the phylogenetic analysis of
686 sequences with abundance ≥ 25 (Fig. 3). The two ingroup 5S-IGS main
types, labelled A- and B-type, form distinct clades (BS = 95/47); they
were established prior to speciation within the F. crenata – F.
sylvatica s.l. lineage (in short: crenata-sylvatica lineage).
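The sample-distribution labelling described at the start of this section can be sketched as follows. This is a minimal Python illustration under stated assumptions: the sample identifiers in `SAMPLE_TAXON` are hypothetical, and the further grouping of shared variants into the four “ambiguous” classes is not reproduced because the text does not define them here.

```python
# Hypothetical sample-to-taxon map; sample identifiers are illustrative only.
# The two F. sylvatica s.str. samples map to the same taxon label.
SAMPLE_TAXON = {
    "japonica_JP": "japonica",
    "crenata_JP": "crenata",
    "orientalis_IR": "Iranian orientalis",
    "orientalis_GR": "Greek orientalis",
    "sylvatica_IT": "sylvatica",
    "sylvatica_DE": "sylvatica",
}


def classify_variant(counts):
    """Label one variant from its per-sample read counts.

    Returns ("specific", taxon) if all samples containing the variant belong
    to one taxon, otherwise ("ambiguous", tuple of the taxa sharing it).
    """
    taxa = {SAMPLE_TAXON[s] for s, n in counts.items() if n > 0}
    if not taxa:
        raise ValueError("variant has no reads in any sample")
    if len(taxa) == 1:
        return ("specific", next(iter(taxa)))
    return ("ambiguous", tuple(sorted(taxa)))


# A variant found in both F. sylvatica s.str. samples is still "specific".
print(classify_variant({"sylvatica_IT": 10, "sylvatica_DE": 7}))
# A variant shared by two taxa is "ambiguous".
print(classify_variant({"orientalis_GR": 5, "sylvatica_DE": 2}))
```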
The ML tree and NNet network (Fig. 4) highlight the bimodality of the
five “specific” sequence classes (‘japonica’, ‘crenata’, ‘sylvatica’,
‘Greek orientalis’ and ‘Iranian orientalis’). Most shared variants
classified as “ambiguous” are part of the ‘European A’-type lineage
(BS = 34), whereas Iranian A-type variants are exclusively found in the
sister clade (‘Original A’; BS = 53, BS = 29 when including the ‘Crenata A’
variant). The lower support for the B-root (Fig. 3) relates to the
higher diversity in the B-types (Fig. 4; Table 1), which comprise two
genetically coherent clades (‘European B’ and ‘Crenata B2’) in addition
to a poorly sorted, ambiguously supported clade (‘Original B’). The
ancestor of the crenata-sylvatica lineage must have been
polymorphic, and the modern pattern is the result of incomplete lineage
sorting. The Iranian F. orientalis individuals represent a now
genetically isolated sub-sample of the original variation found in the
western range of the crenata-sylvatica lineage; in contrast, the
Greek F. orientalis + F. sylvatica s.str. and F. crenata are better sorted.
There is no evidence for ongoing lineage mixing in Japan between
F. japonica (‘subgenus Engleriana’) and F. crenata (‘subgenus Fagus’).
One of the F. japonica types, ‘Japonica I’,
is substantially more similar to both ingroup main types (A- and B-type)
than the other dominant type (‘Japonica O’; Fig. 4). One “ambiguous”
variant shared by western Eurasian beeches (‘European O’) is part of the
O-type lineage; a small clade of F. japonica-exclusive variants
(‘Japonica X’) nests between the A-B-I clade and the O-clade.
Fagus crenata shares a relatively recent common origin with the
western Eurasian beeches, represented by the ‘Original B’-type (BS = 71)
with two subgroups. The ‘Western Eurasian B’ subclade (BS = 54;
including one F. crenata B-variant: ‘Crenata B0’) is poorly
sorted; potentially ancient variants closest to ‘Crenata B1’ sequences
(grade in Fig. 3; neighbourhood in Fig. 4) have persisted in Italian,
Greek and Iranian populations. In contrast to its western relatives,
Fagus crenata lacks non-degraded type A variants (Table 2;
Supplementary file S1, section 4.2), with most variants falling within a
highly supported, F. crenata-exclusive B-type clade (‘Crenata
B2’; BS = 98); this F. crenata-specific lineage also includes the
‘Cross-Asia’ variant found as a singleton in Iranian F.
orientalis. The remainder (‘Crenata B3’) is placed between the
I-lineage and the core group of the B-lineage, together with
sequentially highly derived Iranian B-type variants (‘Iranian B1’; Fig.
2). One sequence (‘Crenata A’) is placed within the ‘Original A’ clade,
as sister to all other variants (Fig. 3), which may be a tree-branching
artefact (Fig. 4).
The only other subtype showing the same level of genetic coherence
(Fig. 4) is the ‘European B’-type (BS = 93), shared exclusively by
F. sylvatica s.str. and Greek F. orientalis (Table 2).
Given its distinctness, the ‘European B’-type likely reflects the
most recent sorting and speciation events that involved F.
sylvatica and western (Greek) F. orientalis but not their
eastern (Iranian) relatives. In general, the western Eurasian beeches
form a genetic continuum characterized by several, partially incomplete
sorting events (‘Iranian A’ vs. ‘European A’; ‘Original B’ vs. ‘European
B’). Within the continuum, Iranian F. orientalis appears to be the
most isolated and ancient/ancestral with respect to F. crenata
and F. japonica.
Very short ‘Orientalis’ variants are deeply embedded within the
‘European A’ and ‘European B’ clades, while short ‘japonica’ variants
are, with one exception, O-type (Supplementary file S1, section 4.1). The
distance-based NNet placed all short variants next to the centre of the
graph. Thus, they are sequentially undiagnostic, lacking more than 100 bp
from the 5’ or central part of the spacer, but also inconspicuous within
the larger ingroup (‘Japonica I’, A- and B-types).
Framework phylogeny
The ML trees for the 38 selected sequences (representing the most
abundant variants within each sample, strongly deviating variants, and
variants shared across species) highlight the deep split between
outgroup (type O) and ingroup 5S-IGS variants (I-, A- and B-types; Fig.
5). The mid-point root corresponds to the split between type O and the
A-B-I clade. Tip-pruning as well as the elimination of the
indiscriminative ‘Japonica X’ lineage led to a substantial increase in
backbone branch support: the divergence between F. japonica
ingroup variants (type I) and B-type(s) is placed after the isolation of
the A-type lineage (BS = 99/84). A group of sequentially unique, rare
shared variants showing few to substantial signs of sequence degradation
forms a distinct clade. This “relict lineage” is placed between the
outgroup subtree (comprising the ‘Japonica O’-type and sequence-degraded
“ambiguous” western Eurasian variants) and the A-B-I subtree. In-depth
sequence structure analysis shows that this placement is only partly due
to potential ingroup-outgroup long-branch attraction. While both
subtrees (outgroup and ‘relict’ type) include variants most different
from the ingroup consensus, and branch support is higher when
length-polymorphic regions are included, the rare shared types also have
an increased number of mutational patterns, which appear to be primitive
within the entire A-B-I lineage. Thus, they may represent relict
variants: ancestral copies still found in 5S rDNA arrays that have been
subsequently replaced and eliminated within the crenata-sylvatica
lineage (Table 2). Another important observation is that the
long-branching A- and B-type variants found exclusively in the Iranian
sample constitute strongly divergent, genetically coherent lineages, as
do two of the low-abundance, highly divergent F. crenata-unique
variants. While some rare, shared ingroup variants (‘relict lineage’,
‘Crenata A’, some ‘European O’) show clear signs of sequence
degradation, others are inconspicuous (no pseudogenous mutations in
flanking 5S rDNA) and can be highly similar to the most abundant
variants.
Bimodality and differences between 5S-IGS populations in each sample
The unrooted, single-sample ML trees resolve two 5S rDNA clusters (5S
main types) with high bootstrap values (71–100) in all studied samples
(Fig. 6). The main splits reflect the two most common main types: I- and
O-type in the outgroup F. japonica; and A- and B-type in F.
sylvatica s.l. The difference between the two clusters is most
pronounced in F. japonica; the least intra-species (intra-sample)
divergence is exhibited by F. crenata, which largely lacks A-type
5S-IGS. Phylogenetically intermediate variants characterize the western
Eurasian samples (long-branched in Iranian F. orientalis);
strongly modified variants with little affinity to either 5S rDNA
cluster, and hence connected to the centre of the graph, are abundant in
F. crenata (two lineages, one representing a relict ‘Crenata
A’-type; see Fig. 4) and F. sylvatica s.str. (a single lineage in
each population). These intermediate sequences do not show any
structural peculiarity, except for the ‘Crenata A’-type variant showing
a reduced GC content (34.3%; Supplementary files S1, section 4.2 and
appendix B). The outgroup-type ‘European O’ variants represent the
longest-branched sequences in Greek F. orientalis
and both F. sylvatica s.str. samples. As a trend, the variants
within the B-type and I-type 5S rDNA subtrees show a higher divergence
and phylogenetic structure than found within the A- and O-type subtrees.
For instance, in Greek F. orientalis and F. sylvatica
s.str., the ‘B type’ subtree includes a distinct, highly supported
clade.
Using EPA (Evolutionary Placement Algorithm), we assessed the
phylogenetic affinity of all variants with an abundance ≥ 4 not included
in the 686-tip matrix (Supplementary files S2, S3). In general, 5S-IGS
arrays have a high capacity to conserve signal from past reticulations
and deep divergences: types placed by EPA on the branch of the ‘European
O’-type variant, representing a distinct but degrading and rare sister
lineage of the F. japonica type O (cf. Table 2; Figs 3, 4), can
be found in all samples of the crenata-sylvatica lineage (Table
3; Supplementary file S1, appendix A). In contrast, type X variants are
exclusive to F. japonica. The single, low-abundance type I variant
identified by EPA in the F. sylvatica s.str. sample from Germany
represents a relict variant from the initial radiation within the
(A-)I-B lineage. The ‘relict lineage’, phylogenetically in-between O-
and A-/B-/I-types (cf. Fig. 5), is represented in all samples of the
crenata-sylvatica lineage as well. Despite showing signs of
sequence degradation in the flanking 5S rRNA gene regions, its variants
have GC contents (35.1–40.0%) that largely fall between the median values
of type O and type B/I. Hence, they match the range of A-types and are
also of the same length as most A- and B-types (Supplementary file S1,
appendix B, includes cloud and violin plots for all types per sample).
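GC content as compared here is the percentage of G and C bases in a sequence. The following is a minimal Python sketch of that calculation; how the study treated alignment gaps or ambiguity codes is not stated, so ignoring every character other than A, C, G and T is an assumption, and the example sequences are toy strings rather than 5S-IGS data.

```python
def gc_content(seq):
    """Return the percentage of G and C among the A/C/G/T bases of seq.

    Assumption: gaps ('-') and ambiguity codes (e.g. 'N') are ignored.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt


# Toy sequences, not data from the study.
print(gc_content("GGCCAATT"))   # four G/C out of eight bases
print(gc_content("atgc-N"))     # gap and N are skipped
```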
Compilation of length diversity and GC content per main (most frequent)
type and sample (Fig. 7) shows that the generally longer A-types have
lower GC content than the B-type in each species; the I-B clade
corresponds to a similar GC content and sequence length in the F.
japonica type I and crenata-sylvatica lineage type B. Highest GC
contents are found in F. japonica type O, which also represents
the longest 5S-IGS type. While GC-richer B-type variants are more
frequent, the GC-poorer A-type variants make up a higher portion of the
HTS reads corresponding to rare sequence variants. In F. crenata,
A-type 5S-IGS variants are nearly absent, while in Iranian F.
orientalis they show the highest diversity: 386 species-unique variants
(40% of all A-type variants) with a total abundance of 8466 (34%). The
opposite can be observed for the type B variants: while they are
approximately equally abundant in the Iranian F. orientalis sample, they are
1.5 to 2 times more frequent in the Greek F. orientalis and Italian
F. sylvatica samples, and form their largest majority in the
German F. sylvatica sample. Apparently, the GC-richer B-types
subsequently replaced A-types in the genomes of the
crenata-sylvatica lineage, while in F. japonica, their
sister lineage, type I variants, outnumbered the GC-richer, sequentially
more complex O-types.