Figure legends
Figure 1. Sampling sites. Clover breeding trial sites in Rennes
(F), Didbrook (UK), and Store Heddinge (DK). Each dot represents 40 samples.
Organic fields sampled in Jutland (DKO1-6). Each dot represents one
sample. The total number of sample sites is 170 (UK=40, F=40, DK=40,
DKO1=14, DKO2=8, DKO3=3, DKO4=15, DKO5=5, DKO6=5).
Figure 2. A-D: Phylogenetic trees of all
alleles found in isolates or amplified from nodules. MAUI-seq
amplicons (black, bold) for each gene (nomenclature: seq - abundance
rank - primary_UMI_count) and sequences (grey, light) from
representative isolates (nomenclature: strainID - number of alleles in
196 genomes – genospecies) have been included. The scale is in the
number of nucleotide differences. Core genes (rpoB and recA) are assigned to the genospecies A-E. Nod genes
(nodA and nodD) are assigned to a genospecies if possible,
or to a clade of introgressing genes labelled X. If an amplicon could
not clearly be assigned to a clade it is marked as NA. A: rpoB. B: recA. C: nodA. D: nodD. E-H: Relative allele
abundance for individual genes within sites (DK, F, UK, DKO) for the two
different methods. Each point represents an allele that was found in the
isolates and/or the MAUI-seq data. For each location (DK, F, UK, DKO),
the frequency among isolates is plotted against the average frequency of
the same allele in the MAUI-seq samples. In the case of DKO, where the
number of isolates per field varied from 1 to 3, each field was weighted
equally.
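The equal weighting of DKO fields described above can be illustrated with a minimal sketch. It assumes isolates are available as (field, allele) pairs; the function name and the example data are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict


def equal_weight_frequencies(isolates):
    """Return allele frequencies in which every field has the same weight.

    `isolates` is an iterable of (field, allele) pairs (assumed layout).
    Frequencies are first computed within each field and then averaged
    across fields, so a field with 3 isolates counts no more than a
    field with 1 isolate.
    """
    per_field = defaultdict(Counter)
    for field, allele in isolates:
        per_field[field][allele] += 1

    n_fields = len(per_field)
    freqs = defaultdict(float)
    for counts in per_field.values():
        n_isolates = sum(counts.values())
        for allele, count in counts.items():
            freqs[allele] += (count / n_isolates) / n_fields
    return dict(freqs)


# Hypothetical example: three fields with 2, 1, and 3 isolates.
example = [
    ("field1", "allele1"), ("field1", "allele2"),
    ("field2", "allele1"),
    ("field3", "allele1"), ("field3", "allele1"), ("field3", "allele2"),
]
weighted = equal_weight_frequencies(example)
```

Pooling all six isolates would instead give allele1 a frequency of 4/6, which lets the most heavily sampled field dominate the estimate.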
Figure 3. Heat map of relative amplicon frequency for
individual genes. Samples with a UMI count below 10 for an individual gene
are in grey and are excluded from all analyses. The normalisation is
done for each gene individually. Management regime and grouping of the
samples are indicated to the left of the heat map. Sequences that have
an exact match in the 196 Rlt genomes (196 Rlt), and the
genospecies clade (GS) of each allele, are indicated by coloured bars
above the heat map. (nSamples_rpoB: 105, nSamples_recA: 153,
nSamples_nodA: 129, nSamples_nodD: 130). Abundance scale is
log10(frequency).
Figure 4. Correlation between increase in genetic diversity
(FST) and geographic distances for pairwise
comparisons between DKO samples. p-values are indicated by asterisks;
*p<0.05, **p<0.01, and ***p<0.001. A-D: Pairwise FST between DKO clusters. E-F: Correlations between normalised allele frequencies and
soil chemical properties per gene for the DKO subset for a core gene
(rpoB, E) and an accessory gene (nodD, F). The cluster of high correlations including clay and silt
is highlighted in grey. Alleles within the cluster are highlighted in
red.
Figure 5. Genospecies composition of Rlt from nodules. A, C, E, and G: Genospecies
composition of each individual sample for each gene (A: rpoB, C: recA, E: nodA, and G: nodD). The DKO groupings are labelled by their
respective number (DKO1=1). Core genes (rpoB and recA) are
assigned to the genospecies A-E. Nod genes (nodA and nodD) are assigned to a genospecies if possible, or to a clade of
introgressing genes labelled X. If an amplicon could not clearly be
assigned to a clade it is marked as NA. B, D, F, and H: Genospecies composition based on individual
genes of isolates from DKO fields (n=88), DK (n=36), F (n=40), UK (n=32)
for rpoB, recA, nodA, and nodD, respectively.
Figure 6. Nucleotide diversity within populations for each
gene. π for individual samples within the DKO groupings and DK, F, and
UK field trials. Dots illustrate the π value for each individual sample.
Bars represent the first and third quartiles, with the solid line
denoting the median. Whiskers correspond to 1.5 × the interquartile
range. p-values were calculated for each individual gene using ANOVA
followed by Tukey’s post hoc testing. Groupings indicated by the
same letter were not significantly different at p<0.05.
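As an illustration of the per-sample π values, the sketch below uses one common definition of nucleotide diversity: the frequency-weighted average number of pairwise differences between alleles, expressed per site. The legend does not state which estimator was used, so this definition, the function names, and the example data are all assumptions. Alleles are assumed to be aligned sequences of equal length with relative frequencies that sum to 1.

```python
from itertools import combinations


def hamming(seq_a, seq_b):
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def nucleotide_diversity(allele_freqs):
    """Per-site pi for one sample.

    `allele_freqs` maps each allele sequence to its relative frequency
    in the sample (assumed layout). Each unordered pair of distinct
    alleles contributes 2 * p_i * p_j * d_ij, and the total is divided
    by the sequence length.
    """
    seqs = list(allele_freqs)
    length = len(seqs[0])
    total = 0.0
    for seq_a, seq_b in combinations(seqs, 2):
        total += 2 * allele_freqs[seq_a] * allele_freqs[seq_b] * hamming(seq_a, seq_b)
    return total / length


# Hypothetical sample: two alleles at equal frequency, one difference in four sites.
pi_example = nucleotide_diversity({"ACGT": 0.5, "ACGA": 0.5})
```

A sample containing a single allele has no pairs to compare and therefore gives π = 0 under this definition.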