Figure legends
Figure 1. Sampling sites. Clover breeding trial sites in Rennes (F), Didbrook (UK), and Store Heddinge (DK); each dot represents 40 samples. Organic fields sampled in Jutland (DKO1-6); each dot represents one sample. The total number of sample sites is 170 (UK=40, F=40, DK=40, DKO1=14, DKO2=8, DKO3=3, DKO4=15, DKO5=5, DKO6=5).
Figure 2. A-D: Phylogenetic trees of all alleles found in isolates or amplified from nodules. MAUI-seq amplicons (black, bold) for each gene (nomenclature: seq - abundance rank - primary_UMI_count) and sequences (grey, light) from representative isolates (nomenclature: strainID - number of alleles in 196 genomes - genospecies) have been included. The scale is in the number of nucleotide differences. Core genes (rpoB and recA) are assigned to the genospecies A-E. Nod genes (nodA and nodD) are assigned to a genospecies if possible, or to a clade of introgressing genes labelled X. If an amplicon could not clearly be assigned to a clade, it is marked as NA. A: rpoB. B: recA. C: nodA. D: nodD. E-H: Relative allele abundance for individual genes within sites (DK, F, UK, DKO) for the two different methods. Each point represents an allele that was found in the isolates and/or the MAUI-seq data. For each location (DK, F, UK, DKO), the frequency among isolates is plotted against the average frequency of the same allele in the MAUI-seq samples. In the case of DKO, where the number of isolates per field varied from 1 to 3, each field was weighted equally.
Figure 3. Heat map of relative amplicon frequency for individual genes. Samples with a UMI count below 10 for an individual gene are shown in grey and are excluded from all analyses. Normalisation is done for each gene individually. Management regime and grouping of the samples are indicated to the left of the heat map. Sequences that have an exact match in the 196 Rlt genomes (196 Rlt), and the genospecies clade (GS) of each allele, are indicated by coloured bars above the heat map (nSamples_rpoB: 105, nSamples_recA: 153, nSamples_nodA: 129, nSamples_nodD: 130). Abundance scale is log10(frequency).
Figure 4. Correlation between increase in genetic diversity (FST) and geographic distances for pairwise comparisons between DKO samples. p-values are indicated by asterisks: *p<0.05, **p<0.01, and ***p<0.001. A-D: Pairwise FST between DKO clusters. E-F: Correlations between normalised allele frequencies and soil chemical properties per gene for the DKO subset for a core gene (rpoB, E) and an accessory gene (nodD, F). The cluster of high correlations including clay and silt is highlighted in grey. Alleles within the cluster are highlighted in red.
Figure 5. Genospecies composition of Rlt from nodules. A, C, E, and G: Genospecies composition of each individual sample for each gene (A: rpoB, C: recA, E: nodA, and G: nodD). The DKO groupings are labelled by their respective number (DKO1=1). Core genes (rpoB and recA) are assigned to the genospecies A-E. Nod genes (nodA and nodD) are assigned to a genospecies if possible, or to a clade of introgressing genes labelled X. If an amplicon could not clearly be assigned to a clade, it is marked as NA. B, D, F, and H: Genospecies composition based on individual genes of isolates from DKO fields (n=88), DK (n=36), F (n=40), and UK (n=32) for rpoB, recA, nodA, and nodD, respectively.
Figure 6. Nucleotide diversity within populations for each gene. π for individual samples within the DKO groupings and the DK, F, and UK field trials. Dots show the π value for each individual sample. Bars represent the first and third quartiles, with the solid line denoting the median. Whiskers correspond to 1.5 × the interquartile range. p-values were calculated for each individual gene using ANOVA followed by Tukey’s post hoc testing. Groupings indicated by the same letter were not significantly different at p<0.05.
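As an illustration of the per-sample π values plotted in Figure 6, the following is a minimal sketch of one common way to compute nucleotide diversity from allele frequencies. It assumes π is the frequency-weighted mean number of pairwise nucleotide differences per site between aligned alleles of equal length. The legend does not state the exact estimator used, so the function name `nucleotide_diversity`, the input layout, and the absence of a finite-sample correction are all assumptions made for illustration.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(alleles: dict[str, float]) -> float:
    """Per-site pi for one sample (hypothetical helper, not the authors' code).

    `alleles` maps each aligned allele sequence to its count (for example a
    UMI count) or frequency in the sample. Values are normalised to sum to 1.
    """
    total = sum(alleles.values())
    if total <= 0:
        raise ValueError("sample has no counts")
    seqs = list(alleles)
    length = len(seqs[0])
    pi = 0.0
    # Sum 2 * p_i * p_j * d_ij over all unordered pairs of distinct alleles.
    for a, b in combinations(seqs, 2):
        p_a = alleles[a] / total
        p_b = alleles[b] / total
        pi += 2 * p_a * p_b * hamming(a, b)
    return pi / length


# Toy sample: two alleles at equal abundance differing at 1 of 4 sites.
sample = {"AAAA": 10, "AAAT": 10}
print(nucleotide_diversity(sample))
```

A sample containing a single allele gives π = 0 under this definition, and samples dominated by one allele give values close to 0.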