Limited allele correlation with soil chemical properties
Adaptation to different ecological niches requires different sets of genes. Genetic differentiation in soil microbial communities is therefore often linked to the chemical and physical composition and pH of the soil. To test whether the correlation between FST and geographical distance was due to geographically linked differences in soil chemical composition, we tested for correlations between allele frequencies and the soil traits of the fields where the samples were collected.
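As an illustration of this kind of test, the following minimal Python sketch correlates the frequency of one allele with one soil trait across samples. It assumes a Pearson correlation with a permutation-based p-value; the text does not state which correlation statistic or significance test was used, so both are assumptions. The function names (pearson_r, permutation_p_value) and the example values are hypothetical and are not data from the study.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the observed Pearson r.

    Assumption: a permutation test stands in for whatever test the
    authors actually used.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Made-up illustrative values, NOT data from the study:
# per-sample frequency of one allele and per-field clay content (%).
allele_freq = [0.05, 0.10, 0.20, 0.15, 0.40, 0.55]
clay_pct = [4.0, 6.0, 9.0, 8.0, 15.0, 18.0]

r = pearson_r(allele_freq, clay_pct)
p = permutation_p_value(allele_freq, clay_pct)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

In practice the same calculation would be repeated for every allele and soil trait pair, and a correction for multiple testing would normally be considered.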
Several clusters of strong correlations between allele frequency and soil chemical and physical properties were observed for the full dataset (Figure S4A-D). The high clay content in the UK field trial site and the unique set of gsB alleles observed in these samples drove this clustering (Table S1, Figure 2). Since none of the UK gsB alleles were observed in any other samples, no conclusions can be drawn as to whether the gsB dominance is due to an increased fitness in clay or to geographical influence.
To test a more homogeneous set of samples, we focused on the DKO subset, which had a broad range of values for all soil chemical properties and no extreme values that could be confounded with rare alleles. Two core gene alleles, rpoBseq2 (clay: correlation = 0.6141, p-value = 5.03e-05; silt: correlation = 0.5877, p-value = 0.0001) and recAseq4, and one common nod allele (nodDseq2) were highly correlated with silt and clay content (Figure 4E and 4F, Figure S4). The recA allele was very rare, occurring in only four samples, two of which had a high silt and clay content and drove the correlation signal. The other two alleles, rpoBseq2 and nodDseq2, were both assigned to genospecies C (Figure S2) and were correlated with each other (correlation = 0.525, p-value = 0.0029). To investigate whether these two alleles co-occur within the Rlt isolates, we BLASTed them against the 196 whole genome sequences. The rpoB allele was present in 36 of the genomes, whereas the nodD allele was present in 55 genomes. The alleles co-occurred in 30 strains, most of which were isolated in fields or field trial sites with a high clay/silt content, suggesting that the genomic architecture of these strains might confer some increased fitness in clay- and silt-rich soils. The majority of the strong correlations observed were either between alleles, meaning that some strains tend to co-occur, or between soil chemical properties that are correlated (such as silt and clay) or mutually exclusive (such as coarse sand and fine sand).
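The overlap between the two alleles in the genome set can be put in proportion with a short calculation. The sketch below uses only the counts reported above (36 genomes with the rpoB allele, 55 with the nodD allele, 30 with both, out of 196). The function name co_occurrence_summary and the choice of summary measures, including the Jaccard index, are illustrative assumptions and not part of the reported analysis.

```python
def co_occurrence_summary(n_a, n_b, n_both, n_total):
    """Summarise how often two alleles occur in the same genome.

    n_a, n_b : number of genomes carrying allele A / allele B
    n_both   : number of genomes carrying both alleles
    n_total  : number of genomes searched
    """
    if n_both > min(n_a, n_b):
        raise ValueError("overlap cannot exceed either allele count")
    n_either = n_a + n_b - n_both  # genomes carrying at least one allele
    if n_either > n_total:
        raise ValueError("counts exceed the number of genomes")
    return {
        "frac_a_with_b": n_both / n_a,   # share of A carriers that also carry B
        "frac_b_with_a": n_both / n_b,   # share of B carriers that also carry A
        "jaccard": n_both / n_either,    # overlap relative to the union
        "n_neither": n_total - n_either,
    }


# Counts taken from the text: rpoB allele in 36 genomes, nodD allele in 55,
# both in 30, out of 196 whole genome sequences.
summary = co_occurrence_summary(n_a=36, n_b=55, n_both=30, n_total=196)
for key, value in summary.items():
    print(key, round(value, 3) if isinstance(value, float) else value)
```

With these counts, 30 of the 36 genomes carrying the rpoB allele (about 83%) also carry the nodD allele, and 30 of the 55 genomes carrying the nodD allele (about 55%) also carry the rpoB allele.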
Since no alleles or soil chemical properties were highly correlated with latitude or longitude, the FST correlation with geographical distance (Figure 4) does not seem to be driven by differences in soil chemistry or composition.