Limited allele correlation with soil chemical properties
Adaptations to ecological niches require different sets of genes.
Genetic differentiation in soil microbial communities is therefore often
linked with the chemical and physical composition and pH of the soil. To
test whether the correlation between FST and
geographical distance was due to geographically linked differences in
soil chemical composition, we examined the correlations between allele
frequency and soil traits from the fields where the samples were
collected.
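The kind of test described above can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the study: the text does not state which correlation coefficient or significance test was applied, so the Pearson coefficient and the permutation p-value are assumptions, the function names are hypothetical, and the input values are made up for demonstration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation.

    Assumption: a permutation test stands in for whatever significance
    test the study actually used.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Made-up illustrative values, NOT data from the study:
# one allele's frequency per sample and the clay content of each field.
allele_freq = [0.05, 0.10, 0.12, 0.30, 0.35, 0.50, 0.55, 0.70]
clay_percent = [4.0, 6.5, 5.0, 9.0, 11.0, 12.5, 15.0, 16.0]

r = pearson_r(allele_freq, clay_percent)
p = permutation_p_value(allele_freq, clay_percent)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

Repeating such a test for every allele and soil trait pair yields a correlation matrix of the kind that can be inspected for clusters; with many pairs tested, a multiple-testing correction would normally be considered.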
Several clusters of strong correlations between allele frequency and
soil chemical and physical properties were observed for the full dataset
(Figure S4A-D). The high clay content in the UK field trial
site and the unique set of gsB alleles observed in these samples drove
this clustering (Table S1, Figure 2). Since none of the UK gsB
alleles were observed in any other samples, no conclusions can be drawn
as to whether the gsB dominance is due to an increased fitness in clay,
or to geographical influence.
To test a more homogeneous set of samples, we focused on the DKO subset,
which had a broad range of values for all soil chemical properties and
no extreme values that could be confounded with rare alleles. Two core
gene alleles, rpoBseq2 (clay: correlation = 0.6141, p-value =
5.03e-05; silt: correlation = 0.5877, p-value = 0.0001) and recAseq4,
and one common nod allele (nodDseq2) were highly correlated with
silt and clay content (Figure 4E and 4F, Figure S4). The recA allele was very rare and
occurred in only four samples, two of which had a high silt and clay content,
driving the correlation signal. The two remaining alleles, rpoBseq2 and
nodDseq2, were both assigned to genospecies
C (Figure S2) and were correlated with each other (correlation = 0.525, p-value =
0.0029). To investigate whether these two alleles co-occur within the Rlt isolates, we BLASTed them against the 196 whole genome
sequences. The rpoB allele was present in 36 of the genomes,
whereas the nodD allele was present in 55 genomes. The alleles
co-occurred in 30 strains, most of which were isolated in fields or
field trial sites with a high clay/silt content, suggesting that the genomic
architecture of these strains might confer some increased fitness in
clay/silt-rich soils. The majority of strong correlations observed were
between alleles, meaning some strains tend to co-occur, or between soil
chemical properties that are correlated (such as silt and clay) or
mutually exclusive (such as coarse sand and fine sand).
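To put the co-occurrence counts reported above in context, the overlap can be summarised with a few lines of arithmetic. The sketch below uses only the counts given in the text (36 genomes with the rpoB allele, 55 with the nodD allele, 30 with both, 196 genomes in total). The function name and the choice of summary statistics are illustrative assumptions and were not part of the original analysis.

```python
def cooccurrence_summary(n_a, n_b, n_both, n_total):
    """Descriptive overlap statistics for two alleles across a genome panel.

    n_a, n_b : number of genomes carrying allele A / allele B
    n_both   : number of genomes carrying both alleles
    n_total  : number of genomes searched
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each allele must occur in at least one genome")
    if not (0 <= n_both <= min(n_a, n_b) and max(n_a, n_b) <= n_total):
        raise ValueError("inconsistent counts")
    n_either = n_a + n_b - n_both  # genomes carrying at least one allele
    return {
        "share_of_a_with_b": n_both / n_a,
        "share_of_b_with_a": n_both / n_b,
        "jaccard": n_both / n_either,
        # Co-occurrences expected if the two alleles were distributed
        # independently of each other across the panel.
        "expected_both_if_independent": n_a * n_b / n_total,
    }


# Counts taken from the text: rpoB allele, nodD allele, both, total genomes.
summary = cooccurrence_summary(n_a=36, n_b=55, n_both=30, n_total=196)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With these counts, about 83% of the genomes carrying the rpoB allele also carry the nodD allele, whereas roughly 10 co-occurrences would be expected if the two alleles were distributed independently across the 196 genomes. This simple comparison ignores population structure, such as both alleles belonging to genospecies C, so it is descriptive only.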
Since no alleles or soil chemical properties are highly correlated with
latitude or longitude, the FST correlation with
geographical distance (Figure 4) does not seem to be driven by
differences in soil chemistry or composition.