Geographically distinct sites display a site-specific set of
nodule Rhizobium alleles
Rhizobium leguminosarum is a species complex consisting of
multiple genospecies that have been shown to co-exist in a field setting.
Rlt core genes show little sign of introgression between
genospecies, and phylogenies of individual core genes therefore most
often follow the overall genospecies phylogenetic tree. A phylogenetic
analysis of amplicons from the chromosomal core genes rpoB and recA showed that the sampled bacteria from nodules are
distributed throughout the five main genospecies clades previously
identified from isolates originating from these exact fields
(Figure 2A-D). For the core genes rpoB and recA, the majority of the alleles identified by MAUI-seq were
also recovered in the isolates, while some additional alleles were found
only in a small number of isolates, particularly for recA (Table 1 and Figure 2A-D). Of these
sequences, most were actually present in the MAUI-seq dataset, but were
under the cumulative abundance threshold that we used. For the other
three genes, MAUI-seq recovered more alleles than the isolates.
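The effect of an abundance threshold on which alleles are reported can be illustrated with a minimal sketch. The exact filter is not described in this section, so the rule below is an assumption: a sequence is kept when its read count, summed over all samples, reaches a chosen fraction of the total reads. The function name, the data layout, and the threshold values are hypothetical.

```python
def filter_by_cumulative_abundance(counts, min_fraction):
    """Return the set of sequences that pass a cumulative abundance filter.

    counts: {sample_name: {sequence_id: read_count}}
    min_fraction: hypothetical threshold on the share of all reads
                  (assumed rule, not taken from the text).
    """
    # Sum read counts for each sequence across all samples.
    totals = {}
    for per_sample in counts.values():
        for seq, n in per_sample.items():
            totals[seq] = totals.get(seq, 0) + n

    grand_total = sum(totals.values())
    if grand_total == 0:
        return set()

    # Keep sequences whose cumulative share meets the threshold.
    return {seq for seq, n in totals.items() if n / grand_total >= min_fraction}


# Toy counts, invented for illustration only.
toy_counts = {
    "sample1": {"alleleA": 90, "alleleB": 9, "alleleC": 1},
    "sample2": {"alleleA": 80, "alleleB": 20},
}
kept = filter_by_cumulative_abundance(toy_counts, min_fraction=0.01)
```

Under this assumed rule, a rare allele such as `alleleC` is present in the raw data but falls below the threshold, which is the situation described for the isolate-only sequences.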
The accessory genes nodA and nodD belong to a group of
co-located genes, known as the sym gene cluster, that are essential for
initiating and maintaining an effective symbiotic relationship with
legumes. The phylogeny of the accessory gene pool has previously been
shown to often be incongruent with the core genes. This cluster is
usually located on a conjugative plasmid in the Rl species
complex. Occasionally, regions of the cluster are duplicated in the
rhizobial genome and, due to the promiscuous nature of conjugative
plasmids, they can cross genospecies boundaries. Using the set of 196
characterised Rlt isolates from the same sampling sites, we
evaluated the level of duplication of nod genes to remove
potential paralogs. In addition to the full nod gene region
(nodXNMLEFDABCIJ), a partial set of nod genes (nodDABCIJT)
is present in some of the Rlt isolates. nodAseq7 and nodDseq9
occurred only as secondary sequences in this partial nod region and were
designated as nodAa and nodDa, respectively
(Figure 2C-D). A third type of nodD (nodD2) was
observed in some genomes flanked by transposases and no other nod genes.
Three nodD amplicons belong to this group. These five
paralogous sequences were removed from all downstream analyses to avoid
inflating the estimates of overall diversity. All 12 nodD alleles
seen in the genomes were recovered by MAUI-seq, plus an additional 5
alleles. MAUI-seq detected 12 of the 14 nodA alleles seen in
genomes, but found an additional 9 alleles (Table 1 and Figure 1). All of the abundant sequences with frequency > 0.15
have an exact match in the 196 Rlt genomes,
and the allele frequencies are highly correlated between the two
datasets (Figure 2E-H). The sequences identified only
by MAUI-seq are of low abundance, but appear to be genuine sequences
(Figure 2A-D and Figure 3). Likewise, the
sequences in the 196 genomes not found by MAUI-seq are only present in a
small number of isolates and at low frequencies; 8 out of the 13
sequences are only found in a single isolate (Figure 2).
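The comparison between amplicon-derived and isolate-derived alleles can be sketched as a set comparison followed by a correlation of allele frequencies. This is an illustrative sketch, not the analysis code used for Figure 2: the function names and the toy frequencies are hypothetical, and treating an allele missing from one dataset as having frequency 0 is an assumption.

```python
from math import sqrt


def compare_allele_sets(amplicon_freqs, isolate_freqs):
    """Split alleles into shared, amplicon-only, and isolate-only groups."""
    amplicon = set(amplicon_freqs)
    isolate = set(isolate_freqs)
    return {
        "shared": sorted(amplicon & isolate),
        "amplicon_only": sorted(amplicon - isolate),
        "isolate_only": sorted(isolate - amplicon),
    }


def pearson_r(amplicon_freqs, isolate_freqs):
    """Pearson correlation of allele frequencies over the union of alleles.

    Assumption: an allele absent from one dataset has frequency 0 there.
    """
    alleles = sorted(set(amplicon_freqs) | set(isolate_freqs))
    x = [amplicon_freqs.get(a, 0.0) for a in alleles]
    y = [isolate_freqs.get(a, 0.0) for a in alleles]
    n = len(alleles)
    if n == 0:
        raise ValueError("no alleles to compare")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant frequencies")
    return cov / sqrt(var_x * var_y)


# Toy frequencies, invented for illustration only.
amplicon_toy = {"seq1": 0.6, "seq2": 0.3, "seq3": 0.1}
isolate_toy = {"seq1": 0.5, "seq2": 0.4, "seq4": 0.1}

groups = compare_allele_sets(amplicon_toy, isolate_toy)
r = pearson_r(amplicon_toy, isolate_toy)
```

In this toy case the two abundant alleles are shared, while each dataset has one low-frequency allele that the other lacks, mirroring the pattern reported above.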
Principal component analysis of the amplicons from individual genes
(Figure S2) revealed that different loci have different levels
of resolution. recA separated the French samples well from all
other locations, whereas the UK samples were clearly separated from the
other two field trial sites for all four loci. The high level of
diversity among and within DKO samples made it difficult to distinguish
them from the F and DK samples for most amplicons.
Each breeding trial site (DK, F, and UK) showed a distinct set of
amplicons, despite the nodules from each site being sampled from the
same F2 clover families from the same seedstock and being under
identical management (Table S1, Figure 3). The samples
from the trial sites were relatively uniform within each site, and each
sample had a low number of total observed amplicons, whereas the DKO
samples appeared less homogeneous within each sample.