Read processing and data analysis
The PEAR assembler was used to merge paired-end reads. Python scripts were
used to separate the merged reads by gene (MAUIsortgenes.py) and to
calculate allele frequencies both with and without the use of UMIs
(MAUIcount.py). The scripts are available in the GitHub repository https://github.com/jpwyoung/MAUI.
Sequences were clustered by UMI, and the number of unique UMIs was
counted for each distinct sequence, provided that sequence had at least
two more reads with that UMI than any other sequence. Where two or more
sequences were associated with the same UMI, the second most abundant
sequence was also recorded. A sequence was filtered out of the results as
a putative PCR-induced chimera or other error if the number of UMIs in
which it was the second sequence exceeded 0.7 times the number of UMIs in
which it was the main sequence. Sequences with primers removed (ignoring UMIs)
were also clustered using DADA2 (version 1.8) and UNOISE3 (USEARCH
version 11.0.667) with default settings. An overall read frequency
filter of 0.1% was applied to the DADA2 and UNOISE3 outputs to match the
filtering of accepted sequences in MAUI-seq. Scripts used for DADA2, UNOISE3,
and figure generation are available in Additional files 3, 4, and 5, respectively. Output abundance data were
then processed for statistical analysis and figure generation using
various R packages (Additional files 3, 4, and 5).
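
The UMI-based counting and filtering described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code of MAUIcount.py: the function names, the input format (an iterable of (UMI, sequence) pairs), and the parameter names are assumptions made for the example. It is also assumed that a second sequence is recorded only for UMIs that yield an accepted main sequence, and that the 0.1% filter retains sequences at or above the threshold.

```python
from collections import Counter, defaultdict


def count_umis(reads, min_read_diff=2, max_second_ratio=0.7):
    """Count UMIs per sequence and drop putative chimeras.

    reads: iterable of (umi, sequence) pairs (hypothetical input format).
    Returns (accepted, primary, second), all mapping sequence -> UMI count.
    """
    # Cluster reads by UMI: for each UMI, count the reads of each sequence.
    per_umi = defaultdict(Counter)
    for umi, seq in reads:
        per_umi[umi][seq] += 1

    primary = Counter()  # UMIs in which a sequence is the main sequence
    second = Counter()   # UMIs in which a sequence is the second sequence
    for seq_counts in per_umi.values():
        ranked = seq_counts.most_common(2)
        top_seq, top_n = ranked[0]
        runner_n = ranked[1][1] if len(ranked) > 1 else 0
        # The main sequence needs at least min_read_diff more reads
        # than any other sequence carrying the same UMI.
        if top_n - runner_n < min_read_diff:
            continue
        primary[top_seq] += 1
        if len(ranked) > 1:
            # Assumption: ties for second place are broken arbitrarily.
            second[ranked[1][0]] += 1

    # Reject sequences seen as "second" more than max_second_ratio
    # times as often as they are seen as "main".
    accepted = {
        seq: n for seq, n in primary.items()
        if second[seq] <= max_second_ratio * n
    }
    return accepted, primary, second


def frequency_filter(counts, min_fraction=0.001):
    """Keep sequences whose share of the total count is >= min_fraction."""
    total = sum(counts.values())
    return {s: n for s, n in counts.items() if n >= min_fraction * total}


if __name__ == "__main__":
    demo = [("U1", "A")] * 5 + [("U1", "X")] + [("U2", "X")] * 2
    accepted, primary, second = count_umis(demo)
    print(accepted, dict(primary), dict(second))
```

In the demonstration data, sequence X is the main sequence for one UMI and the second sequence for another, so its ratio of 1 exceeds 0.7 and it is rejected, while A is retained.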
Principal components were calculated with the R function ‘prcomp’, which
uses singular value decomposition, to explain the Rhizobium diversity
and abundance within each sub-plot sample. Differences in allele
frequencies between samples were quantified as Bray-Curtis
dissimilarities (beta diversity) using the function ‘vegdist’ in the R
package ‘vegan’. PERMANOVA tests were performed using the ‘adonis’
function of the same package. An empirical Bayes estimator
of FST was calculated using the R package
‘FinePop’ as previously described.
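
For reference, the Bray-Curtis dissimilarity between two samples with allele abundances x and y is the sum of |x_i - y_i| divided by the sum of (x_i + y_i). The analyses reported here were run in R; the Python sketch below only illustrates the calculation, and its function names and input layout (a dictionary of equal-length abundance lists) are assumptions made for the example.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    # Sum of absolute differences, scaled by the combined abundance.
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def pairwise_bray_curtis(samples):
    """Dissimilarity for every unordered pair of named samples.

    samples: dict mapping sample name -> list of allele abundances
    (hypothetical layout; alleles must be in the same order throughout).
    """
    names = list(samples)
    return {
        (p, q): bray_curtis(samples[p], samples[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


if __name__ == "__main__":
    plots = {"plot1": [6, 3, 1], "plot2": [2, 3, 5], "plot3": [6, 3, 1]}
    for pair, d in pairwise_bray_curtis(plots).items():
        print(pair, d)
```

A value of 0 indicates identical allele abundances and a value of 1 indicates that the two samples share no alleles.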