Read processing and data analysis
The PEAR assembler was used to merge paired-end reads. Python scripts were used to separate the merged reads by gene (MAUIsortgenes.py) and to calculate allele frequencies both with and without the use of UMIs (MAUIcount.py). The scripts are available in the GitHub repository https://github.com/jpwyoung/MAUI. Sequences were clustered by UMI, and for each distinct sequence the number of unique UMIs was counted, provided that the sequence had at least two more reads with that UMI than any other sequence. Where two or more sequences were associated with the same UMI, the second most abundant sequence was also recorded. Sequences that occurred as the second sequence more than 0.7 times as often as they occurred as the main sequence for a UMI were filtered out of the results as putative PCR-induced chimeras or other errors.

Sequences with primers removed (ignoring UMIs) were also clustered using DADA2 (version 1.8) and UNOISE3 (USEARCH version 11.0.667) with default settings. An overall read frequency filter of 0.1% was applied to the DADA2 and UNOISE3 outputs to match the filtering of accepted sequences in MAUI-seq. Scripts used for DADA2, UNOISE3, and figure generation are available in Additional files 3, 4, and 5, respectively.

Output abundance data were then processed for statistical analysis and figure generation using various R packages (Additional files 3, 4, and 5). Principal components were calculated with the R function ‘prcomp’ (stats package), which uses singular value decomposition, to summarize Rhizobium diversity and abundance within each sub-plot sample. Differences in allele frequencies between samples were quantified as Bray-Curtis dissimilarities using the ‘vegdist’ function of the R package ‘vegan’. PERMANOVA tests were performed using the ‘adonis’ function of ‘vegan’. The empirical Bayes estimator of FST was calculated using the R package ‘FinePop’ as previously described.
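The UMI-counting rule and the chimera filter can be illustrated with a short Python sketch. This is not the code in MAUIcount.py; it is a minimal reconstruction of the rules as stated, under two assumptions: a UMI carrying only one sequence is compared against zero competing reads (so it needs at least two reads to be accepted), and a sequence is recorded as "second" for every UMI carrying two or more sequences, whether or not the main sequence passed the margin test. The function name `umi_counts` and its input format (one `(umi, sequence)` pair per read) are hypothetical.

```python
from collections import Counter, defaultdict

MIN_MARGIN = 2       # main sequence must have >= 2 more reads than any other sequence
CHIMERA_RATIO = 0.7  # max allowed ratio of (times seen as second) / (times seen as main)


def umi_counts(reads):
    """Count UMIs per sequence from (umi, sequence) pairs, one pair per read.

    Assumption: a UMI with a single sequence is compared against 0 reads,
    so it needs >= MIN_MARGIN reads to be accepted.
    """
    per_umi = defaultdict(Counter)
    for umi, seq in reads:
        per_umi[umi][seq] += 1

    as_main = Counter()    # UMIs for which the sequence was accepted as main
    as_second = Counter()  # UMIs for which the sequence was the runner-up
    for counts in per_umi.values():
        ranked = counts.most_common()  # sorted by read count, descending
        top_seq, top_n = ranked[0]
        runner_n = 0
        if len(ranked) > 1:
            second_seq, runner_n = ranked[1]
            as_second[second_seq] += 1
        if top_n - runner_n >= MIN_MARGIN:
            as_main[top_seq] += 1

    # Drop putative chimeras/errors: sequences seen as runner-up more
    # than CHIMERA_RATIO times as often as they were seen as main.
    return {
        seq: n
        for seq, n in as_main.items()
        if as_second[seq] <= CHIMERA_RATIO * n
    }


reads = (
    [("u1", "A")] * 3
    + [("u2", "A")] * 3 + [("u2", "B")]
    + [("u3", "B")] * 3
    + [("u4", "C")] * 3 + [("u4", "B")]
    + [("u6", "D")]  # single read: fails the two-read margin
)
print(umi_counts(reads))  # {'A': 2, 'C': 1}
```

In this toy input, B is the main sequence for one UMI but the runner-up for two, so its second/main ratio (2.0) exceeds 0.7 and it is discarded as a likely artefact.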
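The 0.1% read frequency filter applied to the DADA2 and UNOISE3 outputs might look like the sketch below. It assumes that "overall" means read counts pooled across all samples and that the threshold is inclusive; neither detail is stated in the text, and the function name `frequency_filter` is hypothetical.

```python
from collections import Counter


def frequency_filter(table, min_freq=0.001):
    """Keep sequences whose pooled read frequency is >= min_freq (0.1%).

    table: {sample: {sequence: read_count}}. Pooling across all samples
    is an assumption about what "overall" means.
    """
    totals = Counter()
    for counts in table.values():
        totals.update(counts)  # add this sample's counts to the pooled totals
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {sample: {} for sample in table}
    keep = {seq for seq, n in totals.items() if n / grand_total >= min_freq}
    return {
        sample: {seq: n for seq, n in counts.items() if seq in keep}
        for sample, counts in table.items()
    }


table = {"s1": {"A": 5000, "B": 4}, "s2": {"A": 4976, "C": 20}}
print(frequency_filter(table))  # {'s1': {'A': 5000}, 's2': {'A': 4976, 'C': 20}}
```

Here B accounts for 0.04% of the 10,000 pooled reads and is removed, while C (0.2%) is retained.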
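The Bray-Curtis dissimilarity (the default "bray" method of vegdist) between two samples x and y is the sum of |x_i − y_i| divided by the sum of (x_i + y_i) over all alleles i. The Python sketch below computes it for allele-frequency profiles stored as dictionaries; the function name and input format are illustrative assumptions, and alleles absent from a sample are treated as zero.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two {allele: abundance} profiles."""
    alleles = set(x) | set(y)
    num = sum(abs(x.get(a, 0) - y.get(a, 0)) for a in alleles)
    den = sum(x.get(a, 0) + y.get(a, 0) for a in alleles)
    return num / den if den else 0.0


print(bray_curtis({"A": 0.5, "B": 0.5}, {"A": 0.5, "C": 0.5}))  # 0.5
```

The value is 0 for identical profiles and 1 for samples that share no alleles.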
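PERMANOVA, as performed by adonis, partitions the sum of squared pairwise distances into among-group and within-group components and assesses the resulting pseudo-F by permuting group labels. The Python sketch below shows this calculation for a one-factor design. It is a simplified illustration, not a reimplementation of adonis: it takes a precomputed square distance matrix (for example, Bray-Curtis), and the function names, permutation count, and seed are assumptions.

```python
import itertools
import random


def pseudo_f(dist, groups):
    """One-factor PERMANOVA pseudo-F from a square distance matrix."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by n.
    ss_total = sum(dist[i][j] ** 2 for i, j in itertools.combinations(range(n), 2)) / n
    # Within-group sum of squares: same quantity per group, divided by group size.
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in itertools.combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, n_perm=999, seed=1):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute group labels across samples
        if pseudo_f(dist, labels) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)


points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
f, p = permanova(dist, ["a", "a", "b", "b"])
print(f)  # 200.0
```

With only four samples there are just three distinct two-by-two partitions, so the permutation p-value cannot fall much below about 1/3 regardless of how well separated the groups are. This illustrates why adequate replication matters for permutation tests.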