Figure Legends

Figure 1. Primer design and method workflow. A: Primer design using the sense strand of the target DNA template as an example. The amplicon region of interest should be no longer than 500 bp. The target-gene forward inner primer, universal forward outer primer and the target-gene reverse primer are all used in the initial PCR. The Nextera XT indices provide sample barcodes in a separate PCR step. The unique molecular identifier (UMI) region is shown in turquoise on the target-gene forward inner primer. B: Sample preparation workflow. C: MAUI-seq data analysis workflow.
Figure 2. Erroneous read formation and filtering. A: Schematic showing the formation of different sequences with identical UMIs, and bias introduced when sampling for sequencing. B: Example data showing the occurrence of real and chimeric rpoB sequences as primary and secondary sequences (log scale). S1 and S2: Real sequences derived from two different rhizobium strains (SM170C and SM3). Chi1-4: Chimeric sequences.
Figure 3. Amplicon diversity reported by MAUI-seq compared with the DADA2 and UNOISE3 analysis pipelines. Data are for four genes from nodule samples from two geographic locations, Store Heddinge (1-6) and Aarhus (7-8). Letters A-D denote the replicates within each plot (Supplementary Figure 5). Heatmaps show the log10-transformed relative allele abundance of sequence clusters for individual genes. Lines connect identical sequences found by different clustering methods. Evidence that sequences are likely to be genuine is denoted by classifying them as reference (100% identity in at least 1 of 196 Rhizobium leguminosarum symbiovar trifolii genomes), exact BLAST (100% query coverage and 100% identity against the whole-genome shotgun contigs BLAST database), single nt (one nt difference from either a reference or an exact BLAST match), and other. Sequences not reported by MAUI-seq were classified as sec/pri ratio (rejected as erroneous because of a high secondary to primary ratio), low UMI count (not reported because too rare), or not found by MAUI (no accepted UMIs).
Figure 4. Genetic differentiation between populations visualised by Principal Component Analysis (A-C) and FST (D-F) of Rlt diversity in root nodule samples (8 sites, 4 replicates). Three analysis pipelines are compared: MAUI-seq (A, D), DADA2 (B, E), UNOISE3 (C, F). The PCA was based on log10-transformed relative allele abundance. The FST analysis was based on relative allele abundance. Data from all four genes (rpoB, recA, nodA, and nodD) were included in the analysis.
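
The secondary to primary ratio filter referred to in Figures 2 and 3 can be illustrated with a minimal Python sketch. It is not the MAUI-seq implementation. It assumes that, for each UMI, the most abundant sequence is the "primary" and the second most abundant is the "secondary", and that a sequence is rejected when the number of UMIs in which it is secondary is high relative to the number in which it is primary. The function name classify_sequences, the input format, and both thresholds (max_sec_pri_ratio, min_primary_umis) are hypothetical choices made for illustration; the returned labels simply mirror the categories named in the Figure 3 legend.

```python
from collections import Counter, defaultdict


def classify_sequences(reads, max_sec_pri_ratio=0.7, min_primary_umis=2):
    """Classify sequences from (umi, sequence) read pairs.

    Both thresholds are illustrative assumptions, not values from the source.
    """
    # Group reads by UMI and count how often each sequence occurs per UMI.
    per_umi = defaultdict(Counter)
    for umi, seq in reads:
        per_umi[umi][seq] += 1

    # For every UMI, credit the most abundant sequence as primary and the
    # second most abundant (if any) as secondary. Ties are broken by the
    # order in which sequences were first seen.
    primary = Counter()
    secondary = Counter()
    for counts in per_umi.values():
        ranked = counts.most_common(2)
        primary[ranked[0][0]] += 1
        if len(ranked) > 1:
            secondary[ranked[1][0]] += 1

    result = {}
    for seq in set(primary) | set(secondary):
        pri, sec = primary[seq], secondary[seq]
        if pri == 0:
            # Never the primary sequence of any UMI (assumed reading of
            # "no accepted UMIs").
            result[seq] = "not found"
        elif sec / pri > max_sec_pri_ratio:
            # Seen as a secondary sequence too often relative to primary.
            result[seq] = "sec/pri ratio"
        elif pri < min_primary_umis:
            # Too few supporting UMIs to report.
            result[seq] = "low UMI count"
        else:
            result[seq] = "accepted"
    return result


if __name__ == "__main__":
    # Toy reads: S1 and S2 stand for real sequences, Chi for a chimera.
    toy_reads = (
        [("u1", "S1")] * 3 + [("u1", "Chi")]
        + [("u2", "S1")] * 2
        + [("u3", "S1")] * 2 + [("u3", "Chi")]
        + [("u4", "Chi")] * 2
        + [("u5", "S2")]
    )
    for seq, label in sorted(classify_sequences(toy_reads).items()):
        print(seq, label)
```

In this toy input the chimera appears as the secondary sequence in two UMIs but as the primary in only one, so it is rejected on the ratio, whereas S1 is the primary in three UMIs and is never secondary.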