Figure Legends
Figure 1. Primer design and method workflow. A: Primer
design using the sense strand of the target DNA template as an example.
The amplicon region of interest should be no longer than 500 bp. The
target-gene forward inner primer, universal forward outer primer and the
target-gene reverse primer are all used in the initial PCR. The Nextera
XT indices provide sample barcodes in a separate PCR step. The unique
molecular identifier (UMI) region is shown in turquoise on the
target-gene forward inner primer. B: Sample preparation
workflow. C: MAUI-seq data analysis workflow.
Figure 2. Erroneous read formation and filtering. A: Schematic showing the formation of different sequences with identical
UMIs, and the bias introduced when sampling for sequencing. B: Example data showing the occurrence of real and chimeric rpoB sequences as primary and secondary sequences (log scale). S1 and S2: Real
sequences derived from two different rhizobium strains (SM170C and SM3).
Chi1-4: Chimeric sequences.
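The primary/secondary counting in panel B can be illustrated with a minimal Python sketch. It assumes reads have already been grouped by UMI. Within each UMI, the most abundant sequence is counted as primary and the next most abundant as secondary, and a sequence is flagged as likely erroneous when its secondary-to-primary ratio exceeds a threshold. The function names `count_primary_secondary` and `flag_high_ratio`, the tie-breaking rule and the `max_ratio` parameter are hypothetical illustrations, not MAUI-seq's actual implementation or default settings.

```python
from collections import Counter


def count_primary_secondary(reads_by_umi):
    """Count how often each sequence is primary or secondary within a UMI.

    reads_by_umi: dict mapping a UMI string to a list of read sequences.
    Hypothetical helper; ties in abundance are broken by first occurrence,
    which is how Counter.most_common orders equal counts.
    """
    primary = Counter()
    secondary = Counter()
    for reads in reads_by_umi.values():
        ranked = Counter(reads).most_common()
        if not ranked:
            continue
        primary[ranked[0][0]] += 1          # most abundant sequence in this UMI
        if len(ranked) > 1:
            secondary[ranked[1][0]] += 1    # second most abundant sequence
    return primary, secondary


def flag_high_ratio(primary, secondary, max_ratio):
    """Return sequences whose secondary/primary count ratio exceeds max_ratio.

    max_ratio is a hypothetical, user-chosen threshold.
    """
    flagged = set()
    for seq in set(primary) | set(secondary):
        p = primary.get(seq, 0)
        s = secondary.get(seq, 0)
        # A sequence never seen as primary has an undefined (infinite) ratio.
        if p == 0 or s / p > max_ratio:
            flagged.add(seq)
    return flagged
```

In this sketch a chimera such as Chi1, which mostly appears as a minor sequence alongside a real sequence in the same UMI, accumulates secondary counts faster than primary counts and is therefore flagged.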
Figure 3. Amplicon diversity reported by MAUI-seq compared with
the DADA2 and UNOISE3 analysis pipelines. Data are for four genes from
nodule samples from two geographic locations, Store Heddinge (1-6) and
Aarhus (7-8). Letters A-D denote the replicates within each plot
(Supplementary Figure 5). Heatmap of the log10-transformed
relative allele abundance of sequence clusters for individual genes.
Lines connect identical sequences found by different clustering methods.
Evidence that sequences are likely to be genuine is denoted by
classifying them as reference (100% identity in at least 1 of
196 Rhizobium leguminosarum symbiovar trifolii genomes), exact BLAST (100% query coverage and 100% identity against
the whole-genome shotgun contigs BLAST database), single nt (one nt difference from either a reference or an exact BLAST match) and other. Sequences not reported by MAUI were classified as sec/pri ratio (rejected as erroneous because of a high
secondary-to-primary ratio), low UMI count (not reported
because they were too rare) or not found by MAUI (no accepted UMIs).
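The evidence categories above can be expressed as a simple decision rule. The following minimal Python sketch assumes that the sets of reference sequences and exact BLAST hits are already known, that all sequences are pre-aligned to equal length, and that "one nt difference" means a single substitution (indels are ignored). The helper names and the order in which categories are checked are hypothetical choices made for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def classify(seq, reference_seqs, blast_exact_seqs):
    """Assign a sequence to one of the evidence categories (hypothetical helper).

    reference_seqs: sequences found with 100% identity in a reference genome.
    blast_exact_seqs: sequences with a 100% coverage, 100% identity BLAST hit.
    """
    if seq in reference_seqs:
        return "reference"
    if seq in blast_exact_seqs:
        return "exact BLAST"
    # 'single nt': exactly one substitution away from a reference or exact BLAST
    # sequence of the same length (indels are not considered in this sketch).
    for known in set(reference_seqs) | set(blast_exact_seqs):
        if len(known) == len(seq) and hamming(seq, known) == 1:
            return "single nt"
    return "other"
```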
Figure 4. Genetic differentiation between populations
visualised by Principal Component Analysis (PCA; A-C) and FST (D-F) of Rlt diversity in
root nodule samples (8 sites, 4 replicates). Three analysis pipelines
are compared: MAUI-seq (A, D), DADA2 (B, E) and UNOISE3
(C, F). PCA was based on log10-transformed relative
allele abundance; FST analysis was based on
relative allele abundance. Data from all four genes (rpoB, recA, nodA and nodD) were included in the
analysis.
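To show how a differentiation statistic can be derived from relative allele abundances, here is a minimal Python sketch of Nei's G_ST, one common FST-type estimator: G_ST = (H_T - H_S) / H_T, where H_S is the mean within-site gene diversity and H_T is the diversity of the pooled allele frequencies. This is a swapped-in textbook estimator for a single gene, not necessarily the estimator used for panels D-F; how the four genes are combined is not shown. The function names are hypothetical.

```python
import numpy as np


def gst(freqs):
    """Nei's G_ST for one gene.

    freqs: 2-D array-like, rows = sites, columns = alleles; each row holds
    relative allele abundances that sum to 1.
    """
    p = np.asarray(freqs, dtype=float)
    hs = np.mean(1.0 - np.sum(p ** 2, axis=1))  # mean within-site diversity
    p_bar = p.mean(axis=0)                       # pooled allele frequencies
    ht = 1.0 - np.sum(p_bar ** 2)                # total diversity
    return 0.0 if ht == 0 else (ht - hs) / ht


def pairwise_gst(freqs):
    """Symmetric matrix of G_ST values for every pair of sites."""
    p = np.asarray(freqs, dtype=float)
    n = p.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = out[j, i] = gst(p[[i, j]])
    return out
```

Two sites with identical allele profiles give a value of 0, while two sites fixed for different alleles give 1.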