Read processing and data analysis
Paired-end reads were merged using the PEAR assembler. Reads were separated by gene and filtered using the MAUI-seq method with a secondary/primary read ratio of 0.7, and an additional filter of 0.1% UMI abundance was applied as previously described.
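The filtering step can be illustrated with a minimal Python sketch. It is not the MAUI-seq implementation: the function name `maui_filter`, the input layout (a mapping from UMI to the reads carrying it), the tie handling, and the direction of the comparisons (a sequence is dropped when its secondary/primary ratio reaches 0.7 or its share of UMIs falls below 0.1%) are all assumptions made for illustration.

```python
from collections import Counter


def maui_filter(umi_reads, max_ratio=0.7, min_abundance=0.001):
    """Hypothetical sketch of a secondary/primary ratio filter.

    umi_reads: dict mapping each UMI to the list of read sequences tagged with it.
    Returns a dict of retained sequences and their UMI (primary) counts.
    """
    primary = Counter()    # times a sequence is the most abundant within a UMI
    secondary = Counter()  # times a sequence is the second most abundant

    for reads in umi_reads.values():
        ranked = Counter(reads).most_common(2)
        primary[ranked[0][0]] += 1
        if len(ranked) > 1:
            secondary[ranked[1][0]] += 1

    total_umis = sum(primary.values())
    kept = {}
    for seq, n_primary in primary.items():
        ratio = secondary[seq] / n_primary
        abundance = n_primary / total_umis
        # Assumed rule: keep sequences that rarely appear as secondary
        # and that account for at least min_abundance of all UMIs.
        if ratio < max_ratio and abundance >= min_abundance:
            kept[seq] = n_primary
    return kept


example = {
    "u1": ["A", "A", "B"],
    "u2": ["A", "A"],
    "u3": ["A", "A", "B"],
    "u4": ["B", "B", "A"],
}
filtered = maui_filter(example)
```

In this toy input, sequence "B" appears as a secondary sequence more often than as a primary one and is therefore treated as a likely error, whereas "A" is retained.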
Neighbour-joining phylogenetic trees were constructed using MEGAX software with 500 bootstrap replicates. Rlt reference sequences for all four genes were extracted from the 196 strains with available whole-genome sequencing data. Relative allele abundance was calculated for both the MAUI-seq data and the 196 Rlt genomes. Raw UMI counts are shown in Figure S3. Geographical maps were generated using the R packages ‘maps’ and ‘ggplot2’. Heatmaps were generated from the relative allele abundance of individual genes using ‘ggplot2’. Hierarchical F-statistics (FST) were calculated using the ‘varcomp.glob’ function and tested using ‘test.between’ and ‘test.within’ in the ‘Hierfstat’ R package. Correlations and associated p-values were calculated using the ‘agricolae’ R package. Pairwise geographic distances were calculated using the ‘geosphere’ R package. Mantel tests were performed using 5000 permutations in the ‘ade4’ R package. Correlations between soil chemical properties and allele frequencies were calculated using base R and visualised using ‘corrplot’.
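The logic of the distance and Mantel analyses can be sketched in Python. This is an illustration only, not the R code used here: the haversine formula on a sphere stands in for the ‘geosphere’ distance calculation, the permutation test is a generic one-sided Mantel test based on Pearson correlation, and the function names, Earth radius, and seed are assumptions.

```python
import math
import random


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


def pearson(x, y):
    """Pearson correlation of two equal-length lists with non-zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=5000, seed=0):
    """One-sided Mantel test between two square, symmetric distance matrices."""
    n = len(d1)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [d1[i][j] for i, j in pairs]
    observed = pearson(x, [d2[i][j] for i, j in pairs])

    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of d2 together
        permuted = [d2[order[i]][order[j]] for i, j in pairs]
        if pearson(x, permuted) >= observed - 1e-12:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)


# Toy example: four sites on a line, second matrix proportional to the first.
coords = [0.0, 1.0, 3.0, 7.0]
geo = [[abs(a - b) for b in coords] for a in coords]
gen = [[2.0 * v for v in row] for row in geo]
r, p = mantel(geo, gen, permutations=999)
```

Permuting site labels rather than individual matrix entries preserves the dependence among distances that share a site, which is why the Mantel test is used instead of an ordinary correlation test on the pairwise values.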
Nucleotide diversity (π, the average number of nucleotide differences per site between two DNA sequences in all possible pairs in the sample population) was calculated for each individual sample within the DKO and field trial data using a custom script.
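The definition of π above can be written as a short Python sketch. The custom script itself is not reproduced here; the function name, the input format (aligned allele sequences of equal length mapped to their counts within one sample), and the use of raw counts as the number of sequence copies are assumptions.

```python
from itertools import combinations


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(allele_counts):
    """Average pairwise differences per site over all pairs of sequence copies.

    allele_counts: dict mapping an aligned allele sequence to its count in one sample.
    """
    alleles = [(seq, c) for seq, c in allele_counts.items() if c > 0]
    n = sum(c for _, c in alleles)
    if n < 2:
        raise ValueError("at least two sequence copies are required")
    length = len(alleles[0][0])

    # Pairs of identical copies contribute zero differences, so only pairs of
    # distinct alleles are summed, weighted by the number of such copy pairs.
    total_diff = 0
    for (s1, c1), (s2, c2) in combinations(alleles, 2):
        total_diff += c1 * c2 * hamming(s1, s2)

    n_pairs = n * (n - 1) / 2
    return total_diff / n_pairs / length


pi_example = nucleotide_diversity({"AAAA": 2, "AAAT": 2})
```

For this example, four of the six possible pairs differ at one of four sites, so π = (4/6)/4 = 1/6.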