Read processing and data analysis
Paired-end reads were merged using the PEAR assembler. Reads were
separated by gene and filtered using the MAUI-seq method with a
secondary/primary read ratio of 0.7; a filter of 0.1% UMI abundance
was added, as previously described.
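
The two filtering thresholds can be illustrated with a minimal sketch. This is not the MAUI-seq implementation: the function name, the input layout (a mapping from sequence to its counts as the primary and as the secondary sequence within UMI groups), the use of a strict comparison, and the choice to compute abundance relative to the total primary count before any filtering are all assumptions made for illustration.

```python
def filter_alleles(counts, max_ratio=0.7, min_abundance=0.001):
    """Keep sequences that pass both thresholds.

    counts maps sequence -> (primary_count, secondary_count).
    This layout is a hypothetical simplification, not the MAUI-seq format.
    """
    # Assumption: abundance is measured against all primary counts,
    # computed before any sequence is discarded.
    total = sum(primary for primary, _ in counts.values())
    kept = {}
    for seq, (primary, secondary) in counts.items():
        if primary == 0:
            continue  # never observed as a primary sequence
        if secondary / primary > max_ratio:
            continue  # seen too often as a secondary: likely an error
        if primary / total < min_abundance:
            continue  # below the 0.1% abundance filter
        kept[seq] = primary
    return kept


# Made-up counts, for illustration only.
example = {
    "allele_A": (900, 50),
    "allele_B": (99, 10),
    "error_C": (10, 9),   # secondary/primary = 0.9, discarded
    "rare_D": (1, 0),     # 1/1010 is below 0.1%, discarded
}
print(filter_alleles(example))
```

With these made-up counts, only allele_A and allele_B are retained.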
Neighbour-joining phylogenetic trees were constructed using MEGAX
software with 500 bootstrap repetitions. Rlt reference sequences
for all four genes were extracted from the 196 strains with available
whole genome sequencing data. Relative allele abundance was calculated
for both the MAUI-seq data and the 196 Rlt genomes. Raw UMI
counts are shown in Figure S3. Geographical maps were generated
using the R packages ‘maps’ and ‘ggplot2’. Heatmaps were generated from
relative allele abundance of individual genes using ‘ggplot2’. The
hierarchical F-statistics (FST) were
calculated using the ‘varcomp.glob’ function and tested using
‘test.between’ and ‘test.within’ in the ‘Hierfstat’ R package.
Correlations and associated p-values were calculated using the
‘agricolae’ R package. Pairwise geographic distances were calculated
using the ‘geosphere’ R package. Mantel tests were performed using 5000
repetitions in the ‘ade4’ R package. Correlations between soil chemical
properties and allele frequency were calculated using base R and visualised
using ‘corrplot’.
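
The logic of a Mantel test on pairwise geographic distances can be sketched as follows. This is an illustrative Python sketch, not the ‘geosphere’ or ‘ade4’ code used in the analysis. The text does not state which distance function was applied, so the haversine great-circle formula with a mean Earth radius of 6371 km is an assumption, as are the function names and the one-sided permutation p-value.

```python
import math
import random


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lower_triangle(matrix, order):
    """Below-diagonal entries after relabelling rows and columns by order."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i)]


def mantel(d1, d2, permutations=5000, seed=0):
    """Return (r, p) for two square distance matrices.

    Rows and columns of d2 are permuted together; p is the one-sided
    proportion of permutations with a correlation >= the observed one.
    """
    identity = list(range(len(d1)))
    x = lower_triangle(d1, identity)
    r_obs = pearson(x, lower_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        if pearson(x, lower_triangle(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Here d1 would hold the geographic distances between sampling sites and d2 a matrix of pairwise genetic differentiation between the same sites, in the same order.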
Nucleotide diversity (π, the average number of nucleotide differences
per site between two DNA sequences in all possible pairs in the sample
population) was calculated for each individual sample within the DKO and
field trial data using a custom script.
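
The definition of π given above can be written out as a short sketch. The custom script itself is not reproduced here; the Python function below is a hypothetical illustration that assumes each sample is summarised as aligned, equal-length allele sequences with integer counts, and that every count represents one sampled sequence.

```python
from itertools import combinations


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(allele_counts):
    """Average pairwise differences per site over all pairs of sequences.

    allele_counts maps sequence -> number of times it was observed.
    """
    n = sum(allele_counts.values())
    if n < 2:
        return 0.0  # no pairs to compare
    length = len(next(iter(allele_counts)))
    # Pairs of identical alleles contribute zero differences, so only
    # pairs of distinct alleles need to be summed, weighted by counts.
    diffs = sum(
        c1 * c2 * hamming(s1, s2)
        for (s1, c1), (s2, c2) in combinations(allele_counts.items(), 2)
    )
    pairs = n * (n - 1) / 2
    return diffs / pairs / length


# Made-up sample: four sequences of length 4.
sample = {"ACGT": 2, "ACGA": 1, "TCGA": 1}
print(nucleotide_diversity(sample))
```

For this made-up sample there are six pairs with seven differences in total across four sites, so π = 7/24.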