DNA extraction and metabarcoding
DNA was extracted using a CTAB-phenol-chloroform protocol as described in Godhe et al. (2001), with an added RNA digestion step during cell lysis (65ºC for 60 min using one mg RNaseA [mL CTAB buffer]-1). Genomic DNA yield was quantified using Qubit (Thermo Fisher Scientific). The evolution experiment used a single barcoding locus to track strain selection. This 523 bp locus,Sm_C12W1 , is located on contig 12 inside a pentatricopeptide (PPR) repeat region of gene Sm_t00009768-RA, encoding an RNA-binding protein. The locus has 38 SNP positions amongst the 58 strains used in this study, and 110 unique alleles with 100% heterozygosity, including two triploid/aneuploid strains (Pinder et al., 2023). Sm_C12W1was amplified from 100 ng DNA from the evolution experiments using the Phusion® High-Fidelity PCR Kit (Thermo Scientific). The primer concentration used was 0.1 µM each of Sm_C12W1-F (5’AGGYTTCGCCTCCTCAAAC3’) and Sm_C12W1-R (5’GGCACGATGCACACGCAAAG3’). A One-step PCR reaction with Illumina-adapter extended primers was run for 30 cycles, with five s denaturation (98ºC), five s annealing (62.5ºC), 30 s extension (72ºC), and a final five min extension. The PCR products were quantified using PicoGreen™ dsDNA kit (Invitrogen™) and normalized to 100 ng per sample. Magnet bead-based size selection (>300 bp), Nextera Dual-indexing, pooling, and further library preparation, as well as amplicon sequencing (1/2 of an Illumina MiSeq lane with 300 bp pair-end read with v3 chemistry), was carried out by the National Genomics Infrastructure (Uppsala, Sweden).
Amplicon sequence data were quality controlled and processed as described in (Pinder et al., 2023). Briefly, adapter sequences were removed from reads using Cutadapt version 3.2 (Martin, 2011), trimmed using a PHRED score >28 and a minimum read length of 180 bp, and merged into amplicons using BBmerge version 38.86 (Bushnell et al., 2017), with default settings. Although DADA2’s denoising algorithms (Callahan et al., 2016) could correct a substantial part of amplicons with sequencing errors, it also removed the majority of low frequency allelic observations, potentially generating false negatives (Pinder et al., 2023). We therefore opted to not use DADA2. Instead, we chose to rely on exact sequence matches to known alleles, and utilized strains heterozygosity to quality control and filter data. First, alleles shared between two to three strains were divided according to the abundance of each strain’s second unique allele using a set of differential equations (Table S1). In general, the proportion of shared allelei 1 belonging to a specific strain x, y, and z was computed individually for each sample based on the strain’s second unique allele (x2, y2, andz2 ):
Eq. 2:\(\text{Reads\ of\ }i1\ \text{belonging\ to\ strain\ }x\ =\ i1\ \times\ \frac{\text{strain\ }x\text{\ allele\ }x2\ \ }{(strain\ x\text{\ allele\ }x2\ +\ strain\ y\text{\ allele\ }y2\ +\ strain\ z\text{\ allele\ }z2)}\)
The two heterozygous allele counts were then summed up to represent the abundance of each strain. During this step any strain without observations of both alleles was disregarded as a false positive and counted as absent (through the first part of Eq. 3 below). The third allele of triploid strains was removed. Amplicon counts per strain were normalized to relative abundance (RA) as:
Eq. 3:\(\text{RA\ strain\ }x=\ \frac{\text{\ counts\ allele\ }x1\ \ \times\ counts\ \text{allele}\text{\ x}2}{\text{counts\ allele\ }x1\ \ \times\ counts\ \text{allele}\text{\ x}2}\ \times\frac{\text{\ counts\ allele\ }x1\ \ +\ counts\ \text{allele}\text{\ x}2}{\sum\text{counts\ allels\ }_{a-z}}\)
The changes in the relative abundance of each strain’s alleles, combined with the population’s overall growth rate over the first nine days (µ9days) of the evolution experiment, were used to compute each strain’s specific growth rate during co-cultivation in the evolution experiment.
Eq. 4:\(u\text{\ strain\ }x=u_{9days}+\frac{\text{LN}\left(\text{RA\ strain\ }x\ day\ 9\right)-\ LN\left(\text{RA\ strain\ }x\ day\ 0\right)}{9}\)
Strains not seen in any of the five experimental replicates at day nine were presumed to be extinct since dilution bottelnecks in population size were generally smaller than the sequensing depth. The false discovery rate (FDR) of this approch was quantified by taking advantage of the fact that the two populations were processed independently. FDR was computed by comparing the total number of strain observations (TO) to the False Positive (FP) observations of mining strain seen in a reference sample, or vice versa, accounting for the fact that only half of all FP can be detected this way:
Eq. 5: \(\text{FDR}=\frac{(FP\ \times\ 2)\ }{\text{TO}}\)
All data, scripts and metadata will be made publicaly available opun acceptance (Andersson, Berglund, et al., 2023), or can be accessed at: https://github.com/Bearstar85/Cu_evolution.