Allelic frequency change
To provide insights into the genetic architecture of mitonuclear
incompatibilities we examined allelic frequency change across the whole
genome in lines evolving under each mitochondrial background, relative
to the initial F2 females. We used the pool-seq approach, which has
recently been established for evolve-and-resequencing studies
(Schlotterer et al. 2015), including in T. californicus(Lima & Willett 2018; Lima et al. 2019; Healy:2020fn; Griffithset al. 2020).
For lines evolving under each mitochondrial background, we selected
three replicates that showed larger census size, no further decrease in
fecundity, and some recovery in survivorship. We pooled 200 adult
individuals, or the maximum available, and extracted genomic DNA using
the phenol chloroform protocol and digestion with RNAse (Sambrook &
Russell 2001). For the initial F2 females, we also extracted DNA from
pools of 100 adult outcross F2 females that gave rise to the
experimental populations, after removing their egg sacs. All samples
were sequenced in the Illumina HiSeq 2000 platform with 100 bp
paired-end libraries. Reads were trimmed for quality using PoPoolation2
(Kofler et al. 2011), discarding bases with Phred quality scores
lower than 25 and keeping reads of at least 50bp after trimming.
We only considered SNPs that were fixed between parental populations and
therefore that can be used to determine the ancestry of the nuclear
alleles favored in either mitochondrial background. We used the
bioinformatics pipeline established by (Lima & Willett 2018; Limaet al. 2019) (see references for details), and the syntenic
reference genomes of SD and SC, where more than 90% of the genome is
anchored to the 12 chromosomes (Barreto et al. 2018). In short,
first, we made the two reference genomes equivalent in length and
accuracy by adding N’s to any position where either parental reference
had an “N”. Second, we established a list of fixed SNPs by i)
performing reciprocal mapping of reads from one population (Barretoet al. 2018) against the reference of the other, ii) considering
only “fixed” positions where all mapped reads showed an alternative
nucleotide to the reference (coverage ≥ 15), and iii) comparing
reciprocal mapping to keep only SNPs that were “fixed” in both
mappings. Third, we mapped our reads for each hybrid dataset to both
parental references using BWA-MEM with default parameters (Li & Durbin
2009) and keeping reads that mapped with MAPQ score > 20.
When mapping reads from hybrids to one of the parental references, there
is a known bias towards an over-representation of reads of the reference
population (Lima & Willett 2018). By averaging the read counts for each
SNP, from the alignment to each parental population, this
reference-population bias is overcome. Allele counts for the fixed SD
and SC alleles and respective frequencies were determined using
PoPoolation2 (Kofler et al. 2011) (minimum coverage of minor
allele ≥ 4). Finally, data for all cross datasets were compared and only
SNP that passed all filters up to this point, for both reciprocal
crosses, were kept for statistical analysis of allele frequency change.
These SNPs are expected to show high levels of variance, due to the
variation on sequencing coverage and sampling alleles from the pool of
individuals (Lima & Willett 2018). We averaged allele counts (and
frequencies) in non-overlapping windows of 500 consecutive SNPs. We
refer to these as “genomic windows”.
In each mitochondrial background separately, we determined which of
these genomic windows show a significant allelic frequency change by
comparing the initial F2 hybrids to the evolved hybrids, using the
Cochran-Mantel-Haenszel (CMH) test as implemented in the mantelhaen.test
function in R package (R Development Core Team). This test operates on
2x2 contingency tables (times the number of replicates), comparing the
counts for SD and SC alleles at the beginning and at end of the
experiment, and thus it is highly sensitive to variation in sequencing
coverage. To account for differences in coverage between the initial and
evolved hybrids, we estimated initial counts of SD and SC alleles by
multiplying the allelic frequency estimated in the initial F2 hybrid
pool by the total coverage observed at each evolved hybrid separately
(i.e. counts for SD + SC). This approach results in a P-value per
genomic window, which reflects the probability of rejecting the null
hypothesis of similar allelic frequency during experimental evolution in
a given mitochondrial background.
We used a z-score of 2 to identify genomic windows showing higher
P-values relative to the average values observed genome wide, consistent
with stronger selection driving allelic frequency change. We have
estimated three z-score threshold based on three distributions of
P-values: i) from both crosses taken together (corresponding to a -log10
(P-value) of 3.24), ii) only from crosses evolving under the SC
mitochondria (corresponding to a -log10 (P-value) of 4.17), and iii)
only from crosses evolving under the SD mitochondria (corresponding to a
-log10 (P-value) of 1.73). Because the two higher thresholds were highly
restrictive to identify windows under selection in lines with the SD
mitochondrial background (only zero or five windows), we considered the
lower threshold to compare windows across mitochondrial backgrounds.
Since adjacent genomic windows are physically linked, we looked for
consecutive genome windows that are under selection. We refer to these
as genomic regions. Genomic regions were classified as under selection
through the following criteria: i) at least 10 genomic windows with
-log10 (P-value) 1; ii) with at least one window above a z-score of 2;
and iii) ending with one window -log10 (P-value) 1. To represent which
parental allele dominates during the experimental evolution, we
classified genomic regions as blue for overrepresentation of the SC
nuclear allele and red for overrepresentation of SD allele. By comparing
allelic frequency change in reciprocal crosses, we identified genomic
regions fitting three categories: “uniform selection”, where the same
allele dominates irrespective of the mitochondrial background;
“divergent selection” where the dominant nuclear allele matches the
mitochondria; and “antagonistic selection” where the dominant allele
does not match the mitochondria. While we chose to use z-scores as
threshold for selection, the results were qualitatively similar when
considering a threshold based of the top 5% of the distribution of
P-values.