FST-outlier detection, transcriptome analyses, and evaluation of site frequency spectra
Genetic distances between groups of specimens were evaluated by means of
pairwise FST (Weir & Cockerham, 1984). Pairwise FST was calculated in
VCFtools v.0.1.16 (Danecek et al., 2011) from all SNPs. Negative FST
values were converted to zero, and the mean FST value for each
contig/scaffold of the reference genome was calculated, as well as the
mean and median FST across all SNPs. The distribution of contig/scaffold
FST values was visualized in R, and the contigs/scaffolds with the top 1%
of FST values were defined as outliers.
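The per-contig summary and outlier definition described above can be sketched in Python as follows. This is a minimal illustration, not the scripts used in the study. It assumes tab-separated per-SNP input with the columns CHROM, POS and FST (the layout of a VCFtools per-site FST file). Skipping undefined (NaN) estimates, rounding the 1% cut-off up, and keeping at least one contig are further assumptions. The function names `parse_fst_lines` and `summarize_fst` are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, median


def parse_fst_lines(lines):
    """Yield (contig, fst) pairs from tab-separated lines: CHROM, POS, FST."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] == "CHROM":
            continue  # skip the header and malformed lines
        try:
            fst = float(fields[2])
        except ValueError:
            continue
        if math.isnan(fst):
            continue  # assumption: undefined estimates are skipped
        yield fields[0], max(fst, 0.0)  # negative values are converted to zero


def summarize_fst(records, top_fraction=0.01):
    """Return per-contig means, overall mean, overall median and outlier contigs."""
    per_contig = defaultdict(list)
    all_values = []
    for contig, fst in records:
        per_contig[contig].append(fst)
        all_values.append(fst)
    contig_means = {c: mean(v) for c, v in per_contig.items()}
    # Assumption: the top fraction is rounded up and contains at least one contig.
    n_top = max(1, math.ceil(len(contig_means) * top_fraction))
    ranked = sorted(contig_means, key=contig_means.get, reverse=True)
    return contig_means, mean(all_values), median(all_values), ranked[:n_top]


if __name__ == "__main__":
    example = [
        "CHROM\tPOS\tWEIR_AND_COCKERHAM_FST",
        "contig1\t101\t0.50",
        "contig1\t230\t-0.10",
        "contig2\t57\t0.10",
        "contig2\t90\t-nan",
    ]
    means, overall_mean, overall_median, outliers = summarize_fst(
        parse_fst_lines(example)
    )
    print(means, overall_mean, overall_median, outliers)
```

With real data, the lines of the per-site FST file would be passed to `parse_fst_lines` in place of the toy list.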
Transcriptomes for M. × piperita (Figueroa-Pérez,
Reynoso-Camacho, Garcia-Ortega, & Guevara-González, 2018) and M.
spicata (Jin et al., 2014) were downloaded and re-assembled. In short,
raw reads were cleaned and trimmed in TrimGalore v.0.6.7
(https://zenodo.org/badge/latestdoi/62039322), removing bases with Q
< 20 and reads shorter than 50 base pairs or containing any
ambiguous base (N). The cleaned reads were then used to assemble
transcripts with Trinity v.2.11.00 (Grabherr et al., 2011) with default
parameters. The sequences of FST-outlier
contigs/scaffolds were extracted and used as databases for
BLAST searches of all re-assembled mint transcripts with default
settings in BLAST v.2.9.0 (Altschul, Gish, Miller, Myers, & Lipman,
1990; Ye, McGinnis, & Madden, 2006). Transcripts whose top BLAST hit was
longer than 300 bp and had an e-value below 1e-5 were extracted and
annotated with their top tblastx hit (>100 amino
acids and e-value < 1e-5) against the UniProt database (The UniProt
Consortium, 2021).
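The hit-filtering step can be illustrated with a short Python sketch. It assumes BLAST tabular output with the default twelve columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). It also assumes that the "top" hit is the one with the highest bit score and that hit length means alignment length. Neither definition is stated in the text, and the function names `top_hits` and `filter_hits` are hypothetical.

```python
def top_hits(lines):
    """Return the highest-bitscore hit per query from BLAST tabular lines."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        hit = {
            "query": f[0],
            "subject": f[1],
            "length": int(f[3]),      # alignment length
            "evalue": float(f[10]),
            "bitscore": float(f[11]),
        }
        # Assumption: the top hit is the one with the highest bit score.
        if f[0] not in best or hit["bitscore"] > best[f[0]]["bitscore"]:
            best[f[0]] = hit
    return best


def filter_hits(best, min_length=300, max_evalue=1e-5):
    """Keep queries whose top hit is longer than min_length and below max_evalue."""
    return {
        q: h
        for q, h in best.items()
        if h["length"] > min_length and h["evalue"] < max_evalue
    }


if __name__ == "__main__":
    example = [
        "t1\tc1\t99.0\t450\t2\t0\t1\t450\t10\t459\t1e-120\t800",
        "t1\tc2\t90.0\t200\t10\t1\t1\t200\t5\t204\t1e-40\t250",
        "t2\tc1\t95.0\t250\t5\t0\t1\t250\t1\t250\t1e-60\t400",
        "t3\tc3\t80.0\t600\t100\t5\t1\t600\t1\t600\t0.001\t50",
    ]
    kept = filter_hits(top_hits(example))
    print(sorted(kept))
```

The same two functions could be reused for the annotation step by calling `filter_hits` with the amino-acid length threshold, provided the tblastx results are in the same tabular layout.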
Finally, folded site frequency spectra (SFS) were calculated in ANGSD for
each morphologically defined group of specimens (see Results), based on all
SNPs used in the genomic cluster and admixture analyses.
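To make the notion of a folded SFS concrete, the following Python sketch counts sites by minor allele count. It is a simplification and not the ANGSD procedure: ANGSD estimates the SFS from genotype likelihoods, whereas this example assumes called allele counts with no missing data. The function name `folded_sfs` and the toy input are hypothetical.

```python
def folded_sfs(alt_counts, n_chromosomes):
    """Return the folded SFS: entry k is the number of sites whose
    minor allele is observed k times among n_chromosomes."""
    sfs = [0] * (n_chromosomes // 2 + 1)
    for alt in alt_counts:
        if not 0 <= alt <= n_chromosomes:
            raise ValueError(f"allele count {alt} outside 0..{n_chromosomes}")
        # Folding: without knowing the ancestral state, an allele seen
        # alt times is treated the same as one seen n - alt times.
        minor = min(alt, n_chromosomes - alt)
        sfs[minor] += 1
    return sfs


if __name__ == "__main__":
    # Four diploid specimens give eight chromosomes per site.
    print(folded_sfs([1, 7, 4, 0, 8, 3], n_chromosomes=8))
```

Entry 0 holds sites that are monomorphic in the group, so it is often excluded when spectra are compared between groups.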