General Procrustes Analysis (GPA) was performed to standardise each individual’s shape for size and orientation using geomorph v3.0.7(R package) . Following this procedure, shape data were standardised for any residual size effects using the log of centroid size to correct for allometry. Size corrected data then were used in all subsequent analyses. Principal Component Analyses (PCA) were conducted on head shape and body shape separately using the plotTangentSpace andarrayspecs functions and plots were made in ggplot2v3.3.5 (R package) .
Analysis of variance (ANOVA) models were used to determine the relative contributions of parallel and non-parallel aspects of the morphological divergence across ecomorph pairs . Rather than using a single PC (Principal Component) score for the ANOVAs (ANOVA model: PC ~ ecomorph + lake + ecomorph x lake), we used a combined weighted PC variable with scores from multiple PCs to account for a higher proportion of biologically important variance in a single variable. For this variable, we combined all consecutive PCs that explained more than 5% of the total variance in shape. For both head shape and body shape, this was PC1 to PC5 (Figure S1). To weight the scores within the combined variable, each PC score was multiplied by the amount of variance explained by that PC (i.e., each individual PC1 score x proportion of variance explained by PC1) before all five weighted scores were summed for each individual. The EtaSq function in theBaylorEdPsych v0.5 (R package) was used to estimate the effect size of each model term. In our model, the ecomorph term represents the parallel or shared term, the ecomorph*lake interaction is the non-parallel (non-shared) term, and the lake term represents unique evolutionary history. ANOVA results for PC1 through PC5 separately can be found in Table S3.
Phenotypic Trajectory Analyses (PTA) were performed on the procrustes scores using the trajectory.analysis function in geomorphto look at the extent and direction of phenotypic change between ecomorphs. Magnitude of divergence is described by the length of trajectories (L) while the angle between trajectories (θ) describes their direction in phenotypic space. This approach allows us to determine how parallel the trajectories of each ecomorph pair are to one another by using the difference in phenotypic trajectory length (ΔLP) and the direction of phenotypic trajectories (θP). The significance of differences in trajectory lengths and differences in trajectory direction were assessed using 1,000 permutations . Phenotypic divergence between ecomorphs in different lakes was considered to be parallel if the direction and/or magnitude of phenotype change did not differ significantly between the pairs (P < 0.05).

Population genomics:

Filtered 75bp reads for each individual, generated via ddRADseq from Jacobs et al . (2020) and accessed from (ddRADseq NCBI short read bioproject: PRJNA607173), were mapped using bwa mem andSAMtools using settings described in that paper to theSalvelinus sp. genome from NCBI (ASM291031v2). The number of reads per individual ranged from 1 to 3.5 million. RAD loci were built in the gstacks module of Stacks v2.53 for 200 individuals (Awe_Bn= 26, Awe_Pl=29, Dughaill_Bn=28, Dughaill_Pl=27, naSealga_Bn=18, naSealga_Pl=20, Tay_Bn=21, Tay_Pl=31). SNPs were retained in the populations module of Stacks if they met the following criteria: present in 66% of all individuals in each population and across all populations, a minimum minor allele frequency of 0.05, maximum observed heterozygosity of 0.5. Each ecomorph within a lake was considered to be a discrete population. The script filter_hwe_by_pop.pl to filter out sites outside Hardy-Weinberg equilibrium within populations (available at https://github.com/jpuritz/dDocent). vcftools v0.1.13 was used to filter to a minimum coverage of 5x and a maximum of 50x. A principal component analysis was performed to identify the major axes of genetic variation using SNPRelate v1.22.0 (R package) .

Genotype-phenotype association analyses:

To determine the association between genotypes and phenotypic variation in head or body shape, we ran Linear Mixed Models (LMM) in Gemma v0.98.1 . Univariate and Multivariate LMMs with Wald’s test were run using PCs 1-5 for head shape and body shape, the SNP dataset generated for the population genomics analyses, and lake of origin as a co-variate . Missing genotypes were imputed using LinkImpute v1.1.4 and a relatedness matrix was generated using Gemma before running the models. We determined significant associations using bonferroni-corrected P-values (0.05/7329 unlinked SNPs) from the Wald’s tests. The number of unlinked SNPs was determined by LD-pruning the full SNP dataset using the snpgdsLDpruning function in SNPRelate . Bayesian Sparse Linear Mixed Models (BSLMM) were run using PC1-5 variables to determine how much of the phenotypic variation is explained by the SNPs in our dataset (PVE) and secondly how much of that variation is explained by large-effect loci (PGE), and finally how polygenic each phenotype is (⍴). Manhattan plots were made using CMplot v4.0 (R package) (https://github.com/YinLiLin/CMplot).
We subsequently determined if SNPs showing significant associations with head shape or body shape morphology were found within annotated genes in the Salvelinus sp. reference genome using BEDtools v2.27.1 . The functions of genes containing, or ± 1kbp of, associated SNPs were investigated using GO term overrepresentation analysis (ORA) and gene set enrichment analysis (GSEA). These analyses were run usingtopGO v2.40.0 (R package) with all genes containing any RAD loci as the full comparison dataset. Results were summarised usingREVIGO before visualisation in Cytoscape v3.91 .

Comparisons to known QTLs:

Using existing information on the genetics of important phenotypes from other salmonid species, we mapped a database of 1,338 QTL markers to theSalvelinus sp. genome. This was based on a previously published database of QTLs involved in traits related to morphology and life history, derived from a range of salmonid species and previously mapped to the Salmo salar genome . Additionally, a literature search was conducted up to April 2021 to augment the existing database with more recently published QTLs. This literature search was conducted in Web of Science and Google Scholar using the search terms “QTL”, “quantitative trait loci”, “salmonid”, and the common and scientific names for rainbow trout, Atlantic salmon, Arctic charr, lake whitefish, Chinook salmon, coho salmon, brook trout, and lake trout. QTL marker sequences were gathered for 17 different phenotypes: body length, body shape, body weight, Fulton’s condition factor, directional change, disease resistance, embryonic development, gill rakers, growth rate, hatching time, head shape, parasite resistance, salinity tolerance, sexual maturation, smolting, spawning time, upper temperature tolerance (Table S1).
Following Jacobs et al. (2017), the strategy of mapping the QTL-linked markers to the Salvelinus sp. genome depended on the QTL marker type: RAD loci were mapped using Bowtie2 v2.4.4 and the very sensitive pre-set; microsatellite primer sequences, which are shorter, were mapped using Bowtie v1.3.1 allowing for 3 mismatches. QTLs for which the flanking markers mapped to different chromosomes were removed. Redundant QTLs, i.e., where two QTLs for the same trait from the same species mapped to the same location, were removed only keeping the QTL with the higher PVE or LOD score (following . For QTLs where more than one marker was reported, we attempted to map all markers. Position values for the QTLs markers were then compared to positions of the phenotype associated SNPs using BEDtools , with a cut-off of ±100kbp. This value was used so that we could consider SNPs within the range to be proximal to a QTL peak while also accounting for the large size of many of the QTLs in the database. In total, we successfully mapped 669 QTL-linked sets of markers to theSalvelinus sp. genome after removing redundant QTLs (Table S2).

Genomic response to selection:

We investigated if the phenotype-associated SNPs identified in our analyses showed signals of a response to selection and if those signals were replicated across ecomorph pairs. To test this, for each ecomorph pair we compared FST and DXY values for phenotype-associated SNPs to a random background subset of SNPs. This random subset was 100 SNPs randomly selected from the whole dataset and the mean FST and DXY values for those SNPs were calculated. This was repeated 10,000 times and the means for FST and DXY were taken across all permutations. These permuted values were then compared to the empirical mean FST and DXY values for the phenotype-associated SNPs using the t.test function in R.

Analyses of recombination rate variation:

To test the effect of the recombination landscape on phenotype-genotype association, we first estimated recombination rates using the published Arctic charr linkage map (N=3,636) using MareyMap v1.3.6 . RAD loci from the linkage map were aligned to the Salvelinus sp. reference genome with Bowtie2 using the -very-sensitive setting. Loci were kept if they uniquely mapped to one location, mapped to the same chromosome as all other loci on their linkage group, and followed the orientation of the linkage map (i.e., not reversed). The filtered dataset was used to estimate the recombination rate across each chromosome using a spline algorithm. Spar values were varied for each chromosome from 0.5 to 0.9, depending on chromosome size, to best fit the data . Subsequently, WindowScanR v0.1 (available at: https://github.com/tavareshugo/WindowScanR) was used to summarize recombination rate values in 1 MB windows along the genome. All SNPs were assigned to these windows using BEDtools . A random subset of 100 SNPs was then selected and the mean recombination rate for those SNPs was calculated based on their windows. This was repeated 10,000 times to generate a background mean recombination rate, which was then compared to the mean recombination rate of the phenotype-associated SNPs (based on their windows) using a t-test.