Application 2: Barcoded individuals in population samples of
steelhead
After trimming, mapping, and quality filtering, PPalign
provided 287 to 405 million mapped reads per sample, which allowed
between 67.2 and 70.8% sampling of the genome at the minimum number of
reads per sample (10), as revealed by PPstats (Supplemental
Figure 7). The distribution of genomic extent across chromosomes was
similar to other lcWGS analyses of the O. mykiss genome (e.g.
Micheletti, Hess, Zendt, & Narum, 2018), indicating this pattern is a
function of the library preparation technique used for all these samples
or an idiosyncrasy of this genome. Sequencing of indexed individuals
allowed us to estimate that the mean coverage per individual ranged from
0 to 2.3 (median 0.23), to confirm that it was similar across
populations (median 0.23, 0.26, 0.23 and standard deviation 0.39, 0.28,
and 0.28 for Willamette River, Lewis River, and Skamania Hatchery,
respectively), and to reduce bias in allele frequency estimates
introduced by sampling variance across samples (normalization). After
population-specific filters, PPanalyze examined 22,934,298
variants (22,832,805 [99.5%] in the chromosome scaffolds) with a
suite of analyses. Density plots revealed that variants were sampled
from across the genome, with a handful of areas of notable density. A
principal components analysis made with loci with a maximum difference
in allele frequencies below 0.9 (thus excluding the most divergent
outlier loci), while unremarkable, confirmed that the primary axis,
which explained ~86% of the variance in the data, did not segregate
the Skamania hatchery sample from the natural
origin samples, implying that outlier regions related to the main
contrast (hatchery vs. natural) would not be confounded by background
population structure. Raw PPanalyze output revealed many small
regions of strong genomic divergence, while 51 separate regions were
identified as significant at p ≤ 0.05 across ξ values and
replicates in Local Score analyses (Figure 4, Table 3, Supplemental
Table 1). The two most significant (highest local score) regions were
the region of chr. 28 containing the genes GREB1L and ROCK1 and the
region of chr. 25 containing the gene SIX6,
which have previously been found to be associated with migration timing and
age at maturity in steelhead and other salmonids, respectively (e.g.
Willis et al., 2020). There were also many additional regions whose
potential association with migration phenology, age at maturity, or
domestication (adaptation to hatchery production) could be explored
further. For example, a region of chromosome 20 that was consistently
recovered in the Local Score analyses contained two protein coding
genes: ATP-citrate lyase (synthase), or ACLY, and dnaJ homolog
subfamily C member 7, or DNAJC7. ACLY is a ubiquitous cytosolic enzyme
positioned at the intersection of nutrient catabolism and cholesterol
and fatty acid biosynthesis, and DNAJC7 is a member of the heat shock
protein 40 family and acts as a co-chaperone regulating the molecular
chaperones HSP70 and HSP90 in folding of steroid receptors, such as the
glucocorticoid receptor and the progesterone receptor. Notably,
the analysis of linkage outliers for these three chromosomes
recovered the same regions but, in the case of chromosomes 25 and 28,
also identified other regions that Local Score did not, presumably
because, while they exhibit strong linkage across all samples, these
regions were not consistently divergent between the hatchery and natural
origin samples.
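
To illustrate how a Local Score analysis can merge many small divergent signals into a few regions, the following is a minimal sketch of the general local score recursion (a Lindley process over per-variant p-values), assuming the common form h_m = max(0, h_(m-1) + (-log10(p_m) - ξ)). The function names and the example p-values are hypothetical and are not taken from PPanalyze; the sketch also omits the significance thresholds that were applied across ξ values and replicates.

```python
import math


def local_score(p_values, xi):
    """Accumulate a local score along ordered variants on one chromosome.

    Each variant contributes (-log10(p) - xi); the running sum is reset
    to zero whenever it would become negative (Lindley process).
    """
    scores = []
    h = 0.0
    for p in p_values:
        h = max(0.0, h + (-math.log10(p) - xi))
        scores.append(h)
    return scores


def peak_region(scores):
    """Return (start, end, peak) for the excursion holding the highest score.

    The region runs from the first variant after the score last sat at
    zero up to the variant where the maximum is reached.
    """
    peak = max(scores)
    end = scores.index(peak)
    start = end
    while start > 0 and scores[start - 1] > 0:
        start -= 1
    return start, end, peak


if __name__ == "__main__":
    # Hypothetical p-values for five consecutive variants.
    example = [0.5, 0.001, 0.0001, 0.5, 0.5]
    scores = local_score(example, xi=2)
    print(scores)
    print(peak_region(scores))
```

A larger ξ penalizes weakly significant variants more heavily, so only runs of strongly divergent variants accumulate a high score, which is one reason regions are compared across several ξ values.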