Shawn Narum - Authorea


Public Documents (4)

POOLPARTY2: An integrated pipeline for analyzing pooled or indexed low coverage whole...

Stuart Willis

and 3 more

June 06, 2023

Whole genome sequencing data allow variation to be surveyed across the entire genome, reducing the constraint of balancing genome sub-sampling against recombination rates and the linkage between sampled markers and target loci. As sequencing costs decrease, low-coverage whole genome sequencing of pooled or indexed-individual samples is commonly used to identify loci associated with phenotypes or environmental axes in non-model organisms. There are, however, relatively few publicly available bioinformatic pipelines designed explicitly to analyze these types of data, and fewer still that process the raw sequencing data, provide useful quality-control metrics, and then execute analyses. Here, we present POOLPARTY2, an updated bioinformatics pipeline that can effectively handle either pooled or indexed DNA samples and includes new features to improve computational efficiency. Using simulated data, we demonstrate the ability of our pipeline to recover segregating variants, accurately estimate their allele frequencies, and identify genomic regions harboring loci under selection. Using the same simulated data set, we benchmark our pipeline against another bioinformatic suite, ANGSD, and illustrate the compatibility and complementarity of the two suites by using ANGSD to generate genotype likelihoods, from the alignment files and variants provided by POOLPARTY2, as input for identifying linkage outlier regions. Finally, we apply the updated pipeline to an empirical dataset of low-coverage whole genome data from uncurated population samples of Columbia River steelhead trout (Oncorhynchus mykiss); the results demonstrate the genomic impacts of decades of artificial selection in a prominent hatchery stock.
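The allele-frequency step described above can be illustrated with a minimal sketch, assuming per-SNP reference and alternate read counts are available for each pool: a pool's alternate allele frequency is estimated as the fraction of reads carrying the alternate allele, low-depth sites are skipped, and two pools are compared by their absolute frequency difference. This is not POOLPARTY2's code; the function names `pool_allele_freqs` and `delta_freq`, the `min_depth` cutoff of 10, and the example pools are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical input: SNP identifier -> (reference read count, alternate read count)
ReadCounts = Dict[str, Tuple[int, int]]


def pool_allele_freqs(counts: ReadCounts, min_depth: int = 10) -> Dict[str, Optional[float]]:
    """Estimate the alternate allele frequency per SNP from pooled read counts.

    Assumption: with many individuals in a pool and reads sampled at random,
    the fraction of alternate reads approximates the pool's allele frequency.
    Sites whose total depth is below min_depth (a hypothetical cutoff) get None.
    """
    freqs: Dict[str, Optional[float]] = {}
    for snp, (ref, alt) in counts.items():
        depth = ref + alt
        freqs[snp] = alt / depth if depth >= min_depth else None
    return freqs


def delta_freq(pool_a: Dict[str, Optional[float]],
               pool_b: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Absolute allele-frequency difference at SNPs that pass filters in both pools."""
    return {
        snp: abs(pool_a[snp] - pool_b[snp])
        for snp in pool_a
        if snp in pool_b and pool_a[snp] is not None and pool_b[snp] is not None
    }


if __name__ == "__main__":
    # Two hypothetical pools, e.g. individuals with contrasting phenotypes
    pool_1 = pool_allele_freqs({"snp1": (30, 10), "snp2": (4, 2), "snp3": (25, 25)})
    pool_2 = pool_allele_freqs({"snp1": (8, 32), "snp2": (20, 20), "snp3": (24, 26)})
    print(pool_1)  # {'snp1': 0.25, 'snp2': None, 'snp3': 0.5}
    print(delta_freq(pool_1, pool_2))
```

A real pipeline would also account for sampling error from both the pool and the sequencing depth; the raw read fraction is used here only to keep the idea visible.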

Single nucleotide polymorphism genotypes and ploidy estimates for ploidy variable spe...

Stuart Willis

and 5 more

August 28, 2020

Polyploidization has played a critical role in the evolution of several major organism groups, including vertebrates, but much of our knowledge of polyploid evolution comes from allopolyploid and often rediploidized lineages, partly reflecting the difficulty of obtaining genotype data from polysomic genomes. We combined several contemporary methods to develop single nucleotide polymorphism (SNP) markers compatible with simultaneous ploidy estimation and high-throughput genotyping, and analyzed these data with recently developed software that accepts polysomic data. We demonstrate the utility of this combination for developing genetic resources for polysomic species by applying it to the ploidy-variable and polysomic white sturgeon (Acipenser transmontanus), an imperiled species under conservation management in the Pacific Northwest. We introduce a primer and probe set for 325 SNP markers for use with the Genotyping-in-Thousands by sequencing (GT-seq) method, and provide updated scripts that include a function to estimate the ploidy of each individual from read count data. We examine the reliability of tetrasomic inheritance in a large sample of paleo-octoploid individuals and test for expected Mendelian inheritance patterns in known cross families. We then use these data to infer parentage, relatedness, and other population genetic parameters. Our combined process thus improves access to genetic information for future investigations of white sturgeon and is expected to be widely applicable to other polyploid species.
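Estimating each individual's ploidy from read counts can be sketched as a model comparison, assuming alternate and total read counts at loci already called heterozygous. Under ploidy k, a heterozygote carries j alternate copies for some j between 1 and k - 1, so its expected alternate read ratio is j/k; each candidate ploidy therefore yields a binomial mixture likelihood, and the best-supported ploidy is chosen. This is a hypothetical illustration, not the updated GT-seq scripts described above; the equal dosage weights, the candidate ploidies 4 and 6, and all function names are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log binomial probability of k alternate reads out of n at expected ratio p."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def ploidy_log_likelihoods(het_counts: List[Tuple[int, int]],
                           ploidies: Sequence[int] = (4, 6)) -> Dict[int, float]:
    """Log-likelihood of heterozygous-site read counts under each candidate ploidy.

    het_counts holds (alternate reads, total reads) per heterozygous locus.
    Candidate ploidies are hypothetical defaults; dosages get equal weight.
    """
    result: Dict[int, float] = {}
    for k in ploidies:
        ratios = [j / k for j in range(1, k)]  # possible heterozygous dosages j/k
        weight = math.log(1.0 / (k - 1))       # equal mixture weight per dosage
        total = 0.0
        for alt, n in het_counts:
            total += log_sum_exp([weight + log_binom_pmf(alt, n, p) for p in ratios])
        result[k] = total
    return result


def estimate_ploidy(het_counts: List[Tuple[int, int]],
                    ploidies: Sequence[int] = (4, 6)) -> int:
    """Return the candidate ploidy with the highest log-likelihood."""
    ll = ploidy_log_likelihoods(het_counts, ploidies)
    return max(ll, key=ll.get)


if __name__ == "__main__":
    tetrasomic_like = [(25, 100), (50, 100), (75, 100)] * 10
    hexasomic_like = [(17, 100), (33, 100), (50, 100), (67, 100), (83, 100)] * 10
    print(estimate_ploidy(tetrasomic_like))  # 4
    print(estimate_ploidy(hexasomic_like))   # 6
```

Because higher ploidies offer more possible ratios, a mixture likelihood with normalized weights (rather than distance to the nearest ratio) is used so that the larger model is not automatically favored.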

Genomic islands of divergence infer a phenotypic landscape in Pacific lamprey

Jon Hess

and 12 more

April 02, 2020

Pacific lamprey (Entosphenus tridentatus) is a culturally important and imperiled anadromous fish with a parasitic ocean phase. Biological uncertainties challenge restoration efforts, and life-history research is needed to explain observed trait variation and inform management actions. Using two new whole genome assemblies and genotypes from 7,716 single nucleotide polymorphism (SNP) loci in 518 individuals from across the species range, we identified four large regions of high genomic divergence (on chromosomes 01, 02, 04, and 22). We genotyped a subset of 302 broadly distributed SNPs in 2,145 individuals to test genotype-by-phenotype associations for adult body size, sexual maturity, migration distance and timing, adult swimming ability, and larval growth. Body size traits were strongly associated with SNPs on chromosomes 02 and 04, and moderate associations linked SNPs on chromosome 01 to variation in female maturity. Using genotype frequencies of candidate SNPs for female maturity and body size, we extrapolated a heterogeneous spatiotemporal distribution of these traits from independent datasets of larval and adult collections. These maturity and body size results guide future studies aimed at validating the predicted phenotypic distributions across the geographic range and elucidating the factors driving regional optimization of these traits for fitness.
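Genotype-by-phenotype association and the frequency-based extrapolation can be illustrated with an additive model, assuming diploid genotypes coded as alternate-allele dosages (0, 1, 2): a least-squares line relates dosage to a trait, and a collection's expected trait mean is the genotype-frequency-weighted average of the fitted values. This is a minimal sketch, not the statistical models used in the study; `additive_fit`, `predicted_mean`, and the example values are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def additive_fit(dosages: Sequence[int], phenotypes: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of phenotype = intercept + slope * dosage for one SNP.

    dosages are alternate-allele counts (0, 1, 2), assuming a diploid.
    Returns (intercept, slope, Pearson r).
    """
    x_bar, y_bar = mean(dosages), mean(phenotypes)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosages, phenotypes))
    sxx = sum((x - x_bar) ** 2 for x in dosages)
    syy = sum((y - y_bar) ** 2 for y in phenotypes)
    if sxx == 0 or syy == 0:
        raise ValueError("SNP or phenotype shows no variation in this sample")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / (sxx * syy) ** 0.5
    return intercept, slope, r


def predicted_mean(intercept: float, slope: float, genotype_freqs: Dict[int, float]) -> float:
    """Expected trait mean for a collection, given its genotype frequencies at the SNP."""
    return sum(freq * (intercept + slope * g) for g, freq in genotype_freqs.items())


if __name__ == "__main__":
    a, b, r = additive_fit([0, 0, 1, 1, 2, 2], [50, 52, 55, 57, 60, 62])
    print(a, b)  # 51.0 5.0
    # Hypothetical collection with genotype frequencies 0.25 / 0.5 / 0.25
    print(predicted_mean(a, b, {0: 0.25, 1: 0.5, 2: 0.25}))  # 56.0
```

Extrapolating a trait from genotype frequencies in this way assumes the fitted effect transfers to collections that were never phenotyped, which is why such predictions call for the validation the abstract mentions.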

Distribution of genetic variation underlying adult migration timing in steelhead of t...

Erin Collins

and 3 more

May 04, 2020

Fish migrations are energetically costly, especially those between fresh and salt water, but are a viable strategy for Pacific salmon and trout (Oncorhynchus spp.) because of the advantageous resources available at various life stages. Anadromous steelhead (O. mykiss) migrate vast distances and exhibit variation in migration phenotypes that has a genetic basis at the candidate genes greb1L and rock1. We examined the distribution of genetic variation at 13 candidate markers spanning the greb1L, intergenic, and rock1 regions, compared with 246 neutral markers, in 113 populations (n = 9,471) of steelhead from the inland and coastal lineages in the Columbia River. Population structure inferred from neutral markers reflected genetic similarity by geographic region, as demonstrated in previous studies, whereas candidate markers clustered populations by the predominant genetic variation associated with migration timing. Mature alleles for late migration had the highest overall frequency in steelhead populations throughout the Columbia River; only 9 of 113 populations had a higher frequency of premature alleles for early migration. While a single haplotype block was evident for the coastal lineage, we identified multiple haplotype blocks for the inland lineage: one corresponded to candidate markers within the greb1L gene and immediately upstream in the intergenic region, and a second contained only candidate markers from the intergenic region. Haplotype frequencies showed geographic patterns similar to those of single markers, but frequencies differed distinctly between the two haplotype blocks in the inland lineage. Redundancy analyses of environmental effects on candidate-marker allele frequencies identified migration distance, temperature, isothermality, and annual precipitation as significant variables. This study improves our understanding of the spatial distribution of genetic variation underlying migration timing in steelhead, as well as of the associated environmental factors, and has direct conservation and management implications.
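The haplotype-block analysis can be approximated, for illustration only, by computing pairwise linkage disequilibrium between physically ordered candidate markers and grouping adjacent markers in strong LD. The sketch assumes unphased diploid dosages (0, 1, 2) and uses the squared correlation of dosages as a stand-in for haplotype-based r²; block-definition methods used in practice differ, and the 0.8 threshold, the greedy grouping rule, and the function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def dosage_r2(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared Pearson correlation of genotype dosages (0/1/2) at two markers.

    Approximates LD r^2 from unphased genotypes; it is not a phased,
    haplotype-based estimate.
    """
    a_bar, b_bar = mean(a), mean(b)
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0  # monomorphic marker: LD undefined, treated as 0 here
    return sab * sab / (saa * sbb)


def greedy_blocks(genotypes: List[Sequence[int]], threshold: float = 0.8) -> List[List[int]]:
    """Group markers, given in physical order, into blocks of adjacent high-LD markers.

    A marker joins the current block only if its r^2 with every marker already
    in that block is >= threshold (hypothetical); otherwise it starts a new block.
    """
    blocks: List[List[int]] = []
    for i, g in enumerate(genotypes):
        if blocks and all(dosage_r2(genotypes[j], g) >= threshold for j in blocks[-1]):
            blocks[-1].append(i)
        else:
            blocks.append([i])
    return blocks


if __name__ == "__main__":
    # Rows are markers; columns are the same six hypothetical individuals
    markers = [
        [0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2],
        [2, 1, 0, 2, 1, 0],  # perfectly negatively correlated with row 0: r^2 = 1
        [0, 0, 0, 2, 2, 2],
        [0, 0, 0, 2, 2, 2],
    ]
    print(greedy_blocks(markers))  # [[0, 1, 2], [3, 4]]
```

Requiring high LD with every marker already in a block, rather than only with the previous marker, keeps a chain of moderately linked markers from merging into one long block.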