Field collections, breeding design and rearing environment
Fieldwork was carried out at Yellow Island Aquaculture Ltd. (YIAL), a Chinook salmon hatchery and organic-based farm located east of Campbell River on Quadra Island, Vancouver Island, British Columbia, Canada (Figure 1). Wild sourced eggs from Robertson Creek and milt from Big Qualicum were used to produce the seven-generation, fully-domesticated, stock of Yellow Island (YIAL), which has been in production since 1985. Eggs from 17 highly inbred females (offspring from a self-crossed hermaphrodite) were mixed, and subsets of the mixed eggs were individually fertilized using 10 sires from each of the domestic wild stocks of Chinook salmon. This produced 80 full- and half-sib families belonging to seven outcrossed hybrid stocks (YIAL x Wild) and a fully inbred domesticated stock (YIAL x YIAL). Consequently, the maternal line is identical for all hybrid crosses (herein "crosses"), but the paternal line for those crosses varies depending on the geographical origin of the paternal line.
Eggs from each family were incubated in replicate cells in divided vertical incubation trays, until swim-up stage once they had accumulated 1000 ATUs (19 weeks post-fertilization). To prevent potential density effects, a random subset of no more than 120 alevin per family were transferred to 200 L freshwater barrels for rearing. In the tanks, individuals were ad libitum four times daily (Taplow Feeds, BC, Canada)., and the temperatures and dissolved oxygen levels were maintained at 10-12 oC and above 90% saturation, respectively. Thirty-two weeks post fertilization, the individuals were PIT tagged for later cross and family identification and were vaccinated for vibriosis (Vibrogen 2: Vibrio anguillarum-ordalii; Novartis Animal Health Canada, Inc. Charlottetown, PEI). Smoltifying fish were separated by stock and divided randomly between replicate saltwater net pens (dimensions: 4.5 m X 4.5 m × 3.0 m deep) until 80 weeks post-fertilization. Fish were fed pre-weighed commercially available salmon food four times a day (Taplow Feeds, BC, Canada), and care was taken to prepare and maintain identical pen rearing conditions.
Sample collection, DNA extraction and next generation sequencing
A 10 to 25 fish subset of the 80 week-old fish (mean size = 182g) was randomly selected from each pen (Supplementary Table 1), and the fish were humanely euthanized and sacrificed to sample the gut contents. To obtain gut content samples for DNA extraction, the body cavity was cut open with a sterile scalpel and the hindgut of each offspring was collected. The gut samples were immediately stored in RNAlaterTM for transport to the research facility, where it was stored in the freezer at -20oC until DNA extraction.
We extracted DNA from gut content (and a negative control, containing only nuclease free water) of the distal intestine using commercially available E.Z.N.A Stool DNA Kit (OMEGA Bio-tek) following the manufacturer’s protocol. Amplicon libraries were prepared in two steps as previously described (He et al., 2017). Briefly, the universal primer set of 787F (V5F) (ATTAGATACCCNGGTAG) and 1046R (V6R) (CGACAGCCATGCANCACCT) was first used to PCR amplify the 16S rRNA encoding gene sequences containing the V5-V6 hypervariable regions. Those primers were modified by adding a short sample-code specific sequence (“barcode”) and an Ion Torrent adaptor sequence to the 5’ end of the forward (acctgcctgccg) and reverse (acgccaccgagc) primers. The PCR product was visualized for amplification success on a 2% agarose gel, and PCR product purification was then carried out using Agencourt AMPure XP beads (Beckman Coulter Genomics GmbH, Mississauga, ON, Canada). A second short-cycle ligation PCR was conducted to ligate adaptor and the barcode sequences to the amplicon using purified PCR product from the first round PCR. The second round of PCR used: forward primer UniA (CCATCTCATCCCTGCGTGTCTCCGACTCAGXXXXXXXXXXGATacctgcctgccg), and reverse primer UniB (CCTCTCTATGGGCAGTCGGTGATacgccaccgagc), where the underlined sequence in UniA consisted of unique 10-12 bp barcode sequences necessary for the sample demultiplexing in sequence analysis and the lower-case sequence were the reverse compliment of the added sequence in the first primer set. Barcoded samples were combined with amounts selected based on PCR band intensity and a commercially available kit (GenCatchTM, Epoch Life Science, Inc., Sugar Land, TX., USA) was used to purify the PCR product from incomplete amplicons and primer dimers. The final library was sequenced with an Ion Torrent™ Personalized Genome Machine (Thermo Fisher Scientific, Inc., Mississauga, Canada).
Sequence processing and data analysis
Sequence quality checks were initially conducted using personal genome machine (PGM) software (Torrent Suite™ v5.6) using default parameters to: 1) remove mixed clonal libraries on Ion Sphere Particles (ISPs) known as polyclonals, 2) remove low-quality sequences, and 3) remove sequences with low quality data at the 3’ end of the read. Prior to sequence processing, the PGM-generated FASTQ file was split into multiple FASTQ files based on exact matches of barcode sequences (mismatches = 0) corresponding to the samples in study using FASTX Barcode Splitter (Gordon & Hannon, 2010).
Unless otherwise stated, all sequence processing and statistical analysis was performed using the Quantitative Insights into Microbial Ecology (QIIME-2) pipeline (version 2020.6; Bolyen et al., 2019). Primer and NGS-adapter sequences were removed using Cutadapt (Martin, 2011) and reads with no primer-sequence matches were discarded. In total, 8,820,813 raw reads contained the primer-sequence and retained for processing. Sequencing reads were denoised, dereplicated, and chimera-filtered using DADA2 (Callahan et al., 2016) with default parameters, including a maximum error rate of 2 and consensus chimera removal using minimum parent-sequence fold-count of 1. Based on visual inspection of per-bp position quality score plots, reads were trimmed to 265 bp length sequences using DADA2 to main high-quality scores across reads. The DADA2 sequence quality-check process generated 3,909,887 high quality reads, corresponding to 3,139 unique amplicon sequence variants (ASVs) across 331 samples. Although no band was observed for our negative control, sequencing resulted in 123 reads, representing 6 ASVs. We followed three steps to accurately assign taxonomy to ASVs across all samples. First, full-length 16S non-redundant small subunit (NR-SSU) sequences clustered at 99% sequence identity from the SILVA database (Quast et al., 2012; Yilmaz et al., 2014) were used to extract reference-reads matching our high-quality filtered ASVs, and then trimmed to 265 bp-length reads. Second, the extracted reference reads were used to train a Naïve Bayes classifier using default Scikit-learn parameters (Quast et al., 2012; Bokulich, Robeson & Dillon, 2020). Finally, the trained classifier was used to assign SILVA-based taxonomic annotations (Quast et al., 2012; Yilmaz et al., 2014; Bokulich, Robeson & Dillon, 2020) to all distinct ASVs using a confidence level of 80%. The assigned taxonomy labels were used to filter ASV-table sequences classified as Eukaryota, Archaea, Chloroplasts, Mitochondria, and those identified as 'Ambiguous taxa'. ASVs detected in the negative control were measured for prevalence using the R (version 4.0.2, R Core Team, 2013) package, decontam (Davis, Proctor, Holmes, Relman & Callahan, 2018), using a prevalence value of 20%, and 3 contaminant ASVs were subsequently removed from our sequencing library. Two ASVs identified with decontam belonged to genera known as common reagent contaminants – Ralstonia, and Burkholderia (de Goffau et al., 2018). To main sufficient sequencing depth for statistical analysis, samples with 3,000 reads or less were removed from the analysis. To visually determine sequencing depth sufficiency, rarefaction curves of observed ASVs were plotted with 10 iterations at each of 10 increments (steps) leading to the smallest sample sequencing depth (3,190). The final processed ASV-table contained 3,751,832 reads corresponding to 221 samples and 2,869 unique ASVs.
Diversity analyses
To achieve uniform sequencing depth across samples, a rarefied table was generated in QIIME2 (version 2020.6), using the minimum library sequencing depth (n = 3,190 reads), which was used to compute alpha and beta diversity metrics. To visualize differences among crosses, boxplots were created using the R (v4.0.2; R Core Team, 2016) package ggplots2 (v3.3.2; Wickham, 2016). To perform alpha diversity analyses, the Shannon and Chao1 indices were measured using the rarefied table, and nested-ANOVA models were constructed for each diversity metric in the R (v4.0.2) package lme4 (v1.1-23; Bates, Sarkar, Bates & Matrix, 2007). ANOVAs were constructed as follows: 'Cross' was used as a main factor, 'sire' was nested in cross, and 'replicate pen' was nested in 'sire' nested in 'cross'. Reduced models were constructed for each factor, and likelihood ratio tests (using maximum likelihood) were run to compute χ2