2. Genotyping
The methods used to sequence and genotype MHC IIβ alleles were identical to those of Stutz & Bolnick (2017). Briefly, genomic DNA was extracted from fin clips using a Promega Wizard 96-well extraction kit. We used PCR to amplify the second exon of MHC IIβ genes in each fish, with primers and PCR cycles as described in Stutz & Bolnick (2017). This exon contains the hypervariable peptide-binding region that binds to possible parasite antigens (Sommer, 2005). Each specimen was barcoded with a unique combination of forward and reverse primer tags for multiplexing. We purified PCR products with magnetic beads (Agencourt AMPure XP), quantified their DNA concentrations with Quant-iT PicoGreen kits (Invitrogen P11496), and then pooled up to 400 samples in equimolar amounts to construct a library. We sequenced these multiplexed amplicon libraries on an Illumina MiSeq. We then used a Stepwise Threshold Clustering (STC) program (Stutz & Bolnick, 2014), implemented in the AmpliSaS web software (URL: http://evobiolab.biol.amu.edu.pl/amplisat/index.php?amplisas; Sebastian, Herdegen, Migalska, & Radwan, 2016), to distinguish real sequence variants from sequencing errors or PCR chimeras. The algorithm was originally validated by sequencing cloned amplicon products (Stutz & Bolnick, 2017). The software outputs a table of individual fish (rows) and unique MHC sequences (columns) with read depths.
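The equimolar pooling step reduces to simple arithmetic, sketched below under stated assumptions. Because all products are amplicons of the same exon, we assume they are of similar length, so that equal DNA mass approximates equal molarity. The function name `pooling_volumes`, the target of 10 ng per sample, and the example concentrations are hypothetical and are not taken from the protocol described above.

```python
def pooling_volumes(conc_ng_per_ul, target_ng=10.0):
    """Volume (in microlitres) of each purified PCR product to add to the pool.

    conc_ng_per_ul: dict mapping sample ID -> DNA concentration (ng/uL),
                    e.g. from a fluorometric assay.
    target_ng:      DNA mass each sample should contribute (hypothetical value).

    Assumes all amplicons have similar length, so equal mass ~ equal moles.
    """
    volumes = {}
    for sample, conc in conc_ng_per_ul.items():
        if conc <= 0:
            # A sample without measurable DNA cannot be pooled at equal mass.
            raise ValueError(f"non-positive concentration for {sample}")
        volumes[sample] = target_ng / conc
    return volumes


# Hypothetical concentrations for three barcoded samples.
example = {"fish_001": 5.0, "fish_002": 20.0, "fish_003": 12.5}
volumes = pooling_volumes(example)
for sample, vol in volumes.items():
    print(f"{sample}: {vol:.2f} uL")
```

More dilute samples receive proportionally larger volumes, so that each barcoded individual contributes roughly the same number of template molecules to the library.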
To process data efficiently in AmpliSaS, we capped the read depth for each individual at 5000, which was sufficient to retrieve all possible MHC alleles (Stutz & Bolnick, 2014). Because low sequencing coverage could bias the number of MHC alleles identified, individual fish with coverage lower than 450 were excluded from this study (Fig. S1). After excluding the 160 individuals with low coverage (leaving N = 1277 individuals in the subsequent analyses), there was no longer a significant linear relationship between sequencing coverage and the number of MHC alleles (t = 1.46, p = 0.14). The number of unique MHC alleles was inferred from the translated protein sequences, so distinct exonic sequences that encode identical amino acid sequences were merged into a single allele.
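The coverage filter and the merging of synonymous variants can be illustrated with a minimal sketch. It assumes that coverage is the total read depth per individual, that sequences are translated in the first reading frame with the standard genetic code, and that the genotype table is held as a nested dictionary. The function names and the toy sequences are hypothetical and do not reproduce the actual pipeline or data.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a nucleotide sequence in frame 0 (assumed reading frame).

    Trailing bases that do not fill a codon are ignored; codons containing
    ambiguous bases are reported as 'X'.
    """
    seq = seq.upper()
    end = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, end, 3))


def count_protein_alleles(read_table, min_coverage=450):
    """Count unique protein-level alleles per individual.

    read_table: {fish_id: {nucleotide_sequence: read_depth}}
    Individuals whose total read depth is below min_coverage are dropped.
    """
    counts = {}
    for fish, depths in read_table.items():
        if sum(depths.values()) < min_coverage:
            continue  # excluded for low coverage
        # Nucleotide variants with the same translation collapse to one allele.
        proteins = {translate(s) for s, d in depths.items() if d > 0}
        counts[fish] = len(proteins)
    return counts


# Toy data: GCT and GCC both encode alanine, so fish_A has two protein alleles.
toy = {
    "fish_A": {"ATGGCTTTA": 300, "ATGGCCTTA": 200, "ATGGATTTA": 100},
    "fish_B": {"ATGGCTTTA": 120, "ATGGATTTA": 80},  # 200 reads, excluded
}
result = count_protein_alleles(toy)
print(result)
```

In this toy example fish_A carries three nucleotide variants but only two distinct amino acid sequences, and fish_B is dropped because its total depth falls below the threshold.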