2.3 | Population and phylogenomic structure
We calculated the mean depth of coverage for each BAM file using Samtools. After filtering, we measured the mean depth, the frequency of missing data, and the inbreeding co-efficient, F, for each individual in the final VCF file of 30 caribou plus the reindeer using VCFtools. We performed principle component analyses in R 3.4.4 (R Development Core Team, 2006) using the packages vcfR (Knaus & Grünwald, 2017) and Adegenet (Jombart, 2008). The PCA was done on the VCF file containing all caribou and the reindeer (but not the Sitka deer). We then ran subsets of individuals on different PCA’s to gain higher resolution of different lineages (see Results).
We used VCFkit (available here: https://vcf-kit.readthedocs.io/en/latest/, using numpy 1.14 as the programme does not work with newer versions) to generate a fasta file using the ‘phylo fasta’ command. The programme concatenates SNPs for each sample, using the first genotype of each allele, and replacing missing values with an N. We ran this on the VCF file without the Sitka deer to create an unrooted tree as including the Sitka deer pushed all caribou too closely together to discern the branches. The resulting file was input into RAxML 8 (Stamatakis, 2014), and run using the GTRGAMMA model and 1,000 bootstrap replicates. We visualised the best tree in FigTree 1.4.2 (https://github.com/rambaut/figtree). We also aligned each of the conserved mammalian genes extracted from the genomes using BUSCO (above) to construct phylogenies, from which we made a consensus tree. We used the Sitka deer outgroup to root the tree. We used MUSCLE (Edgar, 2004) to align the sequences for each individual to create a combined fasta file for each gene. We then used RAxML as above to create a gene tree for each file, and then used ASTRAL-III (Zhang, Rabiee, Sayyari, & Mirarab, 2018) to create a consensus tree which was visualised in FigTree.
We used the populations module in Stacks 2.4.1 (Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013) to convert our VCF files (both with and without the Sitka deer) into an input file for Treemix 1.13 (Pickrell & Pritchard, 2012). We ran Treemix from 0-9 migration events, with three iterations of each, grouping the SNPs in windows to account for possible linkage using a block size of 1,000 for two of the iterations and 5,000 for one of the iterations (because the OptM package, below, must have different likelihood scores between iterations). We plotted the resulting trees, and the residual plots, in RStudio 1.0.136 (RStudio Team, 2015). We then used the R package OptM (Fitak, In Review.) to calculate the second order rate of change in the log-likelihood of the different migration events (the ad hoc statistic delta M) to help infer how many migration events to visualize.