Population Structure Analyses
We employed several methods to investigate the genetic population structure of head lice. First, we conducted a Principal Component Analysis (PCA) using the R package SNPRelate (Zheng et al., 2012). Second, we performed a Discriminant Analysis of Principal Components (DAPC) using the R package adegenet (Jombart, 2008). Unlike PCA, which summarizes the overall variability among individuals, DAPC maximizes between-group variation while minimizing within-group variation (Jombart et al., 2010). In the first DAPC analysis, we assigned individuals to populations a priori by continent and used these assignments as the population identifiers. In the second analysis, we used the find.clusters() function without prior population assignment. We tested K values from 1 to 50 and selected the optimal K as the value with the lowest Bayesian Information Criterion (BIC) (Figure S1). Both DAPC analyses were run with 500 principal components (PCs), retaining the first 5 discriminant functions. The optim.a.score() function was used to assess the optimal number of PCs to retain, based on 100 simulations (Figure S2). Results for both PCA and DAPC were visualized as scatterplots in R.
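The BIC-based choice of K described above can be illustrated with a minimal sketch. This is not the adegenet implementation: the function name select_k_by_bic and the BIC values are hypothetical, and the sketch assumes that one BIC value per tested K has already been exported from the clustering step.

```python
def select_k_by_bic(bic_by_k):
    """Return the K with the lowest BIC; ties are resolved in favor of the smaller K."""
    if not bic_by_k:
        raise ValueError("no BIC values supplied")
    # Iterating over sorted keys makes min() return the smallest K among ties.
    return min(sorted(bic_by_k), key=lambda k: bic_by_k[k])


# Hypothetical BIC values for illustration only (not results from this study).
example_bic = {1: 812.4, 2: 790.1, 3: 781.6, 4: 783.0, 5: 788.9}
best_k = select_k_by_bic(example_bic)
print(best_k)
```

In practice, the BIC curve is usually also inspected visually, because the lowest value can sit on a shallow plateau.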
We used fastSTRUCTURE v 1.0 (Raj et al., 2014), a model-based Bayesian clustering method, to infer ancestral population structure. We ran 10 iterations for each value of K from 1 to 10 with 10-fold cross-validation and selected the optimal K using the “choose K” method. All input files were generated using PLINK v 2.0 (Chang et al., 2015), and the results were visualized using CLUMPAK (Kopelman et al., 2015) and DISTRUCT (Rosenberg, 2004). We also used ADMIXTURE, which implements a maximum-likelihood approach, to infer population structure. ADMIXTURE was run with the cross-validation flag (--cv), and the K with the lowest cross-validation error was selected. Additional ADMIXTURE analyses were run on each major genetic cluster identified in the fastSTRUCTURE analysis of all samples. For these population sub-structure analyses, individuals were assigned to a cluster if their ancestry proportion for that cluster in the resulting Q matrix was greater than 0.8.
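The ancestry threshold used to select individuals for the sub-structure analyses can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name select_by_ancestry, the sample identifiers, and the Q values are hypothetical, and the sketch assumes the Q matrix has been read into one row of ancestry proportions per individual.

```python
def select_by_ancestry(sample_ids, q_matrix, threshold=0.8):
    """Group samples by the cluster in which their ancestry exceeds the threshold.

    Samples whose highest ancestry proportion is not strictly greater than
    the threshold are treated as admixed and left out of every group.
    """
    if len(sample_ids) != len(q_matrix):
        raise ValueError("sample_ids and q_matrix must have the same length")
    groups = {}
    for sample, row in zip(sample_ids, q_matrix):
        # Index of the cluster with the largest ancestry proportion.
        top = max(range(len(row)), key=lambda k: row[k])
        if row[top] > threshold:
            groups.setdefault(top, []).append(sample)
    return groups


# Hypothetical identifiers and Q values for K=2, for illustration only.
ids = ["A", "B", "C", "D"]
q = [
    [0.95, 0.05],  # assigned to cluster 0
    [0.55, 0.45],  # admixed, excluded
    [0.10, 0.90],  # assigned to cluster 1
    [0.80, 0.20],  # exactly 0.8 is not "greater than 0.8", excluded
]
groups = select_by_ancestry(ids, q)
print(groups)
```

Each resulting group could then be written out as a keep-list and used to subset the genotype files before the per-cluster ADMIXTURE runs.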