Population Structure Analyses
We employed several methods to investigate the genetic population
structure of head lice. First, we conducted a Principal Component
Analysis (PCA) using the R package SNPRelate (Zheng et al., 2012).
Second, we performed a Discriminant Analysis of Principal Components
(DAPC) using the R package adegenet (Jombart, 2008). Unlike PCA, which
summarizes the overall variability among individuals, DAPC maximizes
between-group variation while minimizing within-group variation
(Jombart et al., 2010). In the first DAPC analysis, individuals were
assigned a priori to populations by continent, and these continental
groupings were used as population identifiers. In the second analysis,
clusters were inferred without prior population assignment using the
find.clusters() function. Both DAPC analyses were run with 500
principal components (PCs), retaining the first five discriminant
functions. The optimal number of PCs to retain was assessed with
optim.a.score() based on 100 simulations (Figure S2). For the
find.clusters() analysis, we tested K values from 1 to 50 and selected
the K with the lowest Bayesian Information Criterion (BIC) (Figure S1).
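The BIC-based choice of K can be illustrated with a small sketch: k-means is fitted to individuals' PC scores for each candidate K, and each fit is scored with a BIC. We assume the common k-means approximation BIC = n·ln(RSS/n) + K·ln(n), where RSS is the within-cluster sum of squares; this formula is an assumption for illustration and is not claimed to reproduce adegenet's internal find.clusters() computation. The helper names (kmeanspp_init, kmeans_rss, kmeans_bic) and the synthetic "PC scores" are hypothetical.

```python
import numpy as np


def kmeanspp_init(x, k, rng):
    """k-means++ seeding: later centroids favour points far from those already chosen."""
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        c = np.array(centers)
        d2 = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(x[rng.choice(len(x), p=d2 / d2.sum())])
    return np.array(centers)


def kmeans_rss(x, k, n_init=20, max_iter=100, seed=0):
    """Lowest within-cluster sum of squares (RSS) from Lloyd's k-means over n_init starts."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        centers = kmeanspp_init(x, k, rng)
        for _ in range(max_iter):
            # Assign each individual to its nearest centroid.
            d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Move each centroid to the mean of its members (keep it if the cluster is empty).
            new_centers = np.array([
                x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            converged = np.allclose(new_centers, centers)
            centers = new_centers
            if converged:
                break
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, float(d2.min(axis=1).sum()))
    return best


def kmeans_bic(x, k_values, **kwargs):
    """Assumed k-means BIC, n*ln(RSS/n) + K*ln(n), for each candidate K."""
    n = len(x)
    return {k: n * np.log(kmeans_rss(x, k, **kwargs) / n) + k * np.log(n)
            for k in k_values}


# Synthetic stand-in for PC scores: three groups of 50 individuals in 40 dimensions.
rng = np.random.default_rng(1)
offsets = np.zeros((3, 40))
offsets[1, 0] = 20.0
offsets[2, 1] = 20.0
scores = np.vstack([rng.normal(size=(50, 40)) + off for off in offsets])

bic = kmeans_bic(scores, range(1, 7))
best_k = min(bic, key=bic.get)  # the K with the lowest BIC
```

Under-splitting (K = 1 or 2) leaves a large RSS and is penalized heavily, while extra clusters beyond the true number reduce RSS only slightly and pay the K·ln(n) penalty.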
Results of both the PCA and the DAPC were visualized as scatterplots in R.
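The core computation behind a genotype PCA can be sketched in a few lines of NumPy. This is a minimal illustration, not SNPRelate's implementation: we assume a complete matrix (no missing data) of 0/1/2 allele counts, standardize each SNP by its allele frequency, and take the leading eigenvectors of the individual-by-individual covariance matrix. The function name genotype_pca and the simulated two-population data are hypothetical.

```python
import numpy as np


def genotype_pca(geno, n_components=2):
    """PCA of an (individuals x SNPs) matrix of 0/1/2 allele counts with no missing data."""
    g = np.asarray(geno, dtype=float)
    p = g.mean(axis=0) / 2.0                     # allele frequency of each SNP
    keep = (p > 0) & (p < 1)                     # monomorphic SNPs carry no information
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(p * (1.0 - p))   # center and scale each SNP
    cov = z @ z.T / z.shape[1]                   # individual-by-individual covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top], eigvals[top]


# Simulated genotypes: 20 individuals from each of two populations, 500 SNPs.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.05, 0.5, size=500)
freq_b = 1.0 - freq_a                            # strongly differentiated allele frequencies
geno = np.vstack([rng.binomial(2, freq_a, size=(20, 500)),
                  rng.binomial(2, freq_b, size=(20, 500))])
pcs, eigenvalues = genotype_pca(geno)
pc1 = pcs[:, 0]
```

Plotting the first two columns of pcs against each other gives the kind of scatterplot described above; here the two simulated populations fall on opposite sides of PC1.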
We used fastSTRUCTURE v1.0 (Raj et al., 2014), a model-based Bayesian
clustering method, to infer ancestral population structure. We
performed 10 replicate runs for each K from 1 to 10 with 10-fold
cross-validation and selected the optimal K using the chooseK.py
script distributed with fastSTRUCTURE. All input files were generated
with PLINK v2.0 (Chang et al., 2015), and the results were visualized
using CLUMPAK (Kopelman et al., 2015) and DISTRUCT (Rosenberg, 2004).
We also used ADMIXTURE, which implements a maximum-likelihood approach,
to infer population structure.
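The likelihood behind this approach can be written compactly: if q_ik is individual i's ancestry proportion from cluster k and f_kj is the frequency of the counted allele at SNP j in cluster k, genotype g_ij is modeled as Binomial(2, Σ_k q_ik f_kj). The sketch below evaluates that log-likelihood and maximizes it with a plain expectation-maximization (EM) update. EM is a swapped-in optimizer chosen for brevity; it is slower than, and not the same as, the algorithm ADMIXTURE itself uses. All function names and the simulated data are hypothetical.

```python
import numpy as np

EPS = 1e-9


def log_likelihood(G, Q, F):
    """Binomial log-likelihood of genotypes G (dropping the constant binomial coefficient)."""
    H = np.clip(Q @ F, EPS, 1.0 - EPS)   # expected allele frequency per individual and SNP
    return float(np.sum(G * np.log(H) + (2.0 - G) * np.log(1.0 - H)))


def em_step(G, Q, F):
    """One EM update of ancestry proportions Q (n x K) and allele frequencies F (K x J)."""
    H = np.clip(Q @ F, EPS, 1.0 - EPS)
    # E-step: probability that a counted / other allele copy came from each cluster.
    A = Q[:, :, None] * F[None, :, :] / H[:, None, :]
    B = Q[:, :, None] * (1.0 - F)[None, :, :] / (1.0 - H)[:, None, :]
    ones = G[:, None, :] * A              # expected counted-allele copies per cluster
    zeros = (2.0 - G)[:, None, :] * B     # expected other-allele copies per cluster
    copies = ones + zeros
    # M-step: 2 allele copies per SNP, so dividing by 2J yields proportions summing to 1.
    Q_new = copies.sum(axis=2) / (2.0 * G.shape[1])
    F_new = np.clip(ones.sum(axis=0) / copies.sum(axis=0), EPS, 1.0 - EPS)
    return Q_new, F_new


def fit_admixture_em(G, K, n_iter=200, seed=0):
    """Random start followed by n_iter EM steps; returns Q, F and the log-likelihood trace."""
    rng = np.random.default_rng(seed)
    Q = rng.dirichlet(np.ones(K), size=G.shape[0])
    F = rng.uniform(0.2, 0.8, size=(K, G.shape[1]))
    trace = [log_likelihood(G, Q, F)]
    for _ in range(n_iter):
        Q, F = em_step(G, Q, F)
        trace.append(log_likelihood(G, Q, F))
    return Q, F, trace


# Simulated admixed sample: 30 individuals, 400 SNPs, K = 2 source clusters.
rng = np.random.default_rng(3)
Q_true = rng.dirichlet([0.5, 0.5], size=30)
F_true = rng.uniform(0.05, 0.95, size=(2, 400))
G = rng.binomial(2, Q_true @ F_true).astype(float)
Q_hat, F_hat, trace = fit_admixture_em(G, K=2)
```

Each EM step cannot decrease the log-likelihood, which makes the trace a convenient convergence check; cluster labels in Q_hat are arbitrary, so they may be permuted relative to Q_true.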
ADMIXTURE was run with the cross-validation flag (--cv), and the K with
the lowest cross-validation (CV) error was selected as the best K. To
examine population sub-structure, we ran additional ADMIXTURE analyses
on each major genetic cluster identified by the fastSTRUCTURE analysis
of all samples. An individual was included in the sub-structure
analysis of a given cluster if its ancestry proportion for that
cluster in the resulting Q matrix was greater than 0.8.
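The sample-selection rule and the CV-based choice of K can be made concrete with a short sketch. We assume a whitespace-delimited Q matrix with one row per individual and one column per cluster (the layout we assume for ADMIXTURE .Q and fastSTRUCTURE .meanQ files), with sample IDs supplied in the same row order, and ADMIXTURE --cv log lines of the form "CV error (K=2): 0.547". The sample IDs, Q values, CV errors, and function names below are invented for illustration. Because the 0.8 threshold exceeds 0.5, an individual can pass it for at most one cluster, so only its largest ancestry proportion needs checking.

```python
import io
import re

import numpy as np

CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def select_by_ancestry(q, sample_ids, threshold=0.8):
    """Map each cluster index to samples whose ancestry exceeds threshold (valid for threshold >= 0.5)."""
    q = np.atleast_2d(np.asarray(q, dtype=float))
    if q.shape[0] != len(sample_ids):
        raise ValueError("Q matrix rows and sample IDs differ in number")
    groups = {k: [] for k in range(q.shape[1])}
    for sample, row in zip(sample_ids, q):
        k = int(np.argmax(row))
        if row[k] > threshold:            # strictly greater than the threshold
            groups[k].append(sample)
    return groups


def best_k_from_cv(log_text):
    """Return the K with the lowest cross-validation error, plus all parsed errors."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    return min(errors, key=errors.get), errors


# Hypothetical 2-cluster Q matrix for four samples.
q_text = "0.95 0.05\n0.10 0.90\n0.60 0.40\n0.80 0.20\n"
q = np.loadtxt(io.StringIO(q_text))
groups = select_by_ancestry(q, ["s1", "s2", "s3", "s4"])

# Hypothetical ADMIXTURE log lines.
log_text = "CV error (K=1): 0.612\nCV error (K=2): 0.547\nCV error (K=3): 0.559\n"
best_k, cv_errors = best_k_from_cv(log_text)
```

Here s3 (maximum ancestry 0.60) and s4 (exactly 0.80) are excluded because the rule requires ancestry strictly greater than 0.8.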