Figures
Figure 1. Geographic distribution and structure plots for each collection site (black squares) overlaid on the historical distribution of the species described in Wirth and Jones (1957). The fastSTRUCTURE results are for 206 individuals inferred from 3612 SNPs and assuming five populations (K=5). Each vertical bar within a collection site represents an individual, with each color representing a cluster. The putative species identity of each cluster is as follows: Culicoides occidentalis (blue), C. sonorensis (teal), C. albertensis (yellow), C. variipennis (red), and an unidentified population in San Diego, CA (CASD) (green). Black bars above the structure plots indicate individuals for which the COI gene was also sequenced. The individuals inferred to be hybrids are labeled h1–h7.
Figure 2. (a) A 3D representation of the principal component analysis (PCA) of all individuals included in the study. Each color represents a cluster inferred from the structure analysis: C. albertensis (yellow), C. occidentalis (blue), C. sonorensis (teal), C. variipennis (red), and the unidentified San Diego population (green). Hybrids (h1–h7) are designated with black circles, and their inferred parental ancestry is depicted with pie graphs. The geographic locations of the two C. occidentalis clusters are labeled next to each grouping (see Table 1 for abbreviations). (b) Unrooted maximum likelihood phylogenetic tree based on 199 individuals inferred from 3612 SNPs (the hybrids were removed here but are included in Fig. S3). Clade colors represent the clusters inferred from the structure analysis: C. albertensis (yellow), C. occidentalis (blue), C. sonorensis (teal), C. variipennis (red), and the unidentified San Diego population (green). Support values on the branches are given as rapid bootstrap (%) / SH-aLRT support (%) / ultrafast bootstrap support (%). For clarity, the values within each cluster are not shown.
Figure 3. For each species, an independent SNP dataset was used to determine the best K using fastSTRUCTURE v.1.04, with the inferred clusters denoted by varying shades. Isolation by distance (IBD, shown as pairwise FST by log geographic distance) for each species was calculated in Genepop v.4.7.0. The individuals from San Diego, CA are not included here as they were only found in a single population.
Figure 4. Loci under selection. Individual loci from the “all-species” dataset (566 SNPs) and the species-specific datasets are plotted against their corresponding log10 values. A locus with a log10 value over 1.0 is considered to have high support (95% CI) for being under selection, and a log10 value over 2.0 corresponds to a 99% CI for being under selection. The individuals from San Diego, CA do not have a species-specific dataset as they were only found in a single population; however, they were still included in the “all-species” analysis.
Figure 5. A haplotype network inferred by a median-joining method, using 285 mitochondrial (mt) DNA sequences of the C. variipennis complex from 27 states in the U.S. as well as British Columbia and Ontario, Canada. The size of each circle represents the frequency of that haplotype. The 67 sequences obtained in the present study (see Figure 1) are colored according to the clusters assigned from the structure analysis. The four main groups of haplotypes are demarcated by ellipses (see main text).