Figures
Figure 1. Geographic
distribution and structure plots for each collection site (black
squares) overlaid on the historical distribution of the species
described in Wirth and Jones 1957. The fastSTRUCTURE results are for 206
individuals inferred from 3612 SNPs, assuming five populations (K = 5).
Each vertical bar within a collection site represents an individual,
with each color representing a cluster. The putative species identities of
the clusters are as follows: Culicoides occidentalis (blue), C. sonorensis (teal), C. albertensis (yellow), C.
variipennis (red), and an unidentified population in San Diego, CA
(CASD) (green). A black bar above a structure plot indicates an
individual for which the COI gene was also sequenced. The individuals
inferred to be hybrids are labeled h1–h7.
Figure 2. (a) A 3D representation of the principal component
analysis (PCA) of all individuals included in the study. Each color
represents a cluster inferred from the structure analysis: C.
albertensis (yellow), C. occidentalis (blue), C.
sonorensis (teal), C. variipennis (red), and the unidentified
San Diego population (green). Hybrids (h1–h7) are marked with a
black circle, and their inferred parental ancestry is depicted with pie
graphs. The geographic locations of the two C. occidentalis clusters are labeled next to each grouping (see Table 1 for
abbreviations). (b) Unrooted maximum likelihood phylogenetic tree based on 199 individuals
inferred from 3612 SNPs (the hybrids were removed here but are included
in Fig. S3). Clade colors represent the clusters inferred from the
structure analysis: C. albertensis (yellow), C.
occidentalis (blue), C. sonorensis (teal), C. variipennis (red), and the unidentified San Diego population (green). Support values
on the branches are given as rapid bootstrap (%) / SH-aLRT support (%) /
ultrafast bootstrap support (%). For clarity, the values within each
cluster are not shown.
Figure 3. For each species, an independent SNP dataset was used
to calculate the best K in fastSTRUCTURE v.1.04, with the inferred
clusters denoted by varying shades. Isolation by distance (IBD; shown as pairwise FST against log geographic distance) for each species
was calculated in Genepop v.4.7.0. The individuals from San Diego, CA
are not included here as they were only found in a single population.
Figure 4. Loci under selection. Individual loci from the
“all-species” dataset (566 SNPs) and the species-specific datasets are
plotted against their corresponding log10 values. A log10 value over 1.0 is
considered high support (95% CI) for a locus being under selection, and
a log10 value over 2.0 corresponds to 99% CI for being under selection.
The individuals from San Diego, CA do not have a species-specific
dataset as they were only found in a single population; however, they
were still included in the “all-species” analysis.
Figure 5. A haplotype network inferred by a median-joining
method, using 285 mitochondrial (mt) DNA sequences of the C.
variipennis complex from 27 states in the U.S. as well as British
Columbia and Ontario, Canada. The size of each circle represents the
frequency of that haplotype. The 67 sequences obtained in the present
study (see Fig. 1) are colored according to the clusters assigned by
the structure analysis. The four main groups of haplotypes are
demarcated by ellipses (see main text).