Results
All of the raw sequence reads, alignment files, and genotype datasets
will be deposited in the Open Science Framework database upon
acceptance, https://osf.io (DOI XXX). In total, 271 individuals were
subjected to the ddRADseq procedure and yielded an average of 2.08
million reads per individual. During the initial filtering, 36
individuals had a phred score of less than 25 and were removed from the
dataset. Additionally, 29 individuals were found to have more than 75%
missing data and were therefore removed. The final dataset included 206
individuals from 17 sites and contained
3612 SNPs. The population
structure inferred by fastSTRUCTURE that best explains the data is K =
5. Structure plots showing K = 3-6 can be found in figure S1. At K = 5,
most individuals (86%) were unambiguously assigned to one group
(98-100% assignment score; Fig. 1). Consistent with these results, the
PCA and DAPC grouped these individuals into five main clusters (Figs. 2a
& S2). The main difference being that the PCA further segregated one
cluster (blue, Fig. 2a) into two separate groups; east and west of the
Sierra Nevada mountain range. Further support for the same five clusters
was found in the maximum likelihood trees, with a high level of support
from each approximation method (Figs. 2b & S3).
The geographic distributions of four of these clusters closely align
with the distributions of four of the five subspecies described in Wirth
& Jones (1957) (Fig. 1), suggesting these morphological descriptions
accurately denoted species-level taxa within the C. variipenniscomplex. Further phylogenetic and morphological study is needed to
confirm the validity of these species groupings; however, for the
remainder of the manuscript we will refer to each cluster by a species
name. Culicoides occidentalis located in Western North America,C. sonorensis in the Western and Southern U.S., C.
albertensis in the Midwest U.S. and Ontario, C. variipennis in
the Eastern U.S. and Ontario, and a fifth genetic group suggesting the
occurrence of an additional, undescribed cryptic species in San Diego,
CA. Notably, eight of the 17 sites had more than one species in
sympatry, and one site had three species. At four sites, seven
individuals were assigned to two genetic groups with an assignment score
of ~50% for three individuals (scores = 45, 47 and
41%) and of ~25% for four individuals (scores = 34,
31, 25 and 24%), which suggests the occurrence of putative F1
or other types of hybrids (e.g. F2
or backcrosses), respectively. Interestingly, these hybrids were from
three different species parings (C. sonorensis X C.
occidentalis ; C. sonorensis X C. variipennis ; andC. albertensis X C. variipennis ). These hybrid individuals
also stood out using the PCA analysis, as they segregated between their
parental clusters (Fig. 2a), as well as at the base of each parental
branch in the phylogenetic tree (Fig. S3). In addition to these hybrids,
20 individuals had a secondary assignment score between 3% to 21%,
signifying potential introgression between those pairings.
The samples were then rearranged by species, rather than collection
site, and stricter filtering parameters were applied. This dataset
contained 566 SNPs from 199 individuals (hybrids were excluded) and was
used to calculate the species-level summary statistics as well as
determine the loci under selection. The mean FSTbetween the five inferred clusters was 0.7147 (0.6541-0.7470), roughly 9
times higher than the mean FST between the
populations (i.e., localities) within each cluster (see below; Tables 2
& S1). The overall dataset was further split into five datasets for
species-level population statistics. These datasets contained 22
individuals of C. albertensis from four populations (3423 SNPs),
36 individuals of C. occidentalis from four populations (2714
SNPs), 97 individuals of C. sonorensis from seven populations
(2357 SNPs), and 29 individuals of C. variipennis from four
populations (2960 SNPs). The expected and observed heterozygosity,FIS , and number of private alleles for each
species are reported in Table S2. No species level dataset was created
for the San Diego species as only one locality was examined.
When examining each species individually, C. albertensis, had no
evidence of population structure (K = 1), and had low genetic
differentiation among populations (mean FST =
0.054) (Fig. 3a; Table 2). Although there does seem to be a pattern of
isolation by distance, this was found to not be significant in this
species (P = 0.238). The low number of populations sampled potentially
limits our statistical power for this correlation. The results obtained
for C . occidentalis showed much more divergence compared
to the other species, with populations being strongly differentiated
from each other (mean FST = 0.411) (Table 2).
Additionally, fastSTRUCTURE suggests that each population of C.
occidentalis is a distinct genetic entity (K = 4) clustering by
location (Fig. 3b). While no IBD was found (P = 0.489), there seems to
be a considerable amount of geographic isolation among populations of
this species, with pairwise FST values ranging
from 0.14 to 0.70 (Table S3). Low genetic differentiation among
populations was found for C. sonorensis (meanFST = 0.029), despite a slight, but significant
IBD in this species (P = 0.039) (Fig. 3c; Table 2). For this reason, the
individuals from Colorado were combined into a single population, as
were the individuals from Kansas. A fastSTRUCTURE analysis suggested the
occurrence of population structure in C. sonorensis (K = 2), with
some individuals from Kansas belonging to a distinct group. The combined
Kansas populations were not divergent from any other C.
sonorensis population (Table S3). Populations of C .variipennis exhibited no evidence of population structure (K = 1)
or of isolation by distance (P = 0.587) (Fig. 3d). Consistently, almost
no genetic differentiation was found among populations of this species
(mean FST = 0.026) (Table 2).
We identified three outlier loci within the C. variipenniscomplex and an additional 23 species-specific loci: two in C.
albertensis , seven in C. occidentalis , 11 in C.
sonorensis , and two in C. variipennis . Each of these loci had a
log10 Bayes factor value over 1 and six had values above 2,
corresponding to a 95% and 99% confidence interval, respectively (Fig.
4). Searches of InsectBase were used to assign putative functional
annotations (most of which were provided by Nayduch et al. (2014), with
orthologous dipteran genes subsequently found using Flybase (Table S4).
Roughly 75% of the loci were matched to transcription data, and all but
one associated with a dipteran orthologous gene. None of the loci found
to have significant evidence of selection were shared across the
different species, suggesting that each is under its own set of
selective pressures.
Based on the COI gene, four distinct groupings were identified with
strong genetic divergence between groups (p- distance =
2.99-3.30%) and little divergence within groups (p -distance =
0.25-0.86%; Fig. 5; Table 3). Consistent with the SNP datasets, the
California population of C. occidentalis was separated
from the rest of its range. The mean percent divergence between the twoC. occidentalis groups (2.99%) was similar to its
divergence from the other species (3.01-3.30%). The San Diego
population clusters as a distinct group, with a similar level of
divergence from the other species (3.01-3.03%). Interestingly, C.
albertensis , C. sonorensis , and C. variipennis were not
separated from each other, and in some cases, C. albertensis andC. variipennis shared identical haplotypes (Fig. 5). Furthermore,
these three species exhibit a mean percent divergence between
individuals (0.80%) similar to the divergence observed among
individuals within the other clusters (Table 3). Other than the
grouping of C. occidentalis in California, there was no other
geographic clustering observed.