Materials and Methods
Bats were sampled during 2014 and 2015 in five isolated mountain ranges
across the Ethiopian Highlands: 1) Bale Mountains National Park, the
largest and southernmost massif that covers partially the most extensive
Afroalpine area in the continent (Hillman, 1986), 2) Arsi Mountains
National Park of Chillalo-Galama Mountains, 3) Guassa Community
Conservation Area, 4) Mount Abune Yosef, and 5) the Simien Mountains
National Park (Fig. 1). The Ethiopian Highlands include unique
ecosystems typically characterised by Afromontane forests up to 3200
masl, Afroalpine wet and dry grasslands above 3500 m and Afroalpine
moorland habitat above 4000 m. The Afromontane forest habitat includes
the endemic Ethiopian juniper (Juniperus procera ) and African
redwood (Hagenia abyssinica ) in the drier northern slopes and
Erica (Erica arborea ) and yellow wood (Podocarpus sp.)
forests in the warmer and more humid southern slopes. We obtained 3mm
wing biopsies from 50 Plecotus balensis bats from 19 sites
(Supplementary Table S1). Bats were caught in two distinct ecoregions:
Afromontane woodland (2795-3492 m; 30 bats) and Afroalpine grasslands
and moorlands (3773-4224 m; 20 bats). In Bale, a sufficient number of
bats was caught in each ecoregion type to separate the mountain range
into two populations, ‘Bale-D’ (Dinsho) for Afromontane woodland and
‘Bale-S’ (Sanetti Plateau) for Afroalpine moorlands. Because only three
individuals were caught in Chillalo-Galama Mountains, samples from this
mountain range could only be included in individual-based analyses
(ecological niche models, phylogenetic tree, haplotype network and
population structure), as well as the demographic history analysis as
part of the southern population. Three additional samples collected from
two sites around the Abune Yosef Mountain in a former expedition in 1996
were also included in individual-based analyses only.
Ecological niche
modelling
Ecological niche models (ENMs) were generated with the programme MaxEnt
v3.4.1 (Phillips, Anderson, & Schapire, 2006) to determine the
potential geographic distribution and ecological requirements ofP. balensis across the Horn of Africa. The Horn of Africa was
selected as the study area because it represents a biotic region that
was likely accessible for the species over the relevant time periods
covered by the modelling. As only 14 unclustered location records are
available for this under-studied species, we used a bias layer to
account for potential uneven sampling efforts and low sample sizes. We
assigned a value of 10 for mountain ranges surveyed during our
expeditions and a value of 1 to the remaining study area. These values
were used by Maxent to give weights to random background data during the
modelling process so that pseudo-absences better reflect potential
geographic bias in sampling efforts (Fourcade, Engler, Rödder, &
Secondi, 2014). Model resolution was set at ~1 km (30
arc sec). We generated two types of models. The full model included a
combination of climatic [downloaded from WorldClim
(www.worldclim.org/)], topographic [including altitude, slope and
ruggedness, calculated from SRTM map
(https://www2.jpl.nasa.gov/srtm/)], land cover [GlobCover2009 map,
European Space Agency
(http://due.esrin.esa.int/page_globcover.php) reclassified into 10
general categories], vegetation cover [Normalized Difference
Vegetation Index (NDVI) for the dry and wet seasons, MOD13A3
(https://lpdaac.usgs.gov/)], ecoregions (WWF ecoregions map;
Olson et al., 2001), and human footprint variables [Global human
footprint v2 2005, NASA
(http://sedac.ciesin.columbia.edu/data/set/wildareas-v2-human-footprint-geographic)].
The climatic model included only climatic and static topographic
variables that could be projected to the future (ruggedness and slope).
We removed highly correlated variables
[R>|0.75|; tested with ENMTools v1.3
(Warren, Glor, & Turelli, 2010)] and variables that did not
contribute to model gain (Supplementary Table S2 for final model
variables). Final models included a regularization value of 2 and three
features (linear, quadratic and product), selected based on AICc scores
obtained with ENMTools, and 1500 iterations. Model testing was carried
out with 100 bootstrap replications using 20% of data for model
testing. Output maps were converted to binary maps based on the
thresholding method that maximises sensitivity and specificity (Liu,
White, & Newell, 2013). Areas with values that fell above the threshold
were considered as suitable for the bat.
Models were hind-casted to the Last Glacial Maximum (LGM) and
mid-Holocene, and projected to the future (2070) using three General
Circulation Models (CCSM4, MIROC-ESM, MPI-ESM-LR) for each time period,
and two Representative Concentration Pathways (RCP) scenarios for future
projections only, the ‘worst case’ scenario, RCP +8.5
W/m2, and the more moderate RCP +4.5
W/m2 scenario (IPCC, 2013). In our projected models we
included the only climatic variable that affected the distribution ofP. balensis , maximum temperature (BIO5, WorldClim) and a static
topographic variable, topographic ruggedness (calculated from SRTM map).
Generating the genetic
datasets
Genomic DNA was extracted from all wing biopsy samples. The extraction
protocol consisted of DNA precipitation with isopropanol after
purification using saline precipitation. We selected three markers with
different mutation rates, covering events that occurred during the
Pleistocene (mitochondrial DNA fragments) versus recent-contemporary
impacts from the last few decades-centuries (microsatellites). Two
mitochondrial DNA (mtDNA) fragments were amplified: a 650 bp fragment of
the gene cytochrome b (cytb) using the primers MOLCIT-F (Ibáñez,
García-Mudarra, Ruedi, Stadelmann, & Juste, 2006) and MVZ16 (Smith &
Patton, 1993); and a 460 bp fragment of the hyper-variable region (HV1)
of the control region using the primers L15926 and CSBF-R (Wilkinson &
Chapman, 1991). The PCR reaction (20µl final reaction volume) included
0.5-5µl of DNA extract, 0.5µM of each primer, 2nM of
MgCl2, 0.2nM of dNTPs, 0.2mg BSA, and 0.5 units of
Taq-Polimerase. Thermo-cycling consisted of 4 min initial denaturation
at 94ºC followed by 35 cycles of 60s at 94ºC, 30s at 52ºC (for both
pairs) and 90s at 72ºC and a final extension of 10 min at 72ºC. The PCR
products were sequenced using ABI 3100 automated sequencers (PE
Biosystems, Warrington, UK) and DNA fragments were aligned and edited
using Geneious v. R11 software (Biomatters Ltd).
Samples were also genotyped for 19 polymorphic autosomal microsatellite
loci previously developed for the genus (Razgour et al., 2013). The
forward primer of each locus pair was labelled fluorescently with HEX or
6-FAM (Applied Biosystems), and microsatellites were combined into six
multiplex sets. Each 10μl PCR plex contained 2–4 primer sets (each set
at a concentration of 0.2μM and a total volume of 1μl), 5μl Qiagen
multiplex PCR master mix
and 2μl DNA. We used the following PCR program: initial denaturation at
95°C for 15 min, followed by 35 cycles of 94°C for 30s, 57°C for 90s and
72°C for 60s, followed by a final extension at 60°C for 30 min. PCR
products were sent for fragment analysis in Dnaseq (University of
Dundee, UK). Allele sizes were assigned using GeneMapper (Applied
Biosystems, USA). All loci retained in the analysis did not depart from
Hardy-Weinberg equilibrium expectations, were not in linkage
disequilibrium and had a low frequency of null alleles in most
populations [tested with GENEPOP v4.2 (Rousset, 2008) and CERVUS
v3.0.3 (Kalinowski, Taper, & Marshall, 2007)].
Power analysis
We tested the power of our genetic dataset to detect population
structure (assignment tests and connectivity) and identify a population
bottleneck using the Sample Planning Optimization Tool for conservation
and population Genetics (SPOTG; Hoban, Gaggiotti, & Bertorelle, 2013).
Assignment tests had very high power (0.98) with 20 individuals and two
populations and high power with 10 individuals and five populations
(0.86). Similarly, connectivity analysis had high power (0.87) even with
our minimum population size (7 individuals), while for 10 individuals,
power has increased to very high (0.97). The power to detect a recent
bottleneck varied depending on the extent of the bottleneck and sample
size. For a severe bottleneck, a sample size of 20 individuals had high
power (79%), while 10 individuals had insufficient power (27%). For a
moderate bottleneck, an analysis with 20 individuals had just below
minimal chance of success (46%), and 10 individuals had very low power
(20%).
Genetic data analysis
Mitochondrial DNA sequences from the cytb and HVI regions were aligned
and concatenated into a single sequence (1110 bp) with BioEdit v7.2.0
(Hall, 1999) and collapsed into haplotypes with DAMBE v6 (Xia & Xie,
2001). Bayesian phylogenetic trees were constructed for the concatenated
cytb and HVI sequences in MrBayes v3.2.1 (Ronquist et al., 2012), usingPlecotus austriacus as outgroup to root the tree. We ran
4x107 generations with four chains, sampled every
200th generation, and two simultaneous runs, discarding the first 25%
of trees as burn-in. We used the Hasegawa-Kishino-Yano (HKY) model of
DNA substitution with proportion of invariable sites model of rate
variation for the cytb sequences and the HKY model with
gamma-distributed rate variation for the HVI sequences [selected by
jModelTest2 v0.1.10 (Darriba, Taboada, Doallo, & Posada, 2012) based on
BIC values]. Trees and posterior probabilities were visualised with
Figtree v1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/).
Parsimony haplotype network for the concatenated sequences was
constructed with NETWORK (v4.610, Fluxus Technology), employing the
median-joining network algorithm and the Greedy FHP distance calculation
method. We calculated nucleotide polymorphism, haplotype diversity,
genetic divergence and differentiation between populations in DnaSP
v5.10 (Librado & Rozas, 2009).
Population genetics summary statistics for the microsatellite dataset
were performed in GenAlEx v6.4 (Peakall & Smouse, 2006) and FSTAT
v2.9.3.2 (Goudet, 1995) correcting for differences in sample sizes.
Levels of inbreeding within each population were calculated using the
TrioML measure in Coancestry (Wang, 2011). Linear models were performed
in R (CRAN) to related levels of genetic diversity (allelic richness)
and inbreeding to NDVI during the dry season, human footprint index and
cover of key land-use types likely to either positively (forest) or
negatively (arable land) affect habitat quality for bats. Cover of the
land-use types was measured in ArcGIS v10.3.1 (ESRI) within 5 km buffers
around population capture locations using the GlobCover 2009 map. Values
of NDVI and human footprint index were averaged across the 5 km buffers.
We used 5 km buffer around capture sites to reflect the potential home
range of the species, estimated based on the home range size of its
better studied cryptic congener Plecotus austriacus (Razgour,
Hanmer, & Jones, 2011).
Extent of genetic differentiation between populations was calculated
using two measures: Fst in SPAGeDi (Hardy & Vekemans,
2002) and Jost’s D (Jost, 2008) in GenAlEx v6.4. Individual-based
Bayesian assignment tests implemented in STRUCTURE v2.3.3 (Pritchard,
Stephens, & Donnelly, 2000) were used to infer genetic population
structure, varying number of clusters (K) from 1 to 10, with 10
replicates and 106 Markov Chain Monte Carlo (MCMC)
generations following a burn-in phase of 5×105generations. The number of distinct clusters was determined using
STRUCTURE HARVESTER (Earl & vonHoldt, 2012) based on Evanno’s delta K
method and mean log-likelihood. Cluster assignment was visualised with
DISTRUCT (Rosenberg, 2003).
Landscape Genetics
analysis
We used the landscape genetics approach to determine how landscape
heterogeneity affects functional connectivity and gene flow between
mountain range populations of P. balensis . The analysis included
landscape variables that were deemed to impede or facilitate movement in
this bat given what is known about its habitat use and geographic
distribution, including topographic variables (altitude and ruggedness),
ecoregions, forest (percent tree cover), land cover (land cover and
NDVI), hydrology (distance to streams), and human impact variables
(human footprint index and night lights index). Landscape variables were
converted to resistance cost surfaces in ArcGIS and were assigned
different resistance costs based on knowledge of the ecology of this
species and its better-studied cryptic congeners. Resistance costs
ranged from one, no resistance to movement, to 100, strong barrier to
movement (see Supplementary Table S3 for list of variables, their
sources and different allocated resistance costs). Circuitscape v4.0.5
(McRae, 2006) was used to calculate resistance distance matrices between
populations based on the cumulative cost of movement due to landscape
resistance. We used multiple regression on distance matrices in the R
package ecodist (Goslee & Urban, 2007) to select the best combination
of resistance costs for each landscape variable based on strength of
correlation with genetic distance (Fst and Jost’s D)
between populations (Supplementary Table S4).
We compared nine candidate sets of hypotheses for the effect of the
landscape on genetic connectivity: topographic (altitude), land-use
(land cover map), anthropogenic (human footprint index), ecoregions (WWF
ecoregions map), forest (percent tree cover), hydro (distance to
streams), hydro-land (streams + land cover), anthropo-eco (human index +
ecoregions) and anthropo-topo (human index + altitude). We used the
Maximum Likelihood Population Effect (MLPE) approach (Van Strien,
Keller, & Holderegger, 2012) and the R packages lme4 (Bates, Mächler,
Bolker, & Walker, 2015) and usdm (Naimi, Hamm, Groen, Skidmore, &
Toxopeus, 2014). To reduce model collinearity we only included variables
with VIF<4 in each hypothesis tested. We compared evidence for
models being closest to the truth based on AICc and BIC evidence weights
(Van Strien et al., 2012). To account for the effect of geographic
distance on genetic connectivity, we divided the log measure of genetic
distance (Fst or Jost’s D) by log Euclidean distance
between populations. All variables were log transformed to comply with
model assumptions of normal distribution.
Approximate Bayesian computation (ABC) inference of
evolutionary
history
The demographic history of P. balensis was reconstructed using
the ABC approach implemented in DIYABC v2.1.0 (Cornuet et al., 2014) in
order to identify whether recent changes in population size have
occurred in response to anthropogenic land-use change and habitat
fragmentation and degradation. Small sample sizes meant we could not
compare the evolutionary history of each mountain range, but only the
wider geographical areas. Therefore, we divided the dataset into samples
from the north-west (n=28) versus south-east (n=22) of the Rift Valley
due to its role in population structure (see Results section). We tested
four competing scenarios of population changes: 1) a large ancient
population split into two smaller populations, no change in population
sizes; 2) population expansion after split; 3) after population split,
both populations declined recently; and 4) after population split recent
decline in the south-eastern population only (Supplementary Fig. S1 for
scenarios). Population split dates were kept flexible, ranging from
pre-post LGM (200-200,000 years ago), while recent decline dates were
set at 20-1000 years ago.
ABC analysis was carried out on the combined mtDNA and microsatellite
datasets, including only recent samples (from 2014-2015). MtDNA
substitution model and parameters were defined based on jModelTest2
results. We simulated 106 datasets per scenario
tested, and included most available summary statistics (18 in total).
The posterior probability of scenarios was estimated using a weighted
polychotomous logistic regression. We empirically evaluated the power of
the model to discriminate among scenarios (confidence in scenario
choice) by simulating pseudo-observed datasets with the different
scenarios and calculating false allocation rates. We evaluated model
specificity (type 1 error) by simulating 500 pseudo-observed datasets
with the scenario selected by the ABC analysis and calculating rates of
false scenario assignment. Model sensitivity (type 2 error) was
calculated based on the proportion of 500 pseudo-observed datasets
simulated with other scenarios that were assigned to the scenario
selected by the ABC analysis. We carried out model checking for the most
probable scenario through performing a PCA on 1000 simulated datasets
generated from posterior parameter distributions and the observed
dataset. We evaluated bias and precision on parameter estimation by
calculated the relative mean bias and Relative Median Absolute Deviation
(RMedAD) based on comparing 500 pseudo-observed simulated datasets to
10,000 simulated datasets closest to observed dataset of each parameter
(Cornuet, Ravigné, & Estoup, 2010; Cornuet et al., 2014).