Occurrence data and environmental variables
Occurrence records were compiled for the Ganges River Dolphin (GRD)
across its entire range, the GBM and KS river systems spanning Nepal,
India and Bangladesh, and for the Indus Dolphin (ID) in the Indus river
system spanning India and Pakistan. Occurrence locations were based on
presence records from studies conducted by several researchers (see
supporting information SI1) and from OBIS-SEAMAP
(http://seamap.env.duke.edu/).
A total of 724 occurrence records were compiled for GRD, of which 410
coordinates were used, and 404 for ID, of which 304 coordinates were
used. Absence points are considered valuable for SDM algorithms and
model assessment techniques (Miller, 2010). Since I did not have true
absence points, I generated 10,000 pseudo-absence points for modeling
using the 'random' strategy. In this strategy, all cells of the initial
background are pseudo-absence candidates and the choice is made at
random. For GRD, two coordinates were discarded because they indicated
presence in the Bay of Bengal. Such occurrences have been reported
during monsoons (Moreno, 2003); however, for this study the area was
limited to the riverine environment.
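The 'random' strategy described above can be sketched in a few lines of Python. This is an illustration, not the code used in the study. The function name `sample_pseudo_absences`, the toy grid, and the choice to exclude cells that already hold a presence record are all assumptions made for this example. The background is represented as a 2D list in which `None` marks cells outside the river-network mask.

```python
import random

def sample_pseudo_absences(background, n, presence_cells=(), seed=None):
    """Pick n pseudo-absence cells uniformly at random from the background.

    background: 2D list; None marks cells outside the masked study area.
    presence_cells: (row, col) pairs to exclude (an assumption of this sketch).
    """
    excluded = set(presence_cells)
    # Every valid background cell is a candidate.
    candidates = [
        (r, c)
        for r, row in enumerate(background)
        for c, value in enumerate(row)
        if value is not None and (r, c) not in excluded
    ]
    if n > len(candidates):
        raise ValueError("fewer candidate cells than requested points")
    rng = random.Random(seed)
    # Sampling without replacement: each cell is chosen at most once.
    return rng.sample(candidates, n)

# Toy 2 x 3 background with two masked cells and one presence record.
grid = [[1.0, None, 2.0],
        [None, 3.0, 4.0]]
points = sample_pseudo_absences(grid, 3, presence_cells=[(0, 0)], seed=1)
```

With the toy grid, the three remaining valid cells are all returned, in a seed-dependent order. In the study the same idea would apply to the full background raster with n = 10,000.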
The basin boundary and river networks were obtained from HydroSHEDS
(https://hydrosheds.org).
The GBM basin provided by HydroSHEDS excludes the areas near the Bay of
Bengal and some areas within the basin boundary; these were merged to
form the final GBM basin (see supporting information SI2). Since the
species is aquatic, the input layers were created with environmental
variables clipped to the river networks. This produced NA predictor
values for some coordinates, possibly because the coordinates were
reported from shore-based censuses or because of errors in the river
network. Therefore, a 1 km coordinate pull was used to move such
coordinates into the nearest raster cell with data, using the
nearestLand function from the package SEEG-Oxford/seegSDM. Any points
that still did not fall on a raster cell after this 1 km pull were
discarded. The selected coordinates were then grid-sampled to match the
raster resolution so that there was one occurrence point per pixel.
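The snapping and thinning steps above can be illustrated with a short Python sketch. It does not reproduce the seegSDM implementation. It assumes a planar grid of square cells, measures distance between cell centres, and uses hypothetical names (`snap_to_valid_cell`, `snap_and_thin`) and toy data. `None` stands in for NA cells outside the river network.

```python
import math

def snap_to_valid_cell(row, col, grid, cell_size_km=1.0, max_km=1.0):
    """Return the nearest non-None cell within max_km of (row, col), else None."""
    if grid[row][col] is not None:
        return (row, col)
    reach = int(math.ceil(max_km / cell_size_km))
    best, best_dist = None, float("inf")
    # Search only the window of cells that could lie within max_km.
    for r in range(max(0, row - reach), min(len(grid), row + reach + 1)):
        for c in range(max(0, col - reach), min(len(grid[0]), col + reach + 1)):
            if grid[r][c] is None:
                continue
            dist = math.hypot(r - row, c - col) * cell_size_km
            if dist <= max_km and dist < best_dist:
                best, best_dist = (r, c), dist
    return best

def snap_and_thin(cells, grid, cell_size_km=1.0, max_km=1.0):
    """Snap each record, drop records that cannot be snapped, keep one per cell."""
    kept, seen = [], set()
    for row, col in cells:
        snapped = snap_to_valid_cell(row, col, grid, cell_size_km, max_km)
        if snapped is not None and snapped not in seen:
            seen.add(snapped)
            kept.append(snapped)
    return kept

# Toy grid: only two cells carry predictor values.
grid = [[None, 5.0, None, None],
        [None, 6.0, None, None],
        [None, None, None, None]]
records = [(0, 0), (0, 1), (1, 0), (2, 3)]
thinned = snap_and_thin(records, grid)
```

Here the records at (0, 0) and (0, 1) end up in the same cell and are reduced to one point, (1, 0) is pulled into (1, 1), and (2, 3) is discarded because no valid cell lies within 1 km.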
Nineteen bioclimatic raster layers were obtained from WorldClim version
2.1 climate data for 1970-2000 (https://worldclim.org/), along with two
hydrological variables from HydroSHEDS, the hydrologically conditioned
Digital Elevation Model (DEM) and the flow accumulation model, at
30 arc-second spatial resolution to model potential distribution. Using
all the variables might cause over-fitting due to high collinearity
among predictors. To minimize this, a Pearson correlation matrix was
created and variables with correlation >0.7 were discarded. In the end,
six variables were used: BIO2, BIO3, BIO15, BIO16, flow accumulation and
the hydrologically conditioned DEM. BIO2 (Mean Diurnal Range) is the
mean of the monthly differences between maximum and minimum temperature;
BIO3 (Isothermality) is the ratio of the mean diurnal range to the
annual temperature range (BIO2/BIO7 × 100); BIO15 (Precipitation
Seasonality) is the coefficient of variation of monthly precipitation;
BIO16 (Precipitation of Wettest Quarter) is the total precipitation of
the wettest quarter, calculated per pixel; flow accumulation gives the
upstream area (in number of cells) draining into each cell; and the
hydrologically conditioned DEM is an elevation model conditioned to
represent the expected flow of water over the terrain.
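The correlation-based screening of predictors can be sketched as follows. The text does not state how one variable of a correlated pair was chosen for removal, so this example assumes a simple greedy rule: variables are visited in the given order, and one is kept only if its absolute Pearson correlation with every variable already kept is at most 0.7. The function names, the use of the absolute value, and the toy layers `a`, `b`, `c` are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_collinear(layers, threshold=0.7):
    """Greedily keep variables whose |r| with all kept variables is <= threshold.

    layers: dict mapping variable name to a list of cell values, in priority
    order (earlier names win when two variables are collinear).
    """
    kept = []
    for name in layers:
        if all(abs(pearson(layers[name], layers[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy layers sampled at the same five cells; "b" is nearly a multiple of "a".
layers = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.5],
    "c": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = drop_collinear(layers)
```

In this toy case `b` is dropped because it is almost perfectly correlated with `a`, while `c` is retained. Because the outcome of a greedy rule depends on the order of the variables, the order should reflect which predictors are considered most ecologically relevant.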