Data cleaning
During spatial data cleaning, I removed duplicate records, records with a
coordinate uncertainty greater than 10 km, records with imprecise
coordinates (zero coordinates, integer coordinates, and records falling
in the ocean), and records with invalid coordinates, i.e., those whose
stated locality was incompatible with the given coordinates
(Chowdhury et al., 2021a, b, e), using the CoordinateCleaner package
(Zizka et al., 2019) in R.
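The record-level filters above can be illustrated with a minimal Python sketch. It is not CoordinateCleaner's API: the `Record` type, its fields, and the `clean` function are hypothetical, the records are toy values, and the ocean and locality-mismatch tests are omitted because they need land polygons and a gazetteer.

```python
from typing import Iterable, List, NamedTuple, Optional

MAX_UNCERTAINTY_M = 10_000  # 10 km uncertainty threshold from the text


class Record(NamedTuple):
    # Hypothetical occurrence record layout, for illustration only
    species: str
    lat: float
    lon: float
    uncertainty_m: Optional[float]  # coordinate uncertainty in metres; None if unreported


def is_imprecise(r: Record) -> bool:
    # Zero latitude or longitude is a common placeholder for missing coordinates
    if r.lat == 0 or r.lon == 0:
        return True
    # Whole-degree coordinates usually indicate coarse rounding
    return float(r.lat).is_integer() and float(r.lon).is_integer()


def clean(records: Iterable[Record]) -> List[Record]:
    kept: List[Record] = []
    seen = set()
    for r in records:
        # Drop records whose reported uncertainty exceeds 10 km
        if r.uncertainty_m is not None and r.uncertainty_m > MAX_UNCERTAINTY_M:
            continue
        # Drop zero and integer (imprecise) coordinates
        if is_imprecise(r):
            continue
        # Drop exact duplicates (same species at the same coordinates)
        key = (r.species, r.lat, r.lon)
        if key in seen:
            continue
        seen.add(key)
        kept.append(r)
    return kept


toy = [
    Record("sp_A", 27.70, 85.30, 500.0),    # kept
    Record("sp_A", 27.70, 85.30, 500.0),    # duplicate
    Record("sp_A", 27.80, 85.40, 25000.0),  # uncertainty > 10 km
    Record("sp_B", 0.0, 0.0, None),         # zero coordinates
    Record("sp_B", 28.0, 84.0, None),       # integer coordinates
    Record("sp_B", 26.50, 88.10, None),     # kept
]
print(len(clean(toy)))  # 2
```

Filtering before de-duplication means a duplicate is only discarded when an equivalent valid record has already been kept.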
To reduce sampling bias, I applied spatial thinning using the spThin
R package (Aiello-Lammens et al., 2015). For each butterfly species,
occurrence records were thinned so that the retained records were at
least 4.65 km apart, which corresponds to roughly one occurrence record
per 21.62 km² (4.65 km × 4.65 km). The final dataset contained 7,606
records for 285 species (Figure 1; supplementary table S1).
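The thinning step can be sketched as a randomized greedy procedure: repeatedly drop one of the records with the most neighbours closer than 4.65 km, and keep the largest retained set over several random repetitions. This is a simplified Python illustration, assuming spThin's randomized approach works roughly this way; it is not spThin's code, and the function names, the haversine distance with a mean Earth radius of 6371.0088 km, and the default of 100 repetitions are assumptions.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed)
MIN_DIST_KM = 4.65           # thinning distance from the text
# A 4.65 km x 4.65 km cell covers 4.65**2 = 21.6225 km^2


def haversine_km(a: Point, b: Point) -> float:
    # Great-circle distance between two points
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def thin_once(points: List[Point], min_dist_km: float, rng: random.Random) -> List[Point]:
    kept = list(range(len(points)))
    # Neighbours of each record that lie closer than the thinning distance
    nbrs = {i: {j for j in kept if j != i
                and haversine_km(points[i], points[j]) < min_dist_km}
            for i in kept}
    while True:
        max_deg = max((len(nbrs[i]) for i in kept), default=0)
        if max_deg == 0:
            break  # no remaining pair is closer than min_dist_km
        # Randomly remove one of the most crowded records
        drop = rng.choice([i for i in kept if len(nbrs[i]) == max_deg])
        kept.remove(drop)
        for j in nbrs.pop(drop):
            nbrs[j].discard(drop)
    return [points[i] for i in kept]


def thin(points: List[Point], min_dist_km: float = MIN_DIST_KM,
         reps: int = 100, seed: int = 1) -> List[Point]:
    # Repeat the random thinning and keep the run that retains most records
    rng = random.Random(seed)
    best: List[Point] = []
    for _ in range(reps):
        result = thin_once(points, min_dist_km, rng)
        if len(result) > len(best):
            best = result
    return best


pts = [(27.70, 85.30), (27.71, 85.30), (27.90, 85.30)]
print(len(thin(pts)))  # 2: the first two points are about 1.1 km apart
```

Removing the most crowded record first tends to retain more records than removing records in input order, which is why the repetitions keep the largest result.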
Before fitting the models, I checked collinearity among the 19 WorldClim
bioclimatic variables and removed highly correlated variables
(|r| > 0.75; Zurell et al., 2020). This removed 11 variables (bio2,
bio4, bio5, bio6, bio7, bio8, bio10, bio12, bio13, bio16, and bio19),
leaving eight variables (bio1, bio3, bio9, bio11, bio14, bio15, bio17,
and bio18) for model fitting. I used the same eight variables for
future model predictions.
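The collinearity screen can be sketched as a greedy filter: walk through the variables in a chosen priority order and keep a variable only if its absolute Pearson correlation with every variable already kept is at most 0.75. The text does not state how the variable to drop was chosen within each correlated pair, so the priority order, the `select_uncorrelated` function, and the toy values below are hypothetical.

```python
import math
from typing import Dict, List, Sequence

THRESHOLD = 0.75  # |r| threshold from the text


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    # Pearson correlation coefficient of two equal-length samples
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def select_uncorrelated(columns: Dict[str, Sequence[float]], order: List[str],
                        threshold: float = THRESHOLD) -> List[str]:
    # Keep a variable only if it is not strongly correlated with any kept one
    kept: List[str] = []
    for name in order:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


toy = {
    "bio1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "bio5": [2.0, 4.0, 6.0, 8.0, 10.5],   # nearly collinear with bio1
    "bio12": [5.0, 1.0, 4.0, 2.0, 3.0],   # r = -0.3 with bio1
}
print(select_uncorrelated(toy, ["bio1", "bio5", "bio12"]))  # ['bio1', 'bio12']
```

Because the result depends on the order, a different priority can retain a different but equally valid subset.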