Data cleaning
During spatial data cleaning, I removed duplicate records, records with a coordinate uncertainty greater than 10 km, records with imprecise coordinates (zero coordinates, integer-valued coordinates, and points falling in the ocean), and records whose stated locality was incompatible with the coordinates given (Chowdhury et al., 2021a, b, e). I performed these steps with the CoordinateCleaner package (Zizka et al., 2019) in R.
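As a rough illustration of the rule-based filters described above, the following minimal Python sketch applies them to a list of records. It is not the CoordinateCleaner implementation: the field names (`species`, `lon`, `lat`, `uncertainty_km`) and the function `clean_records` are hypothetical, treating a record as "zero" when either coordinate is exactly 0 and as "integer" when both coordinates are whole numbers is an assumption, and the ocean and locality checks are omitted because they need external reference data.

```python
def clean_records(records, max_uncertainty_km=10.0):
    """Return records that pass simple duplicate and precision filters.

    Hypothetical record layout: dicts with 'species', 'lon', 'lat' and an
    optional 'uncertainty_km' key (all names are assumptions for this sketch).
    """
    seen = set()
    kept = []
    for rec in records:
        lon, lat = rec["lon"], rec["lat"]
        key = (rec["species"], lon, lat)

        # Duplicate: same species at exactly the same coordinates.
        if key in seen:
            continue
        # Coordinate uncertainty above the threshold (10 km in the text).
        unc = rec.get("uncertainty_km")
        if unc is not None and unc > max_uncertainty_km:
            continue
        # Zero coordinates (assumed rule: either value exactly 0).
        if lon == 0 or lat == 0:
            continue
        # Integer-valued coordinates (assumed rule: both values whole numbers).
        if float(lon).is_integer() and float(lat).is_integer():
            continue

        seen.add(key)
        kept.append(rec)
    return kept
```

Records without a stated uncertainty are kept in this sketch; whether such records should be retained is a separate decision that the text does not specify.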
To control sampling bias, I applied spatial thinning. Using the spThin R package (Aiello-Lammens et al., 2015), I retained, for each butterfly species, only occurrence records that were at least 4.65 km apart, which corresponds to roughly one occurrence record per 21.62 km² (4.65 km × 4.65 km). The final dataset contained 7,606 records for 285 species (Figure 1; Supplementary Table S1).
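The idea behind distance-based thinning can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a simple greedy pass in input order, whereas spThin uses its own randomised procedure, and the helper names `haversine_km` and `thin_points` are hypothetical. Distances are great-circle distances on a sphere of radius 6371 km.

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points (degrees)."""
    r = 6371.0  # mean Earth radius in km (spherical approximation)
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def thin_points(points, min_dist_km=4.65):
    """Greedy thinning: keep a point only if it lies at least min_dist_km
    from every point already kept. Apply per species."""
    kept = []
    for lon, lat in points:
        if all(haversine_km(lon, lat, klon, klat) >= min_dist_km
               for klon, klat in kept):
            kept.append((lon, lat))
    return kept

# Area of a 4.65 km x 4.65 km square, the figure quoted in the text.
cell_area_km2 = 4.65 ** 2
```

Because the greedy pass depends on input order, it does not necessarily retain the largest possible number of records; a randomised, repeated procedure is one way to address that.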
Before fitting the models, I checked for collinearity among the WorldClim variables and removed highly correlated ones (r > 0.75; Zurell et al., 2020). This removed 11 variables (bio2, bio4, bio5, bio6, bio7, bio8, bio10, bio12, bio13, bio16, bio19), leaving eight of the 19 bioclimatic variables for model fitting. I used the same eight variables for future model predictions.
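A correlation-based filter of this kind can be sketched as follows. This is a minimal illustration, not the procedure of Zurell et al. (2020): it keeps variables in the order given and drops any variable whose absolute Pearson correlation with an already-kept variable exceeds the threshold. Using the absolute value, the order-based tie-breaking, and the names `pearson` and `drop_correlated` are assumptions of this sketch, and the input values in the usage example are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def drop_correlated(table, threshold=0.75):
    """Return names of variables to keep.

    table maps variable name -> list of values sampled at the same sites.
    A variable is kept only if |r| <= threshold with every kept variable.
    """
    kept = []
    for name, values in table.items():
        if all(abs(pearson(values, table[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Usage with made-up values (not data from this study):
example = {
    "bio1": [1, 2, 3, 4, 5],
    "bio5": [2, 4, 6, 8, 10.5],   # nearly collinear with bio1
    "bio14": [3, 1, 4, 1, 5],
}
retained = drop_correlated(example)
```

Which member of a correlated pair is dropped depends here only on input order; in practice that choice is usually guided by ecological relevance or univariate model performance.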