2.3 Modeling framework and evaluation
To keep the sample points accurate and to avoid oversampling or biased sampling, we used the “Spatially Rarefy Occurrence Data for SDMs” tool to choose the locations (Fourcade et al., 2014; Brown et al., 2017). Eighty-three fossil sites and 116 recent macaque distribution locations were selected for this study (Figure S1 in the Appendix).
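The idea behind spatial rarefaction can be illustrated with a minimal sketch. It is not the algorithm of the tool cited above: it assumes a simple greedy rule in which a point is kept only if it lies at least a minimum distance from every point already kept. The function names, the coordinates, and the 10 km distance are hypothetical and chosen for illustration only.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def rarefy(points, min_dist_km):
    """Greedily keep points at least min_dist_km away from every kept point."""
    kept = []
    for p in points:
        if all(haversine_km(p, q) >= min_dist_km for q in kept):
            kept.append(p)
    return kept

# Hypothetical occurrence coordinates (lat, lon); not data from this study.
sites = [(30.00, 110.00), (30.01, 110.01), (30.50, 110.50), (25.00, 102.00)]
thinned = rarefy(sites, min_dist_km=10.0)  # assumed thinning distance
print(thinned)
```

In this sketch the result depends on the order of the input points, so a practical workflow would typically shuffle the points or repeat the thinning several times.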
Variables that are independent but closely related were selected by Principal Component Analysis (PCA), referring to the scores on the first three axes, which account for a significant part of the eigenvalues. The Kaiser-Meyer-Olkin (KMO) and Bartlett’s tests were applied to determine whether the variables are suitable for PCA (Toll and Van Luit, 2013). Among the highly correlated variables, those with low scores on the axes with higher loading values were removed before the modeling was performed.
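One possible reading of this screening step is sketched below with NumPy. It is an illustration under stated assumptions, not the procedure actually run for this study. The correlation cutoff of 0.8, the rule of keeping the variable with the stronger loading on the first three axes, the variable names, and the synthetic data are all hypothetical. The KMO value is computed from the correlation matrix and its partial correlations; Bartlett’s test is omitted.

```python
import numpy as np

def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure computed from a correlation matrix."""
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                       # partial correlations
    off = ~np.eye(corr.shape[0], dtype=bool)     # off-diagonal mask
    r2 = np.sum(corr[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)

def pca_loadings(X, n_axes=3):
    """Correlation matrix and variable loadings on the first n_axes components."""
    corr = np.corrcoef(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)        # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_axes]
    top = np.clip(eigval[order], 0.0, None)      # guard against tiny negatives
    return corr, eigvec[:, order] * np.sqrt(top)

def select_variables(X, names, r_max=0.8, n_axes=3):
    """For each highly correlated pair, drop the variable with the weaker loading."""
    corr, load = pca_loadings(X, n_axes)
    score = np.abs(load).max(axis=1)             # strongest loading per variable
    dropped = set()
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if i in dropped or j in dropped:
                continue
            if abs(corr[i, j]) > r_max:
                dropped.add(i if score[i] < score[j] else j)
    return [n for k, n in enumerate(names) if k not in dropped]

# Synthetic data: v2 is nearly a copy of v1, while v3 and v4 are independent.
rng = np.random.default_rng(0)
v1 = rng.normal(size=200)
X = np.column_stack([v1, v1 + 0.1 * rng.normal(size=200),
                     rng.normal(size=200), rng.normal(size=200)])
names = ["v1", "v2", "v3", "v4"]                 # hypothetical variable names
kmo_value = kmo(np.corrcoef(X, rowvar=False))
selected = select_variables(X, names)
print(round(float(kmo_value), 3), selected)
```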
Two different types of models, corresponding to the variable types (BC, LU, and HP; Table S1 in the Appendix), were established to estimate macaques’ suitable habitat distribution. They refer to alternative climatic and environmental factors and to human population size, which have shaped macaques’ geographic distribution and would drive its trajectory in the years to come.
BC models include a) retrospectively reconstructing suitable habitat areas for the LIG and LGM periods separately;
BC-LU-HP models include b) the suitable habitat distribution scenario between 1970 and 2000; and c) the future suitable habitat distribution scenarios in the 2050s.
Five models for four periods were analyzed with MaxEnt, which has two main modifiable parameters, the Feature Class (FC) and the Regularization Multiplier (RM), that can increase or decrease the model’s fit. Since the default combinations of these two parameters cause overfitting (Porfirio et al., 2014; Qiao et al., 2015), we used the R package ENMeval (Muscarella et al., 2014) to select an optimal combination of FC and RM. Each model was run ten times to generate receiver operating characteristic (ROC) curves and obtain the mean area under the curve (AUC). Model accuracy was then assessed from the AUC, which ranges from 0 to 1; a value of 0.5 represents a random model (Myerson et al., 2001). The point on the ROC curve where the tangent slope equals 1 corresponds to the maximized sum of sensitivity and specificity (maxSSS) (Cantor et al., 1999). Compared with other methods, maxSSS used as the threshold has higher sensitivity and credibility (Liu et al., 2016). We therefore used the average maxSSS of each model as the threshold above which habitat is considered suitable. We then used the natural breaks (Jenks) method to classify the suitable regions into three grades: high, moderate, and low suitability (Calka, 2018).
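The AUC and the maxSSS threshold described above can be illustrated with a short sketch. It does not reproduce MaxEnt or ENMeval output: the scores and labels are made-up values, the function names are hypothetical, and each distinct score is simply tried as a candidate threshold. Label 1 stands for a presence point and 0 for a background point.

```python
def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for every distinct score value."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        points.append((t, tp / pos, tn / neg))
    return points

def max_sss_threshold(scores, labels):
    """Threshold that maximizes sensitivity + specificity (maxSSS)."""
    return max(roc_points(scores, labels), key=lambda r: r[1] + r[2])[0]

def auc(scores, labels):
    """AUC as the chance that a presence outranks a background point (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up suitability scores and labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 1, 0, 0]
threshold = max_sss_threshold(scores, labels)
area = auc(scores, labels)
print(threshold, area)
```

With replicate runs, the same two functions would be applied to each run and the resulting thresholds and AUC values averaged.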
We also used the percent contribution of each variable to identify the dominant drivers influencing macaques’ distribution (Phillips et al., 2006; Zhang et al., 2019a).
The geo-ecological regions in mainland East Asia are five in number: Northwest, Southwest, Central, Coastal, and Northeast China (Huang et al., 2021; Zhang et al., 2022).