<Fig. 1>

2.1 Study area

The study area is located in southern Jiangsu Province of southeastern China, including three prefecture-level cities (Suzhou, Wuxi, and Changzhou) and two county-level cities (Danyang and Dantu), with a total land area of 18,700 km2 (Fig. 1a). The area has a subtropical monsoon climate with a mean annual rainfall of 900-1200 mm, and an average temperature of 15.6 °C (Song et al., 2019). Most parts of the study area are lowlands with elevations ranging from 0 to 10 m above sea level, while the mountains with elevations >100 m are mainly located in the southwest (Fig. 1b). Paddy soil (Stagnic Anthrosols) is the dominant soil type, and fluvo-aquic soil (Aquic Cambosols) and yellow brown soil (Ferri-Udic Argosols) are the other two major soil types (Fig. 1c). The parent materials for paddy soils in the area were mainly composed of alluvial materials, loess, and lacustrine deposits, while the fluvo-aquic soils in the northern parts of the study area were developed from the Yangtze River alluvium. Yellow brown soils derived from residual and slope deposits were mainly distributed in the low-mountain areas.
Southern Jiangsu Province has thousand years of agriculture planting history, while the rapid urbanization only began in the early 1980s. The rapid urbanization process and economic growth due to industrial development have greatly influenced the soil quality in the area (Liu et al., 2010).

2.2 Soil sampling and analysis

Cropland topsoil data (0-20 cm) in 1980 were collected from the soil survey reports of each city within the study area (Fig. 1a). These legacy data have been further checked for quality control, such as the correction of likely wrong samples and outlier detection (Schillaci et al., 2019), resulting in a total of 399 data on soil organic matter (SOM) being collected. The SOM in 1980 was determined by using the potassium dichromate oxidation method with external heating (Institute of Soil Science Chinese Academy of Sciences, 1978), and was converted to SOC by the SOM×0.58 based on the Van Bemmelen factor (Yan et al., 2011) (denoted as SOC1980 hereafter). Soil data recorded in these soil survey reports, including the text descriptions of the soil type, topography, bulk density, rock fragments with the fraction >2 mm (coarse fraction), and sampling locations, were the findings of the second National Soil Survey of China. So far, these legacy data were still the most comprehensive and detailed historical data that were available for extracting information about the soil properties of the study area in 1980.
In 2000, a total of 413 topsoil samples were collected for better coverage of the study area. These sampling locations were chosen to (i) match the 1980’s sampling locations as closely as possible, (ii) be on the identical soil types and topographic characteristics, and (iii) have the most typical cropping systems around the sites. In 2015, the sampling locations were determined by the same rules as the sampling campaign in 2000. However, because of the significant changes in land use (i.e., cropland converted to urban land over the period of 2000-2015), some sampling locations in 2015 were slightly different from those in 2000, resulting in a total of 407 soil samples being collected in 2015. Hand-held GPS (global positioning system, positioning error <10 meters) was used to record the sample locations in 2000 and 2015, and the related information such as soil types, land use, and topographic characteristics were also recorded in detail. The average sampling density over the three sampling campaigns is approximately one sample per 33 km2. And the SOC contents in 2000 and 2015, which were denoted as SOC2000 and SOC2015 hereafter, were all determined by the same method as SOC1980.

2.3 SOC change mapping and uncertainty assessment

The geostatistical ordinary kriging (OK) may reflect the mechanism of spatial variations in soil properties and is relatively transparent and straightforward, compared to the complicated algorithm such as machine learning (Veronesi & Schillaci, 2019). Therefore, the OK method was employed to map the spatial distributions of SOC1980, SOC2000, and SOC2015, and then, the spatial patterns of the SOC changes over the three time periods were determined by subtracting the OK-predicted SOC1980 from the OK-predicted SOC2000, the OK-predicted SOC2000 from the OK-predicted SOC2015, and the OK-predicted SOC1980 from the OK-predicted SOC2015, respectively.
During the OK prediction, the experimental variogram that summarized the spatial relations in SOC data was first calculated. For a set of SOC data z(xi ) at location xi(i=1,2,…n), the semivariance γ (h ) for a spatial distance h can be estimated by using the following equation (Veronesi & Schillaci, 2019):
(1),
where N (h ) is the number of pair observations z(xi ) and z(xi +h ) that separated by the spatial distance h . The experimental variogram can be fitted by some theoretical variogram models such as spherical, exponential, and Gaussian models. The sill of the fitted model is the semivariance value at which the fitted variogram model first flattens out (represents the spatial variance of the data), and the range is the distance at which the variogram reaches the sill (represents the distance at which the data are no longer auto-correlated), while the nugget is the model-fitted semivariance value when the distance is zero (represents the influences of measurement error and/or small-scale variations).
In this study, the variogram models were fitted to the experimental variograms of SOC1980, SOC2000, and SOC2015, respectively, and then the optimal model for each of the three sampling dates was chosen according to the goodness of fit (the value of R2) of spherical, exponential, and Gaussian models (Goovaerts, 1997). Sequential Gaussian Simulation (SGS) was used to generate equiprobable realizations of SOC for quantifying the uncertainties associated with the OK-predicted SOC. Based on the best-fitted variogram models from the OK method, the SGS was first conducted 1000 times for each of the SOC1980, SOC2000, and SOC2015 values, and then, random subtractions of the 1000 times SOC1980 maps from the 1000 times SOC2000 maps were used to obtain the 1000 equiprobable realizations of the SOC changes during 1980-2000. This process was repeated to generate the SOC change realizations for the periods of 2000-2015 and 1980-2015. Finally, the realizations of the SOC changes were used to quantify the uncertainty intervals of the SOC changes.
Ensure that the sample data are normal or approximately normal distribution, and the SGS algorithm used in this study was summarized as follows (Webster & Oliver, 2007):
  1. Model the variogram model of the SOC data.
  2. Define a random path visiting all grid nodes across the study area.
  3. For each un-sampled node, determine a searching window to obtain the conditional dataset (sample data and simulated values within the searching window) for the node.
  4. Estimate the mean and variance of SOC for the node location using kriging and the conditional dataset, and build the conditional cumulative distribution function (ccdf) of SOC based on the assumption of Gaussian behavior;
  5. Draw a simulated value randomly from the ccdf and add the simulated value to the conditioning dataset;
  6. The calculation proceeds to the next node until all grid nodes have been simulated.
Since the sampling locations in 1980 were recorded by text descriptions, some location errors may have occurred; thus, the assumption of a 500 m location description error was applied during the SGS of SOC1980 based on our field-work experiences in the area. Specifically, the SOC1980 sampling locations were first perturbed randomly within a buffer circle of 500 m in diameter, and then the SGS was conducted to generate 200 SOC1980realizations. A total of 1000 SGS-generated SOC1980realizations were obtained by repeating this procedure 5 times (the sampling locations in SOC1980 were randomly perturbed 5 times). Thus, the uncertainties resulting from the location description error of SOC1980 were considered in this study. While for SOC2000 and SOC2015, the SGS method was directly used.