Figure 4. An example of an identified active rock glacier (ID:
wkl037). (a) shows the contrasting wrapped phases between the landform
and the surrounding background. The ALOS-1 PALSAR image pair generating the
interferogram was acquired on 14/11/2008 and 30/12/2008. (b) is the
corresponding Google Earth image presenting the geomorphic
characteristics of the mapped active rock glacier. The white arrow
indicates the direction of movement, and the red dot marks the
location of the reference point used for phase correction. This rock glacier
is debris-mantled slope-connected.
3.2 Automated mapping of rock glaciers using deep
learning
Deep learning refers to computer algorithms based on neural networks that
are capable of learning functions mapping inputs to outputs (LeCun
et al. 2015). It has proved powerful in semantic segmentation, where a
convolutional neural network progressively extracts visual features at
different levels from input images (Mottaghi et al. 2014), making it
suitable for difficult mapping tasks such as delineating rock glaciers.
Marcer (2020) first proposed a convolutional neural network to detect
rock glaciers from orthoimages and suggested further development of this
methodology. Robson et al. (2020) validated a new methodology to detect
rock glaciers semi-automatically using advanced image processing
techniques, including deep learning and object-based image analysis, yet
their method has not been used to compile new inventories. Erharter et
al. (2022) developed a framework based on the U-Net architecture to
support the refinement of existing rock glacier inventories. Among the
open-source deep learning architectures designed for semantic
segmentation, we adopted DeepLabv3+ with the Xception71 backbone (termed
DeepLabv3+Xception71 hereafter) as the framework for developing our
automated mapping method (Chen et al. 2018), because of its outstanding
performance in the PASCAL VOC tests (the benchmark dataset for assessing
the performance of semantic segmentation models, as detailed in
Everingham et al. 2015) and in recent cryospheric remote sensing
applications (Huang et al. 2020; Huang et al. 2021; Zhang et al. 2021a).
Development of the deep learning model for delineating rock glaciers can
be divided into three major steps: (1) preparing input data, (2)
training and validating the deep learning network, and (3) inferring and
post-processing results. Figure 5 illustrates the workflow, and full
details are provided below.
3.2.1 Preparing input data
The data preparation step aimed to produce a dataset of optical images
and corresponding rock glacier label images to feed into the
convolutional neural network. The input optical images were cloud-free
(cloud cover < 5%) Sentinel-2 Level-2A products (spatial
resolution ~10 m) covering the West Kunlun region,
acquired during July and August of 2018. We pre-processed the images by
extracting the visible red, green, and blue bands and converting them to
8-bit, so that the satellite images were in the same format as the
training datasets used for pre-training the DeepLabv3+ network we
adopted (Chen et al. 2018).
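For illustration, a minimal sketch of this pre-processing step is given below, assuming the rasterio and numpy Python libraries; the band ordering and the percentile contrast stretch are our assumptions for rescaling reflectance values to 8-bit, not necessarily the exact procedure used in this study.

```python
import numpy as np
import rasterio

def to_8bit_rgb(src_path, dst_path, low=2, high=98):
    """Extract the visible bands of a Sentinel-2 scene and rescale to 8-bit."""
    with rasterio.open(src_path) as src:
        # Band indices (4 = red, 3 = green, 2 = blue) assume a standard
        # Level-2A band stack; adjust them to the actual product layout.
        rgb = src.read([4, 3, 2]).astype(np.float32)
        profile = src.profile
    out = np.empty(rgb.shape, dtype=np.uint8)
    for i, band in enumerate(rgb):
        lo, hi = np.percentile(band, (low, high))  # percentile contrast stretch
        out[i] = np.clip((band - lo) / (hi - lo) * 255.0, 0, 255).astype(np.uint8)
    profile.update(count=3, dtype="uint8", driver="GTiff")
    with rasterio.open(dst_path, "w", **profile) as dst:
        dst.write(out)
```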
rasters that have pixel values as 0 or 1, with 1 indicating rock
glaciers and 0 indicating the background, we used the ESRI Shapefiles of
the manually identified rock glaciers created in the InSAR-based mapping
process to label the Sentinel-2 images. We removed 118 rock glacier
samples from the training dataset because they are unrecognizable due to
cloud cover or relatively low resolution (10 m) of the Sentinel-2
images. In addition, we delineated 145 negative polygons, which are
similar-looking landforms such as debris-covered glaciers identified by
GLIMS and solifluction slopes based on our image interpretation, and
environments where no rock glaciers occur, e.g., water bodies and
villages. These negative polygons were used to produce negative label
images which constitute the input dataset along with the positive ones.
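A sketch of the labelling step is shown below, assuming geopandas and rasterio; the function name is ours. Positive polygons are burned as 1 onto the grid of the corresponding Sentinel-2 image; for negative polygons, the label image is simply an all-zero mask over the same extent.

```python
import geopandas as gpd
import rasterio
from rasterio import features

def make_label_raster(shp_path, image_path, out_path):
    """Burn rock-glacier polygons into a binary mask (1 = rock glacier,
    0 = background) aligned with the corresponding Sentinel-2 image."""
    polys = gpd.read_file(shp_path)
    with rasterio.open(image_path) as src:
        polys = polys.to_crs(src.crs)  # match the image coordinate system
        mask = features.rasterize(
            ((geom, 1) for geom in polys.geometry),
            out_shape=(src.height, src.width),
            transform=src.transform,
            fill=0,          # background value
            dtype="uint8",
        )
        profile = src.profile
    profile.update(count=1, dtype="uint8")
    with rasterio.open(out_path, "w", **profile) as dst:
        dst.write(mask, 1)
```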
More negative samples were included during the iterative training and
validating process by adding the incorrectly inferred examples to the
negative training dataset for the next experiment. We extracted the
positive polygons with their surrounding background (a buffer size of
1,500 m) from the optical images to provide environmental information
and cropped these sub-images into image patches no larger than
480 × 480 pixels. Finally, we split the whole dataset of input image
patches by randomly selecting 90% of the data as the training set
(2,007 image patches) and the remaining 10% as the validation set (223
image patches).
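The cropping and splitting logic could look like the following sketch (the function names and fixed random seed are our assumptions): buffered sub-images are tiled into patches no larger than 480 × 480 pixels, and the patches are then randomly divided 90/10 into training and validation sets.

```python
import random

def tile_windows(height, width, tile=480):
    """Yield (row0, row1, col0, col1) windows no larger than tile x tile
    that together cover a buffered sub-image around a positive polygon."""
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield r, min(r + tile, height), c, min(c + tile, width)

def split_dataset(patches, train_frac=0.9, seed=0):
    """Randomly assign train_frac of the patches to training, the rest
    to validation."""
    patches = list(patches)
    random.Random(seed).shuffle(patches)
    n_train = int(round(len(patches) * train_frac))
    return patches[:n_train], patches[n_train:]
```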
3.2.2 Training and validating deep learning
network
We trained the DeepLabv3+Xception71 network with the initial
hyper-parameters (e.g., learning rate, learning rate decay, batch size,
number of iterations) suggested by Chen et al. (2018) and evaluated the
model performance on the training and validation datasets.
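For orientation, a configuration in the style of the open-source DeepLab implementation might look as follows; the flag names mirror that code base, but the values here are illustrative placeholders, not the exact settings of this study.

```python
# Illustrative fine-tuning configuration for DeepLabv3+ (Xception-71
# backbone); values are placeholders in the spirit of Chen et al. (2018).
config = {
    "base_learning_rate": 1e-4,        # small rate when fine-tuning
    "learning_policy": "poly",         # polynomial learning-rate decay
    "learning_power": 0.9,             # exponent of the poly schedule
    "train_batch_size": 8,
    "training_number_of_steps": 30000, # number of training iterations
    "train_crop_size": [481, 481],     # must cover the 480 x 480 patches
    "fine_tune_batch_norm": False,     # often frozen for small batches
}
```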
The evaluation was conducted throughout the training process by
monitoring the Intersection over Union (IoU) value, which is defined as:
IoU = TP / (TP + FP + FN)
where TP (true positive), FP (false positive), and FN (false negative)
are pixel-based counts. The mean IoU, calculated by averaging the IoU of
each class, is commonly adopted to indicate the accuracy of semantic
segmentation models. Our network classified each pixel of the optical
images into two classes, namely rock glacier and background. As the
numbers of pixels in the two classes are imbalanced (the rock glacier
class occupies only a small portion, ~10%, of the image
patches), we used only the IoU value of the rock glacier class to
represent the model performance.
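In code, the per-class IoU above reduces to a few lines; the sketch below assumes numpy arrays of per-pixel class predictions and ground-truth labels, with class 1 denoting rock glaciers.

```python
import numpy as np

def class_iou(pred, truth, cls=1):
    """Pixel-based IoU for a single class (cls=1: rock glacier)."""
    pred_c = (pred == cls)
    truth_c = (truth == cls)
    tp = np.logical_and(pred_c, truth_c).sum()   # true positives
    fp = np.logical_and(pred_c, ~truth_c).sum()  # false positives
    fn = np.logical_and(~pred_c, truth_c).sum()  # false negatives
    denom = tp + fp + fn
    return tp / denom if denom else 0.0
```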
We set 0.80 as the threshold: whenever the IoU value of a trained model
fell below it, we increased the size and diversity of the training
dataset by performing image augmentation (e.g., blurring, rotation,
flipping) on the positive samples and by adding incorrectly inferred
examples to the negative samples, and then conducted a new experiment.
We repeated this procedure until we obtained a model reaching the target
IoU value on the validation dataset, at which point we regarded the deep
learning network as well trained. The IoU threshold of 0.80 was selected
considering the validation mIoU (79.55%) of DeepLabv3+Xception71 on the
Cityscapes validation dataset, as detailed in Chen et al. (2018).
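The augmentations mentioned above (blurring, rotation, flipping) could be generated as in the following sketch, assuming numpy/scipy and images stored as height x width x channel arrays; geometric transforms are applied to image and label together, while blurring affects the image only.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def augment(image, label):
    """Yield augmented (image, label) pairs for one training sample."""
    yield image, label                           # original
    yield np.fliplr(image), np.fliplr(label)     # horizontal flip
    yield np.flipud(image), np.flipud(label)     # vertical flip
    for k in (1, 2, 3):                          # 90/180/270-degree rotations
        yield np.rot90(image, k), np.rot90(label, k)
    # Blur the spatial dimensions only, leaving the channel axis intact.
    yield gaussian_filter(image, sigma=(1, 1, 0)), label
```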
3.2.3 Inferring and post-processing
results
We applied the trained model to map rock glaciers from Sentinel-2 images
covering the West Kunlun; the input data occupied only
~0.6% of the total mapping area. To refine the inference results, we
excluded predicted polygons smaller than 0.03 km², given the limited
spatial resolution of the Sentinel-2 images and the typical areal extent
of rock glaciers.
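The area filter is straightforward to express with geopandas, as in the sketch below; it assumes the predicted polygons are stored in a GeoDataFrame with a projected, metre-based coordinate system, and the function name is ours.

```python
import geopandas as gpd

def drop_small_polygons(gdf, min_area_km2=0.03):
    """Remove predicted polygons below the minimum plausible
    rock-glacier area (areas computed in a metre-based CRS)."""
    return gdf[gdf.geometry.area >= min_area_km2 * 1e6]
```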
Then we inspected each automatically delineated landform and modified
its boundaries when necessary. Finally, we determined the same set of
landform attributes as for the InSAR-based sub-dataset (Sect. 3.1) and
compiled the outputs produced by the two methods into one inventory.