3. Materials and Methods

3.1 Data Sources

The Operational Land Imager (OLI) and Thermal Infrared Sensor (TRIS) of Landsat-8 satellite having 12-Band multispectral images of the 30-m resolution were downloaded from United States Geological Survey (USGS) Earth Explorer website. The multispectral images from the year 2015 to 2020 from the first week of October to the last week of November, just after the monsoon season is over as maximum landslides incidences in the Mandi district occurred during the monsoon season, were considered adequate due to the availability of cloud-free data. These high resolution multispectral images were already terrain corrected and were suitable for analysing the study area’s terrain characteristics. Advanced Land Observing Satellite-Phased Array Type L-band Synthetic Aperture Radar (ALOS-PALSAR) Digital Elevation Model (DEM) of 12.5m resolution was downloaded from the Alaska Satellite Facility website. This DEM was used to derive topographical information such as slope, curvature, aspect and drainage density etc., of the study area. A geological map of the study area was obtained from India’s Geological Survey (GSI) Bhukosh portal. The Mandi district’s soil map was generated using published Soil Map of Himachal Pradesh procured from the Indian Council of Agricultural Research-National Bureau of Soil Survey and Land Use Planning (ICAR-NBSS-LUP). The road network for the study area was developed from the published maps from the portal of Ministry of Road Transport and Highways (MoRTH). Table 1. Depicts various data sources used and their purpose in the present study.

3.2 Landslide Inventory Dataset

In the present study, landslide occurrences in the study area were detected and mapped using three sources (a) landslide data from Himachal Pradesh government disaster revenue reports and various documented sources such as Bhukosh portal of GSI (b) High resolution Google Earth images were used as an auxiliary data source and are analysed through visual interpretation to identify landslides and (c) through extensive field surveys using handheld GPS. The analysis of factors responsible for triggering landslides is necessary as it gives a relationship between various conditions that might be responsible for disrupting the stable slope conditions. (Crozier and Glade, 2003). The various triggering factors in study area are earthquakes, soil erosion, deforestation, mining infrastructure development etc. It was found that the most common triggering factor for all the landslides in the region was rainfall. Hence all landslides are treated as rainfall induced landslides in the present study. A total of 1723 landslides and their location, type and other required information were documented for developing a GIS based landslide inventory of the district. The landslide inventory was subdivided into a training set of 1199 landslides (70%) for landslide susceptibility analysis and a validation set of 524 landslides (30%) using random sampling in ArcGIS as shown in Figure 2.

3.3 Landslide Causative Factors

The interaction between geological, morphometric, topographical and hydrological factors in a region influences landslides’ occurrence. Hence the appropriate selection of these causative factors is a primary step in landslide susceptibility analysis ( Lee et al., 2004; Dou et al., 2015). In the present study, based upon the previous studies and expert opinions, eleven landslide causative factors namely slope gradient, plan curvature, aspect, elevation, drainage density, lineament density, geology, NDVI, soil, TWI and distance from the road were selected for the analysis. As there were variable scales of various causative factors, therefore all these were resampled into a raster format of 30 m resolution to commensurate the diversity for geoprocessing and map algebra analysis through ArcGIS.
The slope gradient is considered a significant factor that influences landslides in a particular area as it quantifies the amount of shear force acting on a specific area of sliding. (Saha et al., 2005; Pradhan and Lee, 2010). The slope gradient map prepared from the 12.5 m ALOS PALSAR DEM of the Mandi district is shown in Figure-3. The study area exhibits a considerable variation in slope gradients with slopes ranging from 0° to 82° which gives rise to flatter terrains along the valleys to extremely steep terrains along the mountains. The relief features of the region are extensively used in landslide susceptibility analysis as it affects rainfall, seismicity, etc. (Pham, 2017). The north eastern part of the Mandi district is dominated by a higher elevation which ranges from 3000 to 6000 m. The northern part of the Mandi district is surrounded by Beas river which is characterised by a lower elevation, including Balh Valley. The slope aspect of the area tends to be relatively evenly distributed in all directions, although southward directions have slight predominance. The plan curvature of the area is predominantly flat to slightly concave. The Plan Curvature of slope represents the direction of maximum slope and helps identify the morphology of the area’s topography (Erener and Düzgün, 2012; Pourghasemi et al., 2012). The drainage density, the measure of the density of streams and rivers in a drainage basin, directly influences slope’s erodibility dissected by channels and influences the surface runoff (Demir et al., 2014). The drainage density of the Mandi district ranges from 0 - 2.4 and is subdivided into five classes. Hydrological impact of drainage networks on the wetness/saturation of the soils on the slopes is assessed and quantified using Topographic Wetness Index (TWI) (Beven and Kirkby, 1979). The TWI map of study area was prepared by combined arithmetic application of morphometric variables, including the slope gradient and flow accumulation parameters, using Equation (1) with values ranging from 4.0 – 28.0. Normalised Difference Vegetation Index (NDVI) map is one of the most fundamental and widely accepted index to detect vegetation and landcover changes caused by infrastructural developmental activities. (Carrara et al., 1982; Ceballos and Lopez, 2003; Ahmad et al., 2013). Normalised Difference Vegetation Index (NDVI) map was prepared using image analysis techniques on high resolution Landsat-8 images of the study area using ERDAS IMAGINE software using Equation (2). The primary NDVI zone of the study area was shrubs and grasslands (45%), followed by sparsely vegetated (28%) and urban area (12%). Lineaments are the linear weaknesses or fracture planes such as cracks, faults, bleeding planes, joints etc. which represents the underlying geology of the area (Ramakrishnan et al., 2013). It was found that almost 35% of the total area had high to very high lineament density, and 26% of the total area had a moderate density of lineaments. Geological boundaries of an area are closely related to slope and rock strength, and such boundaries may lead to increased landslide activity. Mandi district lies within the lesser Himalayan and Siwalik region and was classified into five zones based on their slopes and terrain characteristics (Choubey et al., 2007; Dou, 2014). Different types of soils have different cohesion values, and the infiltrated water might be able to erode the soils with lesser cohesion values. (Mzuku et al., 2005; Godt et al., 2009; Baum, 2010). The Mandi district’s soil map was classified into five categories of soils based on their depth, drainage, and erosion properties. The construction of roads in mountainous terrain often includes excavation along the natural bed slope. This usually results in loss of support and cracks development due to increased strain in the upper soil mass. (Devkota et al., 2013; Pradhan et al., 2018) As a result, landslides occurrences are mostly distributed near the road network. (Ayalew and Yamagishi, 2005; Tuan and Dan, 2012). The road network distribution was reclassified into five road buffer zones from 0-500 m at 100 m intervals.
TWI = [Ln (As) / Tan (β)] (1)
Where Ln is natural log, As = Flow Accumulation, β = Slope in Radians
NDVI = (NIR – RED) / (NIR + RED) (2)
NIR (Near Infra-Red Band) and RED Band represent the electromagnetic spectrum’s spectral reflectance bands.

3.4 Statistical Landslide Susceptibility Models

Statistical bivariate models are widely accepted methods of quantitative analysis of landslide data. They generate a statistical relationship between the dependent variable (known landslides distribution) with a set of independent variables (landslide causative factors) to predict landslide susceptibility of an area. (Carrara et al., 1991; Chung et al., 1995; Guzzetti et al., 2006b; Rossi et al., 2010). In this study, landslide susceptibility analysis was carried out using three statistical models, i.e. Frequency Ratio (FR), Certainty Factor (CF) and Shanon Entropy. The data of all landslide causative factors were resampled into raster format of spatial resolution of 30 m. The accuracy of these models was validated by plotting the Receiver Operating Characteristics Curve (ROC) using Monte Carlo Simulation in the SDM tool in ArcGIS. The detailed methodology used in the present study is described using a flowchart as shown in Figure 4.

3.4.1 Frequency Ratio Model

Frequency ratio (FR) is an observation based statistical model representing the probability of occurrence of landslides for a given influencing parameter by correlating them with the existing distribution of landslides (Lee and Pradhan, 2007; Bonham and Carter, 2014). FR model associates the pixel data with and without landslides with pixels of input raster data layers of causative factors. The FR value is then computed for each class of particular causative factor using Equation (3). FR values greater than 1 indicate a higher proportion of landslide occurrence and a high correlation with that specific class of causative factor. On the contrary, FR value less than 1 indicates a lower correlation with that particular causative factor. (Akgun, 2008; Karim et al., 2011)
\(FR(i)=\frac{Npix(li)\ /\ Npix(ci)}{\sum Npix(li)\ /\ \sum Npix(ci)}\)(3)
Npix(li) = Number of pixels containing landslides in each class (i) of the causative factor
Npix(ci) = Total number of pixels in each class (i) of the causative factor
∑Npix(li) = Total number of pixels containing landslides in the study area
∑Npix(li) = Total number of pixels in the whole study area
To prepare final LSM maps, the FR values of all the landslide causative classes have to be integrated using Equation (4)
LSMFR = FR1+ FR2+ FR3+…+ FRn (4)

3.4.2 Certainty Factor Model

The Certainty Factor (CF) model is one of the most fundamental and widely accepted model for landslide susceptibility mapping. (Kanungo et al., 2011; Liu et al., 2014). It provides a rule based favourability function to consolidate heterogeneous data layers using Equation (5).
\(\text{CF}=\ \ \ \par \begin{matrix}\frac{\mathrm{\text{\ ppa\ }}-\text{pps}}{\operatorname{ppa}\left(1-\text{pps}\right)}&\mathrm{\text{\ when\ }}\text{ppa}\geq\text{pps}\\ \frac{\text{ppa}-\text{pps}}{\operatorname{pps}\left(1-\text{ppa}\right)}&\mathrm{\text{\ when\ }}\text{ppa}<\text{pps}\\ \end{matrix}\) (5)
Where CF is the Certainty Factor, ppa is the conditional probability of having a landslide event occurring in class ”a”, and pps is the prior probability of having the total number of landslides in the study area ”A”.
The CF value ranges between -1 and +1, where +1 is the measure of belief (definitely true) or increasing certainty of landslide occurrence and -1 is the measure of disbelief (definitely false) or decreasing certainty of landslide occurrence. CF value nearby 0 indicates that the conditional probability is very close to the prior probability, and hence it is difficult to ascertain any certainty of landslide occurrence (Lee and Talib, 2005; Pourghasemi et al., 2012e). The pairwise data layers of all landslide causative factors were combined based on ”Z” values obtained from Equation (6) (Dou et al. 2014; Ilia et al. 2015).
\(Z=\ \ \begin{matrix}A+B-AB&A,B\geq 0\\ (A+B)/(1-min(|A|,|B|))&A*B<0\\ A+B+AB&A,B<0\\ \end{matrix}\) (6)
These pairwise combinations have to be persistently performed on all causative factors using the integration rule until all data layers were combined to prepare final LSM maps.

3.4.3 Shanon Entropy Method

The entropy of a system conceptually measures the degree of randomness, disorder, uncertainty or instability. Thermodynamically as Boltzmann described, entropy is used to represent the direction of the spontaneity of a process. Claude Shannon in 1948 developed the concept of entropy to analyse a fundamental communication problem of information theory. The Shanon Entropy (SE) model can be used to measure the uncertainty in the information of various landslide causative parameters (Bednarik, 2010; Pourghasemi, 2012; Nohani et al., 2019). The probability density\(\ P_{\text{ij}}\)values for each class is calculated using Equation (7), which are further used to calculate the information coefficients \(H_{j}\) using Equation (8).
\(P_{\text{ij}}=\frac{\text{FR}}{\sum_{j=1}^{N_{j}}\text{FR}}\) (7)
where \(P_{\text{ij}}\) represents the probability density of each sub class, and FR represents the frequency ratio of each class.
\begin{equation} E_{j}=-\sum_{i=1}^{N_{j}}{P_{\text{ij}}\operatorname{}P_{\text{ij}}},j=1,\ldots,n\nonumber \\ \end{equation}
\(E_{\text{jmax}}=\operatorname{}N_{j},j=number\ of\ sub\ classes\)
\(H_{j}=(E_{\text{jmax}}-E_{j}/E_{\text{jmax}}),\mathrm{\ }H=(0,1),\mathrm{\ }j\mathrm{=1,\ldots,n}\)
\(W_{j}=H_{j}*FR\) (8)
Where \(E_{j}\) and\(\ E_{\text{jmax}}\) are the entropy values,\(H_{j}\)is the information coefficient, \(N_{j}\)is the number of classes in each landslide causative factor. \(W_{j}\) calculated from Equation (8) is the relative weight assigned to each landslide causative factor on the whole. \(W_{j}\)values closer to 1 represent higher uncertainty or inconsistency, whereas values closer to 0 represents higher certainty or consistency. The final LSM maps were prepared by using Equation (10)
LSMSE = \(W_{1}\)* FR1+\(\text{\ W}_{2}*\) FR2 +\(\text{\ W}_{3}*\)FR3 +…+\(\text{\ W}_{n}*\) FRn(9)