Landslide detection in mountainous forest areas using polarimetry and interferometric coherence

The cloud-free, wide-swath, day-and-night observation capability of synthetic aperture radar (SAR) has an important role in rapid landslide monitoring to reduce economic and human losses. Although interferometric SAR (InSAR) analysis is widely used to monitor landslides, it is difficult to use that for rapid landslide detection in mountainous forest areas because of significant decorrelation. We combined polarimetric SAR (PolSAR), InSAR, and digital elevation model (DEM) analysis to detect landslides induced by the July 2017 Heavy Rain in Northern Kyushu and by the 2018 Hokkaido Eastern Iburi Earthquake. This study uses fully polarimetric L-band SAR data from the ALOS-2 PALSAR-2 satellite. The simple thresholding of polarimetric parameters (alpha angle and Pauli components) was found to be effective. The study also found that supervised classification using PolSAR, InSAR, and DEM parameters provided high accuracy, although this method should be used carefully because its accuracy depends on the geological characteristics of the training data. Regarding polarimetric configurations, at least dual-polarimetry (e.g., HH and HV) is required for landslide detection, and quad-polarimetry is recommended. These results demonstrate the feasibility of rapid landslide detection using L-band SAR images.


Introduction
Landslides and slope failures induced by heavy rain and earthquakes caused two billion USD economic loss and ten thousand casualties in the last decade (International Federation of Red Cross and Red Crescent Societies 2016). Synthetic aperture radar (SAR) now plays a key role in rapid landslide monitoring with its cloud-free, wide-swath, day-and-night observation capability. In particular, interferometric SAR (InSAR) analysis is widely used for early detection, continuous monitoring, and risk assessment of landslides (Strozzi et al. 2005;Singhroy 2005;Scaioni et al. 2014).
However, due to significant decorrelation, detecting landslides in Eastern Asia by InSAR can be difficult, since many of them occur in mountainous forest areas. In early detection of the landslides induced by the 2018 Hokkaido Eastern Iburi Earthquake, the affected area was so vast that it took 7 days to obtain aerial photographs of the entire area (Geospatial Information Authority of Japan 2018). The InSAR coherence was low (Aimaiti et al. 2019) and could not contribute to early detection. In the landslides caused by the July 2017 Heavy Rain in Northern Kyushu (Geospatial Information Authority of Japan 2017), the coherence was similarly low and could not be used to assess risks such as landslide dams. A more comprehensive use of SAR data (not only InSAR coherence) is needed to contribute to the rapid recognition and risk assessment of landslides.
Polarimetric SAR (PolSAR) analysis of L-band SAR data is commonly used for landslide detection in forested areas. Although the use of several polarimetric parameters has been suggested, the best parameter remains unclear. Czuchlewski ( 2003), Shimada et al. (2014), Yonezawa et al. (2012), and Watanabe et al. (2016) employed eigenvalue parameters (Cloude and Pottier 1996) such as polarimetric entropy and alpha angle. Shimada et al. (2014), Shibayama et al. (2015), and Watanabe et al. (2016) also showed that polarimetric coherence between HH and VV polarization can be effectively used to detect landslides. Watanabe et al. (2012), Yonezawa et al. (2012), and Shibayama et al. (2015) used model-based decompositions (Freeman and Durden 1998;Yamaguchi et al. 2005) that decompose radar scattering power into easyto-understand scattering mechanisms, such as surface scattering, double-bounce reflection, and volume scattering. Most of the studies mentioned above basically applied a threshold to a polarimetric parameter to divide landslide areas and unchanged areas. Shibayama et al. (2015) pointed out that polarimetric scattering mechanisms at landslides vary drastically with the local incidence angle (LIA) of radar, and thus proposed a decision tree classification that uses different thresholds according to LIA.
Interferometric SAR (InSAR) analysis is also effective for detecting landslides. Many studies of this have applied the differential InSAR (DInSAR) technique, which estimates the line-of-sight displacement from the interferometric phase (Singhroy 2009;Barra et al. 2016;Bayer et al. 2017;Bozzano et al. 2017;Solari et al. 2020;Shi et al. 2020). However, the motion of rapid landslides discussed in this study cannot be estimated from the phase because of the strong decorrelation. The offset tracking technique can measure displacement larger than the InSAR can (Singleton et al. 2014;Sun and Muller 2016), but pixel matching fails where the terrain changes completely. Change detection using interferometric coherence is a promising approach, but Fujiwara et al. (2019) and Aimaiti et al. (2019) reported that coherence is difficult to apply in forested areas, probably due to the significant decorrelation. Instead of coherence, phase variance (also called phase standard deviation-a measure of noise deviation in an interferometric phase) is also a good indicator of landslides (Fielding et al. 2005;Fujiwara et al. 2019). But because phase variance requires a large window size (e.g., 200 looks in Fielding et al. 2005), it has only been used for rough estimates of landslide distribution with a low resolution.
This study address landslide detection in mountainous forested areas with high accuracy using both PolSAR and InSAR parameters, such as eigenvalue parameters, model-based decomposition components, polarimetric coherence, interferometric coherence, and phase variance. Terrain properties, such as slope and curvature, derived from a digital elevation model (DEM) were also used as ancillary data to precisely identify landslide areas. Although there have been many studies on landslide detection using PolSAR, InSAR, or DEM individually, few studies have combined all three analyses.
To identify landslides by combining multiple parameters, this study proposes decision tree classifications based on empirical rules and supervised classification using machine learning. The decision tree is a robust and computationally efficient approach, but it is difficult to integrate the many parameters derived from PolSAR and InSAR data. On the other hand, machine learning can easily handle many parameters and has the potential to detect landslides with high accuracy (Konishi and Suga 2018). Machine learning consists of two processes: building a complex model from a training dataset and making predictions from a new dataset.
Most of the previous studies presented the results from applying the methods to only one event. To better evaluate these methods, this research used quad-polarimetric (fully polarimetric) L-band SAR data by PALSAR-2 and quantitatively evaluated the detection accuracy for two landslide events, triggered by heavy rain and by an earthquake.

Test sites and data
The July 2017 Heavy Rain in Northern Kyushu brought record-breaking rainfall caused by a stationary rain front on July 5 and 6, 2017, in the Fukuoka Prefecture, Japan (Tsuguti et al. 2019). The 2018 Hokkaido Eastern Iburi Earthquake is Mw 6.6 earthquake occurred on September 6, 2018, in the eastern Iburi region of Hokkaido, Japan (Osanai et al. 2019). Both events caused a large number of landslides and slope failures in forested mountainous areas. Table 1 lists the PALSAR-2 data used in this study: the pre-and post-event data for each of the two test sites: the heavy rain in Fukuoka Prefecture (site F) and the earthquake in Hokkaido (site H). All the data are quad-polarimetric data available with HH, HV, VH, and VV polarization images. The perpendicular baselines that are crucial for reliable interferometric analysis are sufficiently small (less than 100 meters). Whereas, the temporal baseline of case H is 14 days (i.e., shortest baseline possible with this satellite), that of case K is very long, 336 days. The long baseline can increase false detection of landslides caused by the temporal decorrelation and land-cover changes such as deforestation. The original spatial resolution of the SAR data is approximately 6 m. We used the DEM from the Fundamental Geospatial Data provided by the Geospatial Information Authority of Japan (GSI), with 10-m resolution and approximately 5-m elevation accuracy. The DEM represents the groundlevel elevation and does not contain a forest canopy.
In this study, we also used the polygon data of the landslides occurred in the events for training the supervised classification and validating our results. The reference landslides polygon data were produced by the Geographic Department Disaster Countermeasures Group of GSI based on airborne and ground surveys (GSI 2017(GSI , 2018. Figure 1 shows a schematic model of the landslides discussed in this study. For simplicity, we regarded landslides as land-cover changes from forest to bare soil, where the scattering mechanism changes from volume scattering to surface scattering. It should be noted that the surface scattering from bare soil is difficult to observe when the local incidence angle (LIA) is large. From the principle of side-looking radar (having a look angle of around 30° in this study), LIA is small on the slopes facing toward the satellite, and large on the far side of the mountain.

Polarimetric analysis
First, we calculated polarimetric parameters from the PALSAR-2 data. Each pixel of measured PolSAR data can be represented as a scattering vector as follows: where S XY is the backscattering of each polarization under the assumption of radar reciprocity (S HV = S VH ). Polarimetric correlation between HH and VV polarization is a good indicator of surface scattering and can be obtained as follows: A scattering vector on the Pauli basis enhances the scattering mechanism even more and can be given by We obtained the coherency matrix from the averaged covariance of k P as follows: .
T includes both the power and correlation information of every scattering mechanism. For example, T 11 and T 33 approximately represent the powers of surface and volume scattering, respectively. The averaged alpha angle ᾱ was also derived from the eigenvalue analysis of T (Cloude and Pottier 1996) as follows: where λ i and u i are eigenvalues and eigenvectors of T, respectively. ᾱ corresponds to scattering mechanisms Fig. 1 Schematic illustration of polarimetric scattering mechanisms for the cases of small and large LIA, before (left) and after (right) landslides such as surface (α ~ 0), volume (α ~ π/4), and doublebounce (α ~ π/2) scattering.
To interpret T more physically, one can use modelbased decompositions (Freeman and Durden 1998;Yamaguchi et al. 2005). In this study, we used a further improved model  where the total backscattering power is decomposed into four components: surface scattering P S , double-bounce P D , volume scattering P V , and helix scattering P C .
Figures 2a-e and 3a-e show images of the polarimetric parameters in site F and site H, respectively. The landslide areas are seen in blue in Fig. 3a and c, indicating the presence of surface scattering. They also appear as high polarimetric coherence (Fig. 3d) and low alpha angle areas (Fig. 3e). Landslides in site F are less clearly visible in the images (Fig. 2a-e) due to the smaller and lesser landslide areas in this site. Figures 2i and 3i are optical satellite images shown only for visual comparisons, and are not used in further analysis.

Interferometric analysis
The SLC images of pre-event data were co-registered to post-event images for interferometric processing. The interferometric coherence γ (Figs. 2f, 3f ) and phase φ were calculated from HH-polarization image with 8 × 16 multi-looking as follows: Such a large multi-look size was necessary to obtain sufficient coherence in dense forests dealt in this study (Fujiwara et al. 2019). We compensated orbital and topographic component of the phase using the DEM. The phase variance (Figs. 2g, 3g) was calculated from the multi-looked and filtered phase (Goldstein and Werner 1998) as follows: where 〈〉 denotes the 5 × 5 pixel averaging applied by Fielding 2005 to obtain meaningful statistics from the phase, and φ is the mean phase. Note that phase is wrapped at ± π, φ needs to be unwrapped so that the deviation ( φ −φ ) is within ± π.
Landslide areas can be seen as low coherence area (Figs. 2f, 3f ) and high phase variation areas (Figs. 2g, 3g). Compared to coherence, phase variance corresponds better to the distribution of landslides (Fujiwara et al. 2019), but its resolution is degraded due to a large number of pixels being averaged in the calculation. The phase variation seems to be effective in a coarse estimation of landslides with lower resolution.

Terrain data analysis
We calculated three terrain parameters from the DEM: LIA, slope, and curvature. The LIA can be calculated as an angle between radar line-of-site vector and normal of terrain. Figures 2h and 3h show images of LIA with the layover areas overlaid in red color. To generalize our method, we used the first (slope) and second (averaged curvature) derivatives of DEM (Hasegawa et al. 2008) instead of the absolute value of elevation, which is highly site-dependent. All images derived from PolSAR, InSAR, and DEM data were resampled from the radar coordinate to the Universal Transverse Mercator projection with 10-m spatial resolution. Table 2 lists the landslide detection methods compared in this study. Methods #1 and #2 are simple thresholding of the images of γ and α, respectively. Method #3 applies three "AND" conditions for detecting landslides. Note that the threshold of P V in method #3 was originally 65%, but changed to 45% in this study because the original paper used a three-component model (Freeman and Durden 1998) that overestimates P V , whereas this paper used the four-component model instead . Methods #4 to #6 use different conditions for different ranges of LIA. We propose method #5 as a modification of method #4 to use the Pauli components (T XX ) instead of model-based decomposition for computational efficiency, and more case divisions with LIA for improved accuracy. The threshold values of method #5 are similarly defined as method #4. Method #6 added another AND condition of phase variation to further reduce the overestimation of landslides. Since phase variation is the low-resolution image suitable for coarse estimation of landslide areas, it is not suitable for use with strict thresholds. For this reason, we set lax threshold (0.05). We applied these classifications to both pre-event image and the post-event image in order to subtract preexisting bare soil areas (e.g., old landslides, deforested

Table 2 Landslide detection methods based on empirical rules compared in this study
This study (PolSAR) This study (PolSAR + InSAR) areas not related to the disaster) from the results of the post-event image.

Landslide detection by machine learning
To obtain higher accuracy by combining many parameters from PolSAR, InSAR, and DEM, we applied a supervised classification using a machine learning method-Random Forest (RF) (Breiman 2001). RF is a fast and accurate classifier that automatically constructs a large number of decision trees using training data (with the number of trees = 100 and number of layers = 3 being set).
In the classification, based on the results of the empirical approach described above, we used the Pauli components (T 11 , T 22 , T 33 ) from PolSAR-derived parameters; coherence and phase variation from InSAR parameters; slope, curvature, and LIA from DEM. For the Pauli components, we used post-event images (Figs. 2a, 3a) and differential images with pre-event images (Figs. 2b, 3b).
The first column of Table 4 lists the cases we tested with different combinations of parameters and test sites. In cases #1 to #8, we took training and test data from the same event, with half of the samples being used as training data and the other half being used for validation. To validate the generalization of our method, we also tested cases #9 and #10 where training and test data were taken from different events.
Since quad-polarimetric data are not always available from actual satellite operation due to its disadvantages (limited swath width and huge data volume), it is important to investigate the usability of reduced polarimetric data (Park and Lee 2019). Hence, we also examined the accuracies of dual-polarimetric (using only HH and HV polarization) and single-polarimetric (only HH) data in case #11 and #12, respectively.

Accuracy evaluation
The accuracy of landslide detection was validated by comparing our results with the reference data (GSI 2017(GSI , 2018 and computing accuracy indices, such as Cohen's Kappa coefficient (Congalton 1991), recall, and precision. Recall is the fraction of actual landslide areas that are correctly detected by SAR, and it can evaluate the underestimation of SAR results. Precision is the fraction of SAR-derived landslide areas that are actually landslides, and it can measure the overestimation of SAR results.
As the reference data of the landslides (black areas in Figs. 4e, 5e) based on manual interpretation of aerial  Table 3 lists the results from the decision tree classifications (or simple thresholding) based on empirical rules. Method #2 (Alpha angle) provided relatively good accuracy among the methods proposed in the previous studies (methods #1 to #4). Accuracies at a lower incidence angle (< 30°) are better than at a higher incidence angle (≥ 30°), particularly in site F. Given the weak surface scattering from landslide areas with high LIA (Fig. 1), it is more difficult to detect landslides. LIA dependency of the accuracy was more severe in site F due to its steeper terrain (i.e., larger variation of LIA) than in site H. Method #4 (model-based decomposition with case divisions) resulted in better accuracy in site H, but not in site F. Proposed methods #5 and #6 using Pauli components with more case divisions with LIA provided better accuracy, particularly in site H. These comparisons show that polarimetric parameters (especially alpha angle and Pauli components) are useful to detect landslides in forested areas. However, it should be noted that large LIAs and steep terrain will degrade accuracies because of the small surface scattering. Table 4 lists the accuracies of supervised classification applying machine learning to various cases. Cases #1 to #8 compare different combinations of the feature parameters used by the classifier. Cases #1 and #2 only used PolSAR data, but provided relatively good accuracy, especially case #2 (site H). Cases #3 and #4 with DEM did not improve accuracy, but cases #5 and #6 with InSAR did. Cases #7 and #8 used all three features and provided the highest accuracy of all, even at a higher LIA. These results illustrate the effectiveness of combining PolSAR, InSAR, and DEM.

Machine learning
Cases #9 and #10 are for the tests of site H obtained by the model trained using site F, and vice versa. Case #9 provided better accuracy than did the empirical approach described in the previous section. This shows that a classifier trained using one event can detect the landslides of another, unknown event. The poor accuracy in case #10 indicates significant degradation from overfitting for site H. The sediment rocks of Site H are covered with thick pyroclastic fall deposits from the volcanoes around the test site. The earthquake triggered many shallow landslides due to the slippery nature of the volcanic materials (Osanai et al. 2019). Since many landslides occurred on relatively gentle slopes (< 30°), the model trained using site H cannot be applied to site F, where a more typical type of landslide occurred on steep terrain. The results show that the model trained using site F is more generalized (i.e., applicable to both sites H and F) and possibly applicable to many other landslides. This suggests that machine learning has the potential for detecting landslides accurately, but its accuracy depends on the geological characteristics associated with the training data.
Cases #11 to #14 demonstrate the results of using quad-, dual-(HH and HV), and single-polarimetric (HH) data. It is natural that quad-polarimetry is the most accurate since it has more information on the scattering mechanisms. Single-polarization is not capable of landslide detection, especially at higher LIAs (in site H, κ = 0.189 by single-pol and 0.656 by quad-pol at LIA > 30°). Figure 1 shows that landslides in higher LIAs have small backscattering and that detecting them   Tables 2 and 3) agreed poorly with the reference data. This method assumes only a simple land cover change, from forest to bare land, and is less accurate due to the influence of other features such as buildings and crop fields). Figures 4b and 7b show that the supervised classifications (cases #7 and #8 in Table 4) are in good agreement with the reference data, since the model was trained with the actual data. Figures 4c and 7c show the supervised classifications using training data from another site (cases #9 and #10 in Table 4). It resulted in the significant underestimate of landslides for site F (Figs. 4c and 6c) due to the lower generality of training data as described above.
The lower accuracies of supervised classification using single-polarization data of cases #13 and #14 are shown in Figs. 4d and 7d. In Fig. 7d, the A areas have small LIAs and landslides could be detected even with single-polarization, whereas the B areas have large LIAs and they could not be detected because of the lack of HV (volumescattering) data. Figures 6 and 7 show that many small or narrow landslides (less than 20 m) went undetected, in particular, the small landslides in the northeastern part of site F (see also Fig. 4a-d). This underestimate arises from a lack of resolution. The fact that precision was higher than recall for many of the results shown in Tables 3 and 4 indicates there were underestimated small landslides. These results indicate that higher spatial resolution is needed for effective monitoring.
These results also have implications for the landslide observation strategies of the SAR satellite. Due to the long temporal distance, case #7 mistakenly detects a deforested area as a landslide, as shown in Fig. 4b. Observations must be made more often to eliminate temporal changes that are unrelated to the disaster. Moreover, due to the mountainous character of the test site, 4.5% of the pixels in site F and 4.2% of the pixels in site H are affected by layover. In this study, the layover areas are also used for landslide detection without masking, but this causes geometric errors in the results. More observations from different directions are necessary to minimize the effect of layover.

Conclusions
This study combined PolSAR, InSAR, and DEM data analyses to detect landslides induced by the July 2017 Heavy Rain in Northern Kyushu and by the 2018 Hokkaido Eastern Iburi Earthquake. The following conclusions were reached: (1) Among simple, conventional thresholding methods using polarimetric parameters, alpha angle and Pauli components were effective in detecting landslides. Since these methods assume changes in only simple land cover from forest to bare land, their accuracy is limited. In particular, large incidence angles and steep terrain will degrade accuracies.
(2) Supervised classification using machine learning provided higher accuracy, with a kappa > 0.6, especially when all features (InSAR, PolSAR, and DEM) were used. However, this scheme should be applied cautiously because its accuracy depends on the geological characteristics of the training data. (3) At least dual-polarimetry (e.g., HH and HV) is required for accurate landslide detection, particularly at large incidence angle, and quad-polarimetry is recommended. Higher spatial resolution, higher temporal frequency, and observations from more directions are also necessary to minimize errors in detection.
These results successfully demonstrated the feasibility of rapid landslide detection using L-band spaceborne polarimetric and interferometric SAR. The machine learning-based methods are particularly effective, but more training data should be collected for further improvements and generalization to overcome the overfitting problem. Our future investigation should also include the use of more information contained in polarimetric interferometry (PolInSAR) data, as this study only used the interferometry of HH polarization (Jung et al. 2018). Since satellites acquire single-and dual-polarization data more frequently, the integration of such partial polarization (but highly frequent) data is also one possible approach in the future.