
ICESat-2 single photon laser point cloud denoising algorithm based on improved DBSCAN clustering

Abstract

The Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2) has great potential for development owing to its use of multiple beams, low energy consumption, high repetition frequency, and high measurement sensitivity. However, the weak photon signal of the photon counting lidar is susceptible to background noise caused by the sun and the atmosphere, which can seriously affect the processing and application of laser data. This paper proposes an improved DBSCAN clustering algorithm for denoising single photon laser point clouds in mountainous areas. First, a grouping method based on elevation and distance statistics is proposed to reduce the influence of terrain undulations on denoising accuracy. Second, an automatic radius search method is put forward to find the optimal clustering radius of each group, thereby improving the existing DBSCAN clustering method. The proposed method is compared with the classical DBSCAN algorithm. The results show that the proposed algorithm significantly improves denoising accuracy in mountainous areas and effectively filters out most background noise.

Graphical Abstract

1 Introduction

Spaceborne lidar is an emerging active remote sensing detection technology. It transmits a laser pulse with a certain frequency to the Earth’s surface and receives the scattered echoes, finally obtaining the accurate three-dimensional space coordinates of the laser footprint point. Spaceborne lidar has the ability to actively obtain global surface elevation and features a wide observation range (Neumann et al. 2019; Xie et al. 2021). Satellite altimetry data have broad prospects and applications in forestry remote sensing, polar glacier monitoring, and global change research. ICESat-2 (Ice, Cloud and Land Elevation Satellite-2) was launched by NASA (National Aeronautics and Space Administration) in 2018. ICESat-2 is equipped with ATLAS (Advanced Topographic Laser Altimeter System), which uses a 6-beam micro-pulse photon counting lidar for measurement (Zhu et al. 2020). It also uses a laser with a high repetition frequency and low energy, together with a highly sensitive laser detector. The single photon energy in the echo signal is used to obtain the distance information of a long-distance space target with a lower laser pulse energy compared to other lidar technologies. ICESat-2 thus addresses the problems of large volume, large mass, low reliability, high energy consumption, and low repetition frequency that are seen in traditional lidars.

Previous studies have used ICESat-2 data to monitor topographic changes, ice, and vegetation. Vernimmen et al. (2020) used ICESat-2 data to create the Global Coastal Lowland DTM (Digital Terrain Model), which is much more accurate than other existing global digital elevation models. Michaelides et al. (2021) used ICESat-2 data to estimate ground surface-height changes due to the seasonal freezing and thawing of the active layer, and discussed several influencing factors and the future potential of ICESat-2 data in permafrost applications. Chen et al. (2022) verified the coverage performance of ICESat-2 on global reservoirs and further explored its potential for monitoring long-term changes in reservoir water level and water storage. A variety of marine data products provided by ICESat-2 can be used to study the changes in snow depth, thickness and volume in polar sea ice, sea ice surface classification, and surface height on complex ice surfaces (Kacimi and Kwok 2022; Petty et al. 2021; Herzfeld et al. 2022). Mulverhill et al. (2022) assessed the consistency of Canadian canopy height estimates obtained by using forest canopy height products from ICESat-2 and the NTEMS (National Terrestrial Ecosystem Monitoring System) at various ecological gradients. The coherence between ICESat-2 and NTEMS datasets suggests potential integration. Many scholars have shown that ATL08 (Advanced Topographic Laser Altimeter System Level 08) canopy height is more suitable for relatively dense canopy environments such as coniferous forests and broad-leaved forests (Malambo and Popescu 2021). The quality of data acquired in winter is superior to that acquired in summer due to differences in vegetation structure and snow coverage (Zhu et al. 2022). The comprehensive evaluation of the quality of the ATL08 product can be useful for improving future versions of the product and for guiding the selection and use of ICESat-2 data (Tian and Shan 2021).

Because of the high sensitivity of the photon counting lidar, instrument and background noise affect ATLAS data (McGarry et al. 2021). As a result, a large amount of noise is mixed into the observation results, which seriously restricts the use of ICESat-2. Therefore, denoising point cloud data quickly and efficiently has become a key issue for the extensive application of ICESat-2. The denoising algorithms for single photon point cloud data can be roughly classified into three types.

The first type of denoising algorithm is based on local statistical parameters. It uses the local density information of the photon point cloud data and identifies noise points according to the local density value or local density histogram of the signal points (Wang et al. 2016). Zhu et al. (2018) proposed a local statistical analysis method for denoising point cloud data. The maximum density of each photon over all directions is taken as its true density, and the noise points are then removed using an empirical threshold. However, the algorithm is sensitive to the grid size used when calculating the photon density, which affects the denoising accuracy.

The second type of denoising algorithm is based on raster image processing. Using morphological information from the point cloud signal, it rasterizes point cloud sections into two-dimensional images and then applies image processing techniques to eliminate noise points. To address the difficulty of measuring feature length in point cloud data, Li et al. (2021) proposed converting the three-dimensional point cloud data into a two-dimensional elevation grid map in order to estimate the geometric features, which simplifies the problem. However, some signal photons may be lost during rasterization, which reduces the effectiveness of the method.

The third type of denoising algorithm is based on density spatial clustering. This method performs clustering analysis by exploiting the discrete distribution of noise photons in space, thereby eliminating noise. The main density clustering methods include Bayesian, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and ordering points to identify the clustering structure (OPTICS). Among the three methods, DBSCAN is one of the most widely used photon denoising methods and can be extremely effective in clustering signal photons. Zhang et al. (2021) proposed an effective algorithm for the extraction of signal photons from the weak beam data of ICESat-2 in mountainous areas. By using the slope–noise relationship in strong beam data to invert the track slope of the weak beam, an improved DBSCAN algorithm can be used to extract effective signal photons from weak beam data with a low signal-to-noise ratio. However, the classical DBSCAN algorithm is sensitive to the input parameters, so it adapts poorly to different terrains. Inappropriate parameter settings in areas with large variations in terrain tend to cause excessive denoising, which leaves too few ground points to reflect the surface morphology, or fail to filter out the noise points floating above the surface (Kui et al. 2023). Therefore, improving the accuracy of denoising is a key focus of research in altimetry observation.

Some scholars have proposed other types of denoising algorithms. An adaptive denoising method that depends on the local slope is considered robust in identifying signal points among high background noise, and appropriate for the low-density data caused by slope (Xie et al. 2020). Zhang et al. (2022) proposed a noise removal algorithm with no input parameters based on the isolation of quadtrees, which can improve denoising accuracy and efficiency.

Because the single photon lidar system observes along profiles, the single photon point cloud data are distributed in a narrow band, unlike conventional point cloud data, which can be used to generate a 3D topographic map of a large area. The denoising effect is typically not good enough in areas with large terrain fluctuations. Therefore, it is necessary to perform fine denoising on point cloud data to filter out noise points suspended above or below the ground. However, owing to the uneven distribution of the signal point cloud, an algorithm using a single threshold cannot effectively remove a large amount of noise. Given the presence of many noise points in single photon point cloud data and the similarity of the height of many noise points to that of signal points, denoising ground surface point cloud data can be formulated as a classification task. Ground points are considered signals, whereas instrument noise, background noise, and non-ground points such as vegetation are considered noise. However, the traditional denoising method is insufficient to filter out a large amount of background noise where the topography varies significantly.

This paper proposes a refined DBSCAN algorithm to avoid the manual selection of parameters during clustering, and improve the efficiency and accuracy of data processing. Our main contributions are as follows:

  (1) A DBSCAN denoising method coupled with height and distance factors is proposed. The DBSCAN algorithm is improved by grouping areas with large variation in topography based on elevation and distance statistics and by automatically calculating the clustering radius Epsilon (Eps) according to the distance between point clouds.

  (2) The influence of different elevation and distance parameters on the results is analyzed and the best parameters are selected.

  (3) The denoising results of the classical DBSCAN algorithm and the improved DBSCAN algorithm are compared, which shows how well the improved DBSCAN algorithm detects signal points.

The rest of this paper is organized as follows: Section 2 describes the experimental areas, data sources and methods. Section 3 describes the parameter selection and the experiments. Section 4 discusses the results and potential future work. Section 5 presents general conclusions and highlights the main contributions.

2 Materials and methods

2.1 Data sources

The ATLAS and its auxiliary system (GPS and spaceborne camera) carried by ICESat-2 measure the photon round-trip time and determine the spatial position of the surface reflecting the photon (Zhu et al. 2021). ATLAS uses a 6-beam photon counting lidar with higher spatial coverage, which is more than three times that of ICESat-1. Laser pulses generate three pairs of ground trajectories, each of which is usually about 17 m wide. The left/right points of each pair of ground trajectories are about 90 m apart in the direction across the track, about 2.5 km apart in the direction along the track, and about 3.3 km apart between columns, as shown in Fig. 1. Each pair of beams has different emission energies, i.e., weak light and strong light, and the energy ratio between them is about 1:4 (Neuenschwander et al. 2020).

Fig. 1 Location of the experimental areas and ATLAS data trajectory

ATLAS is a lidar altimetry system based on micro-pulse photon counting. It emits and receives laser signals in the 532 nm band only, with a pulse width of about 1 ns. Because photon counting detects weak signals, the field angle of ICESat-2 is kept to about 66 μrad (~ 40 m) to reduce the influence of background noise, which is smaller than that of ICESat-1 at the 532 nm band. The footprint diameter of ICESat-2 is only about 17 m, which is the average of each pointing. ICESat-2 has a laser emission frequency of up to 10 kHz and a footprint spacing of approximately 0.7 m along the satellite’s trajectory. Compared with ICESat-1’s 170 m footprint spacing, the sampling density is greatly improved, and the official photon sampling accuracy can reach the centimeter level (Xing et al. 2020). As a new generation of multi-beam lidar, ICESat-2 has an observation mode that differs greatly from that of ICESat-1. Table 1 compares the parameters of the two instruments.

Table 1 Comparison of instrument parameters between ICESat-1 and ICESat-2
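The footprint spacing quoted above can be checked from the pulse rate. As a rough worked example, assuming a ground-track speed \(v\) of about 7 km/s (a commonly quoted figure for ICESat-2 that is not stated in this paper) and the pulse repetition frequency \(f\) of 10 kHz:

$$\Delta s=\frac{v}{f}\approx \frac{7000\ \mathrm{m/s}}{10000\ \mathrm{Hz}}=0.7\ \mathrm{m}.$$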

The ATL03 (Advanced Topographic Laser Altimeter System Level 03) Gt1l photon laser point cloud data collected by the ATLAS pulsed beam during the operation of ICESat-2 around mid-October 2020 were used in this experiment. Because the data span of the whole experimental area is large, the point cloud density changes significantly along the data segment. Therefore, this paper selects for denoising the point cloud data within the latitude range of 48.16° to 48.20° in the Daxing’anling experimental area, 36.18° to 36.22° in the Laoshan area and 42.52° to 42.56° in the Changbaishan area. As shown in Fig. 3, the Laoshan experimental area has a more complex terrain, with an elevation difference of about 600 m and numerous hills. The elevation difference of the Changbaishan experimental area is about 150 m, and the area is covered with rich vegetation. The elevation difference of the Daxing’anling experimental area is about 350 m, and this area contains a large number of forests. The point cloud profile generated is shown in Fig. 2.

Fig. 2 The idealized beam and footprint pattern of ATLAS, with deep blue representing strong light and pale blue representing weak light

2.2 Methods

2.2.1 Classical DBSCAN clustering method

DBSCAN is a classical density-based clustering algorithm (Ester et al. 1996). Its principle is as follows. First, a search radius and a minimum number of neighbors are specified. Starting from an arbitrary point in the data set, if at least the minimum number of points lies within the radius of this point (including the point itself), these points are all considered part of a cluster. The cluster is then expanded by checking each of its points: whenever a point also has at least the minimum number of points within the radius, the points in its neighborhood are added to the cluster. This expansion is repeated recursively until no further points can be added. A new unvisited point is then selected, and the above process is repeated.

For a given dataset \(D=\{{x}_{1},{x}_{2},\dots ,{x}_{m}\}\), several concepts involved in the DBSCAN clustering algorithm are defined as follows:

Epsilon-neighborhood (\({N}_{Eps}\)): For \({x}_{j}\in D\), its \({N}_{Eps}\) contains the points in \(D\) whose distance from \({x}_{j}\) is not greater than Eps, namely \({N}_{Eps}\left({x}_{j}\right)=\left\{{x}_{i}\in D\mid dist({x}_{i},{x}_{j})\le Eps\right\}\);

Core object: If the \({N}_{Eps}\) of \({x}_{j}\) includes at least MinPts (minimum points) samples, i.e. \(\left|{N}_{Eps}({x}_{j})\right|\ge MinPts\), then \({x}_{j}\) is a core object;

Directly density-reachable: If \({x}_{j}\) is located in the \({N}_{Eps}\) of \({x}_{i}\) and \({x}_{i}\) is a core object, \({x}_{j}\) is called directly density-reachable from \({x}_{i}\);

Density-reachable: For \({x}_{i}\) and \({x}_{j}\), if there is a sample sequence \({P}_{1},{P}_{2},\cdots ,{P}_{n}\), where \({P}_{1}={x}_{i},{P}_{n}={x}_{j}\) and each \({P}_{k+1}\) is directly density-reachable from \({P}_{k}\), then \({x}_{j}\) is called density-reachable from \({x}_{i}\);

Density-connected: For \({x}_{i}\) and \({x}_{j}\), if there is \({x}_{k}\) so that \({x}_{i}\) and \({x}_{j}\) are both density-reachable from \({x}_{k}\), then \({x}_{i}\) is called density-connected with \({x}_{j}\).

The basic concept of the DBSCAN clustering algorithm is briefly described through four points \({x}_{1},{x}_{2},{x}_{3},{x}_{4}\) and their spatial distribution characteristics (Fig. 3). The dotted lines in Fig. 3 express Epsilon-neighborhoods. The Epsilon-neighborhood of \({x}_{1}\) contains at least MinPts samples, so \({x}_{1}\) is a core object. \({x}_{2}\) is directly density-reachable from \({x}_{1}\), and \({x}_{3}\) is directly density-reachable from \({x}_{2}\) and therefore density-reachable from \({x}_{1}\). \({x}_{3}\) is density-connected with \({x}_{4}\).
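The definitions above can be turned into a short procedure. The following is a minimal sketch of classical DBSCAN in plain Python, assuming 2-D points given as (along-track distance, elevation) tuples; the function names `region_query` and `dbscan` and the brute-force neighbor search are illustrative choices, not the implementation used in this paper.

```python
import math

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points that have not been processed yet


def region_query(points, j, eps):
    """Indices of the Eps-neighborhood of points[j] (the point itself included)."""
    xj, yj = points[j]
    return [i for i, (xi, yi) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Label each 2-D point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for j in range(len(points)):
        if labels[j] is not UNVISITED:
            continue
        neighbors = region_query(points, j, eps)
        if len(neighbors) < min_pts:       # not a core object
            labels[j] = NOISE              # may still become a border point later
            continue
        labels[j] = cluster_id
        queue = list(neighbors)
        while queue:
            k = queue.pop()
            if labels[k] == NOISE:         # border point reached from a core object
                labels[k] = cluster_id
            if labels[k] is not UNVISITED:
                continue
            labels[k] = cluster_id
            k_neighbors = region_query(points, k, eps)
            if len(k_neighbors) >= min_pts:    # k is a core object: keep expanding
                queue.extend(k_neighbors)
        cluster_id += 1
    return labels


# Five photons along a ground profile plus two isolated noise photons (made-up values)
photons = [(0, 100.0), (1, 100.2), (2, 100.1), (3, 100.3), (4, 100.2),
           (2, 150.0), (10, 60.0)]
print(dbscan(photons, eps=1.5, min_pts=3))
```

The brute-force neighborhood search costs quadratic time in the number of points; a spatial index would be the usual replacement for long tracks.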

Fig. 3 Original point cloud in the distance and elevation coordinate system: a the original point cloud data of latitude 48.16° to 48.20° in Daxing’anling experimental area; b the original point cloud data of latitude 36.18° to 36.22° in Laoshan experimental area; c the original point cloud data of latitude 42.52° to 42.56° in Changbaishan experimental area

2.2.2 Improved DBSCAN clustering method

To improve denoising accuracy and efficiency for point cloud data in large topographic gradients, we propose a revised DBSCAN method using elevation and distance statistics with an automatic radius search. The specifics of the method are as follows (Fig. 4).

  (1) Point cloud data grouping method based on elevation and distance statistics

Fig. 4 DBSCAN basic concept diagram (MinPts = 3)

For mountainous forest areas with large topographic relief, a single Eps and MinPts parameter setting cannot simultaneously meet different terrain conditions in the experimental area. Therefore, the data of the experimental area are grouped to realize the adaptability of the parameters and improve the overall denoising accuracy. The grouping process is described as follows:

The original point cloud data are arranged along the track, the average elevation is calculated, and a second-order curve is fitted to it using the least squares method, according to Eq. (1):

$$h=a{x}^{2}+bx+c.$$
(1)

Taking the fitted curve of the average elevation as a reference, the height difference \(h\) in the along-track direction is calculated, and the height difference parameter \({h}_{a}\) and distance parameter \({l}_{a}\) are set as grouping conditions. Each group needs to meet the following conditions: \({h}_{i}\le {h}_{a}\) and \({l}_{i}\le {l}_{a}\), where \({h}_{i}\) and \({l}_{i}\) are the height difference and the horizontal along-track distance of group \(i\), respectively.
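One possible reading of this grouping step is sketched below in plain Python. It is an interpretation under stated assumptions, not the authors' code: the photons are assumed to be sorted by along-track distance, the quadratic of Eq. (1) is fitted through the normal equations (on centered distances, so only the fitted heights matter), and a group is closed greedily as soon as the next photon would push the fitted height range above \({h}_{a}\) or the along-track length above \({l}_{a}\). The names `fit_quadratic` and `group_along_track` are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_quadratic(x, h):
    """Least-squares quadratic fit; returns a function giving the fitted height."""
    n = len(x)
    x0 = sum(x) / n                        # center x to keep the sums well scaled
    u = [xi - x0 for xi in x]
    s1, s2, s3, s4 = (sum(ui ** p for ui in u) for p in (1, 2, 3, 4))
    t = [sum(ui ** p * hi for ui, hi in zip(u, h)) for p in (2, 1, 0)]
    m = [[s4, s3, s2], [s3, s2, s1], [s2, s1, n]]   # normal equations
    d = det3(m)
    coeffs = []
    for col in range(3):                   # Cramer's rule, one column at a time
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = t[r]
        coeffs.append(det3(mc) / d)
    a, b, c = coeffs
    return lambda xi: a * (xi - x0) ** 2 + b * (xi - x0) + c


def group_along_track(x, h, h_a, l_a):
    """Split along-track-sorted photons into half-open index ranges (start, end)."""
    fitted = fit_quadratic(x, h)
    ref = [fitted(xi) for xi in x]         # reference elevation from the fitted curve
    groups, start = [], 0
    lo = hi = ref[0]
    for k in range(1, len(x)):
        new_lo, new_hi = min(lo, ref[k]), max(hi, ref[k])
        if new_hi - new_lo > h_a or x[k] - x[start] > l_a:
            groups.append((start, k))      # close the current group before photon k
            start, lo, hi = k, ref[k], ref[k]
        else:
            lo, hi = new_lo, new_hi
    groups.append((start, len(x)))
    return groups


# Synthetic flat track sampled every 10 m: only the distance limit is active
x = [10.0 * k for k in range(101)]
print(group_along_track(x, [100.0] * 101, h_a=50.0, l_a=400.0))
```

On steep terrain the height condition closes groups first, which is how short groups with few points can arise.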

  (2) Calculating Eps parameter by group

After grouping, Eps is calculated in each group. In each group of raw data, the distance between any two points is calculated. Equations (2) to (5) give, respectively, the Euclidean distance between two points, the maximum and the minimum of these distances, and the difference between the maximum and the minimum:

$$dist(i,j)=\sqrt{{({x}_{i}-{x}_{j})}^{2}+{({y}_{i}-{y}_{j})}^{2}},$$
(2)
$$max=MAX\left\{dist\left(i,j\right)\mid 1\le i<j\le n\right\},$$
(3)
$$min=MIN\left\{dist\left(i,j\right)\mid 1\le i<j\le n\right\},$$
(4)
$$Dist=max-min,$$
(5)

where \(D\) is the sample data set of a group, \(n\) is the number of points in \(D\), and \(i\) and \(j\) index two distinct points of \(D\).

The search radius (Eps) is then determined from these distances. Based on the difference between the maximum and minimum distances, a statistical interval width is set. Using the minimum distance as the starting point and the maximum distance as the ending point, the distances between points are counted per interval and plotted as a histogram. The interval with the highest frequency gives the Eps of this data set, which is the search radius for clustering (in the example of Sect. 3.1, the 0 to 3 m interval gives an Eps of 3 m).

The Eps obtained from these statistics is substituted into the DBSCAN clustering algorithm. MinPts is set empirically based on the point cloud density and the size of Eps, and the clustering result of each data set is obtained. The results of all groups are then merged to obtain the final denoising result. The points within clusters are saved as signal points, and the points that do not belong to any cluster are removed as noise points.
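The automatic radius search can be sketched as follows. This is a minimal illustration under two assumptions that the text does not spell out: the histogram bins start at the minimum distance, and Eps is taken as the upper edge of the most frequent bin. The function name `estimate_eps` and the caller-supplied `bin_width` (the statistical interval) are hypothetical.

```python
import math
from collections import Counter


def estimate_eps(points, bin_width):
    """Return the upper edge of the most frequent pairwise-distance bin."""
    # Eq. (2): Euclidean distance for every pair of distinct points
    dists = [math.dist(p, q)
             for i, p in enumerate(points) for q in points[i + 1:]]
    d_min = min(dists)                               # Eq. (4)
    # Bin b covers [d_min + b * bin_width, d_min + (b + 1) * bin_width)
    counts = Counter(int((d - d_min) // bin_width) for d in dists)
    # Most frequent bin; ties are resolved toward the smaller distance
    best_bin = max(counts, key=lambda b: (counts[b], -b))
    return d_min + (best_bin + 1) * bin_width


# Ten photons spaced 1 m apart on a flat line (made-up values)
line = [(float(k), 0.0) for k in range(10)]
print(estimate_eps(line, bin_width=2.0))
```

Computing all pairwise distances is quadratic in the group size, which stays cheap for groups of a few hundred photons.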

2.3 Accuracy evaluation

The confusion matrix of binary classification data is used to describe some common error indicators. These error indicators reflect the accuracy of point cloud denoising from different aspects.

Accuracy was evaluated by defining \(Accuracy\) and \(Sensitivity\). The equations are as follows:

$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN},$$
(6)
$$Sensitivity=\frac{TP}{TP+FN},$$
(7)

where TP is the number of actual target signals identified as target signals, FN is the number of actual target signals identified as non-target signals, FP is the number of actual non-target signals identified as target signals, and TN is the number of actual non-target signals identified as non-target signals. \(Accuracy\) reflects the overall performance of the algorithm on the experimental data; a higher accuracy indicates a better classification effect. \(Sensitivity\) reflects the classification effect of the algorithm for the positive class. In this paper, denoising ground surface point cloud data is formulated as a classification task, where ground points are considered as signals and the remaining points are considered as noise. The reference signal photons obtained by visual interpretation are used as the actual target signal.
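Equations (6) and (7) translate directly into code. The short sketch below evaluates the sensitivity with the Laoshan counts reported in Sect. 3.2.2 (3,703 and 3,399 correctly identified signal points, 92 and 396 missed); the counts passed to `accuracy` are made-up values for illustration only, because the full confusion matrices are not reproduced in the text.

```python
def accuracy(tp, tn, fp, fn):
    """Eq. (6): share of all points that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Eq. (7): share of true signal points that are kept as signal."""
    return tp / (tp + fn)


# Laoshan counts from Sect. 3.2.2
improved = sensitivity(tp=3703, fn=92)
classical = sensitivity(tp=3399, fn=396)
print(f"improved: {improved:.2%}, classical: {classical:.2%}")

# Hypothetical counts, only to show the call
print(accuracy(tp=90, tn=5, fp=3, fn=2))
```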

2.4 Study areas

The study areas are located in the Daxing’anling area in the northwest of Heilongjiang, the Laoshan area in Qingdao, Shandong Province and Changbaishan area in Antu County, Jilin Province, as shown in Fig. 5.

Fig. 5 Schematic diagram of grouping to determine Eps parameters

The Daxing’anling Mountains are home to extensive mountain forests, which are well preserved and form the largest primitive forests in China. The Daxing’anling area is one of the important forestry bases in China, with complex and diverse forest features. The Daxing’anling Mountains extend from the north of the Heilongjiang River in the northern part of Mohe City, Heilongjiang Province, to the upper reaches of the Xilamulun River in the northern part of Chifeng City, Inner Mongolia Autonomous Region. With a northeast–southwest trend, the mountains are more than 1,400 km long, with an average width of about 200 km and an altitude of 1,100 to 1,400 m. The Daxing’anling area is covered by primitive forest, with the southern section being temperate hardwood forest and the northern section being cold temperate coniferous forest.

The Laoshan mountainous area, with a size of 446 km², is the first peak along China’s coastline. The mountains are centered on Laoding and extend in all directions, with longer branches in the northwest and southwest directions. The Laoshan branch extends to the north of Jimo District along the east coast and to the Jiaozhou Bay in the west, and the southwest branch extends to the city of Qingdao, forming more than ten hilltops and an undulating hilly terrain. Laoshan is more than 100 km long, with an altitude of about 600 to 1,100 m. Owing to the complexity of the terrain and the wide variety of plants, various types of vegetation cover have formed, such as forests, shrubs, grasses, desert plants, halophytes, and agricultural cultivation.

Changbaishan is located in the southeastern part of Jilin Province. The northern and southwestern parts are bordered by Heilongjiang and Liaoning provinces, respectively. The southeastern and eastern parts border North Korea and Russia, respectively. The terrain of Changbaishan gradually slopes from southeast to northwest. The southeast is mainly a middle mountain range with an altitude of more than 1,000 m, which gradually decreases to low mountains and hills in the northwest, down to the platform with an altitude of about 300 m. The forest cover is 87.9%. It is a comprehensive nature reserve with the forest ecosystem as the main object of protection.

3 Results and analysis

3.1 Parameter selection analysis

In order to select the optimal height difference and distance parameters, the accuracy obtained with different parameter values is calculated. As shown in Fig. 6, the accuracy is higher when the distance parameter \({l}_{a}\) is 150 to 500 m than when \({l}_{a}\) > 500 m. However, verification shows that \({l}_{a}\) < 300 m leads to more groups, a heavier workload and reduced efficiency. To balance work efficiency and accuracy, the distance parameter \({l}_{a}\) is set to 400 m. The accuracy is above 0.9 when the height difference parameter \({h}_{a}\) is 50 to 80 m, and the highest overall accuracy is achieved at \({h}_{a}=80\) m. However, at \({h}_{a}=80\) m, the steepness of the terrain and the large drop in elevation at some locations cause the along-track distance of a group to fall below 50 m. Such a group contains few points, which leads to an accuracy of less than 0.5 for that group and can result in missing ground points. Therefore, \({h}_{a}=50\) m is used to group the original data.

Fig. 6 Schematic diagram of parameter analysis: a accuracy changes for different \({l}_{a}\); b accuracy changes for different \({h}_{a}\)

For the grouped data, the Eps parameter is calculated separately in each group. Take the third group of data in the Daxing’anling experimental area as an example: its along-track distance ranges from 300 to 416 m, and it contains a total of 441 points. To determine the Eps of DBSCAN, the distances between all pairs of points are counted, and the interval with the highest number of occurrences is chosen as the Eps. According to the statistics, the maximum distance between any two points is 285.4 m, and the minimum distance between any two points is 0.1 m. The statistical interval width is set to 3 m, and the resulting counts of point-to-point distances are shown in Fig. 7. The number of point pairs is greatest in the interval of 0 to 3 m. Therefore, the Eps of this group of data is 3 m, and MinPts is determined as an empirical threshold.

Fig. 7 Histogram of distance statistics. (Taking the third group data of the Daxing’anling experimental area as an example.)

3.2 Evaluation of denoising results

3.2.1 A case study in Daxing’anling, China

As shown in Fig. 8, in the Daxing’anling area, the denoising effect of the improved DBSCAN algorithm is better than that of the classical DBSCAN algorithm and is basically consistent with the verified result. Most of the noise points far from the ground can be removed by the classical DBSCAN algorithm, but in the Daxing’anling area, which contains primitive forest, the vegetation near the ground is identified as signal points and not filtered out. In contrast, the improved DBSCAN algorithm not only removes the noise points far from the ground points but also filters out the vegetation points near the ground, leaving only the surface elevation. The surface morphology of the experimental area is clearly visible, although a few noise points are not filtered out. Table 2 gives the confusion matrix between the denoising results of the classical and improved algorithms and the verified results of the Daxing’anling area. The number of points identified as true signals after processing by the improved DBSCAN clustering algorithm is 7,623, and the number of noise points identified as signals is 197. After processing with the classical DBSCAN clustering algorithm, the number of points identified as signals is 7,669, but the number of noise points identified as signals is 2,365. The improved algorithm thus reduces the number of noise points identified as signals by 2,168, which indicates that it effectively reduces the number of noise points. The improved DBSCAN achieves a denoising accuracy of 95.49% and a sensitivity of 97.42%.

Fig. 8 Comparison of denoising results in the Daxing’anling area. a Denoising results using classical DBSCAN, b denoising results using improved DBSCAN, and c verification results from visual interpretation

Table 2 Confusion matrix of point cloud denoising results

3.2.2 A case study in Laoshan, China

As shown in Fig. 9, the improved DBSCAN algorithm produces an obvious clustering effect on the point cloud data in the Laoshan experimental area and can remove more than 90% of noise points. With the classical DBSCAN clustering algorithm, some of the signal points are misclassified as noise points, resulting in thin, discontinuous ground points. The vegetation in this area is mostly low shrubs, so some near-surface vegetation points are confused with ground points, which affects the noise classification accuracy. According to the confusion matrix of the denoising results and the verified results of the two algorithms in Table 3, the improved DBSCAN clustering algorithm correctly identifies 3,703 signal points, with 92 signal points misclassified as noise points. By comparison, the classical algorithm correctly identifies 3,399 signal points, with 396 signal points misclassified as noise points. This indicates that the improved algorithm significantly enhances the correct identification of signal points. The accuracy of the improved algorithm is 94.22% and the sensitivity is 97.58%.

Fig. 9 Comparison of denoising results in the Laoshan area. a Denoising results using classical DBSCAN, b denoising results using improved DBSCAN, c verification results from visual interpretation

Table 3 Confusion matrix of point cloud denoising results in the Laoshan area

3.2.3 A case study in Changbaishan, China

In the Changbaishan area shown in Fig. 10, because of the large amount of data in this area, the classical DBSCAN algorithm misclassifies noise points as signal points when encountering dense noise. According to the confusion matrix of the denoising results in Table 4, the classical DBSCAN algorithm misclassifies 10,228 points, whereas the improved algorithm misclassifies 3,063 points, indicating that the proposed DBSCAN algorithm improves the separation of signal points and noise points. The classification accuracy of the improved algorithm is 94.89% and the sensitivity is 96.50%.

Fig. 10 Comparison of denoising results in the Changbaishan area. a Denoising results using classical DBSCAN, b denoising results using improved DBSCAN, c verification results from visual interpretation

Table 4 Confusion matrix of point cloud denoising results in the Changbaishan area

4 Discussion

In the classical DBSCAN clustering algorithm, the threshold setting is limited by the terrain, which makes the algorithm prone to excessive denoising (Zhang et al. 2021). Because of the single threshold setting, the noise points near the ground are incorrectly recognized as signal points. The improved DBSCAN clustering algorithm solves the single threshold problem. By setting the grouping parameters appropriately and applying the automatic radius search, the adaptability of the parameters is improved, which enhances the ability to extract ground signals. In the Daxing’anling experiment, the difference between the two algorithms is more evident because the experimental area has a large virgin forest coverage. The classical DBSCAN clustering algorithm can only remove the more obvious noise points away from the ground, and the vegetation points above the ground are misclassified as ground points. In the Laoshan experiment, because the vegetation in the experimental area is mostly composed of low bushes and shrubs, the classical DBSCAN clustering algorithm tends to classify low vegetation as ground points. In the Changbaishan area, where the amount of data is large, the improved DBSCAN clustering algorithm enhances the denoising effect in regions with dense noise points through its adaptive parameters. The improved algorithm adapts well to single photon laser point cloud data similar to ICESat-2 data and can quickly and efficiently filter out noise points in point cloud data.

In our study, we found that, of the two grouping parameters, the height difference parameter ha contributes more to the improvement in denoising accuracy. Once the data are grouped by the height difference parameter, each group follows the terrain characteristics, which avoids applying the same single threshold to different terrain and the low denoising accuracy that results. The distance parameter la is set mainly to prevent groups from extending too far along the track and accumulating a large amount of data in a single group in areas with gentle terrain. A small la, however, produces more groups and lowers efficiency.
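
The roles of ha and la can be sketched in a few lines of Python. The grouping rule below is an assumption made for illustration, not the statistic defined in the method: it works on a representative elevation per along-track sample (for example a per-bin median) and starts a new group when the elevation range would exceed ha or the along-track length would exceed la. The function name and the profile values are hypothetical.

```python
def group_profile(profile, ha, la):
    """Split an along-track elevation profile into consecutive groups.

    profile: (along_track_m, elevation_m) pairs sorted by along-track distance.
    A new group starts when adding the next sample would make the group's
    elevation range exceed ha or its along-track length exceed la.
    """
    groups, current = [], []
    z_min = z_max = None
    for x, z in profile:
        if current:
            new_min, new_max = min(z_min, z), max(z_max, z)
            if new_max - new_min > ha or x - current[0][0] > la:
                groups.append(current)   # close the group before this sample
                current = []
            else:
                z_min, z_max = new_min, new_max
        if not current:
            z_min = z_max = z            # this sample opens a new group
        current.append((x, z))
    if current:
        groups.append(current)
    return groups


# Hypothetical profile: gentle terrain followed by a steep slope.
profile = [(0, 100), (100, 102), (200, 101), (300, 103),
           (400, 150), (500, 210), (600, 260)]

coarse = group_profile(profile, ha=50, la=250)
fine = group_profile(profile, ha=50, la=100)

print([len(g) for g in coarse])
print([len(g) for g in fine])
```

In this toy profile, ha splits the steep section into short groups regardless of la, while la only limits how long a group can grow over the gentle section. Reducing la from 250 m to 100 m raises the number of groups from three to four, which mirrors the efficiency cost of a small la.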

As a satellite with a global observation range, ICESat-2 can collect global ground elevation information (Xing et al. 2020). However, global terrain is complex and variable, and point cloud denoising has low accuracy in areas with strongly undulating terrain. In follow-up work, the slope parameters in the ICESat-2 data should be used to improve the denoising method so that complex terrain can be processed (Hao et al. 2022). The improved DBSCAN clustering algorithm filters out most of the noise points, but near-surface vegetation noise is still confused with the ground point signal, since the two are difficult to separate by geometry alone. If the point cloud data are forcibly denoised under these conditions, the resulting point cloud will be sparse, which affects further applications.

Our next steps involve a more thorough exploration of signal classification within point cloud data. The goal is to better distinguish near-ground noise points from ground points and thereby improve denoising accuracy. To achieve this, we plan to employ machine learning and deep learning techniques to build more intelligent and adaptive point cloud classification models (Li et al. 2020). Through finer feature extraction and analysis of point cloud data, we aim to identify ground features such as buildings, vegetation, and terrain more accurately. We will train deep learning models on large and diverse datasets to improve their adaptability and generalization across different environmental conditions (Meng et al. 2022). We will also optimize the preprocessing steps for point cloud data to better eliminate near-ground noise points. This optimization may involve more sophisticated denoising techniques, consideration of spatiotemporal features, and more accurate modeling of laser beam propagation paths (Kui et al. 2023). We plan to integrate these optimization methods with the improved DBSCAN clustering algorithm proposed here to form a more comprehensive and efficient denoising framework (You et al. 2023). Furthermore, we will address the challenges of point cloud processing in complex terrain and densely vegetated areas. By studying point cloud characteristics in these scenes in depth, we will design targeted algorithms and strategies that differentiate vegetation, ground surfaces, and other features, thereby enhancing classification accuracy and reliability (Liu et al. 2021). Finally, emphasis will be placed on comparing the proposed classification and denoising methods with ground validation data. In this paper, results obtained by visual interpretation are used as validation data, which still has some limitations. Comparison with field observations will allow us to evaluate the algorithm's performance comprehensively and will provide robust guidance for future improvements.

5 Conclusions

The improved DBSCAN algorithm proposed in this paper addresses the manual selection of a single threshold in the DBSCAN clustering algorithm, which causes the algorithm to struggle with changing terrain. The algorithm introduces a grouping method based on elevation and distance statistics and an automatic radius search method. It reduces the number of manual attempts, enhances the automation of the denoising framework, and significantly improves the efficiency of the clustering process. Applied to the data from the experimental areas, the improved algorithm clearly outperforms the classical DBSCAN. The accuracy of the improved DBSCAN clustering algorithm reaches 95.49%, 94.22% and 94.89% in the three experimental areas, respectively, which is 34.76, 6.39 and 11.96 percentage points higher than that of the classical DBSCAN clustering algorithm. This indicates that the improved DBSCAN clustering algorithm has higher denoising accuracy. The sensitivity of the improved DBSCAN clustering algorithm exceeds 96% in these experimental areas, indicating that the method performs well on the positive class. The experiments also showed that the signals of near-surface vegetation points and ground points are confused. Considering the feasibility of different algorithms under different conditions and the accuracy of their results, several denoising algorithms can be combined to denoise different terrains. Multi-source data fusion can also be considered as part of the denoising framework to improve the classification of point cloud signals, providing more valuable data support for the scientific application of ICESat-2 data.

Availability of data and materials

Publicly available datasets were analyzed in this study. ICESat-2 data were downloaded from the NASA National Snow and Ice Data Center (NSIDC) (https://nsidc.org/data/icesat-2).

Abbreviations

ICESat-2: Ice, Cloud, and Land Elevation Satellite-2

NASA: National Aeronautics and Space Administration

ATLAS: Advanced Topographic Laser Altimeter System

ATL03: Advanced Topographic Laser Altimeter System level 03

ATL08: Advanced Topographic Laser Altimeter System level 08

DBSCAN: Density-based spatial clustering of applications with noise

OPTICS: Ordering points to identify the clustering structure

NTEMS: National Terrestrial Ecosystem Monitoring System

Eps: Epsilon

MinPts: Minimum points

References

Author information

Contributions

Conceptualization, methodology, and preparation of the original draft were performed by WD and YJC. Review and editing of the manuscript were performed by YJC, LFY and LQH. Software and validation were performed by YJC. Data curation was provided by LFY. Formal analysis was performed by WD and LQH. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Dong Wang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Wang, D., Yu, J., Liu, F. et al. ICESat-2 single photon laser point cloud denoising algorithm based on improved DBSCAN clustering. Earth Planets Space 76, 128 (2024). https://doi.org/10.1186/s40623-024-02071-y

  • DOI: https://doi.org/10.1186/s40623-024-02071-y

Keywords