
Numerical experiments on tsunami flow depth prediction for clustered areas using regression and machine learning models


Emergency responses during a massive tsunami disaster require information on the flow depth on land for rescue operations. This study aims to predict the tsunami flow depth distribution in real time using regression and machine learning. Training data for 3480 earthquake-induced tsunamis in the Nankai Trough were constructed by numerical simulation. First, the k-means method was used to identify areas with approximately the same flow depth. The number of clustered areas was 18, and the standard deviation of the flow depth data within a cluster was 0.46 m on average. The objective variables were the mean and standard deviation of the flow depth in each clustered area. The explanatory variables were the maximum deviations of the water pressure at the seafloor observation points of the DONET observatory. We generated multiple regression equations based on a power law from these datasets using the conjugate gradient method. Further, we employed the multilayer perceptron method, a machine learning technique, and evaluated its prediction performance. Both methods accurately predicted the tsunami flow depths calculated for 11 test earthquake scenarios published by the Cabinet Office of the Government of Japan. The RMSE between the predicted and true (via forward tsunami calculations) values of the mean flow depth ranged from 0.34 to 1.08 m. In addition to large-scale tsunami prediction systems, robust prediction methods with a light computational load, such as those used in this study, are essential to prepare for unforeseen situations during large-scale earthquake and tsunami disasters.



An earthquake of magnitude 9.0 occurred in 2011 on the plate boundary at the Japan Trench (e.g., Ammon et al. 2011; Simons et al. 2011; Satake et al. 2013), which was associated with a gigantic tsunami. A tsunami warning was issued approximately 3 min after the earthquake was detected. The first tsunami warning for Iwate Prefecture predicted a tsunami of less than 3 m, but a detailed post-earthquake survey (Mori et al. 2012) for the area showed that the maximum tsunami run-up reached an elevation of 40.1 m. The second warning issued approximately 30 min after the first was much higher, but some areas could not receive the second and subsequent tsunami warnings owing to power outages or communication losses caused by the strong shaking. The 2011 Tohoku earthquake caused widespread damage, with approximately 23,000 dead or missing (Fire and Disaster Management Agency 2021).

Since the 2011 Tohoku earthquake, there have been many types of research and development aimed at improving the accuracy of tsunami early warnings. The first of these is the development of a large-scale seismic and tsunami observation system connected by submarine cables, called the Dense Ocean floor Network system for Earthquakes and Tsunamis (DONET; Kaneda et al. 2015), and a seafloor observation network for earthquakes and tsunamis along the Japan Trench (S-net; Mochizuki et al. 2018). DONET and S-net transfer seafloor observed data to the Japan Meteorological Agency (JMA) for tsunami monitoring and early warning. An innovative algorithm called tsunami forecasting based on inversion for initial sea-surface height (tFISH, Tsushima et al. 2009; 2012) has improved tsunami prediction accuracy using seafloor pressure data. This tFISH algorithm was already in operation for the tsunami early warning of the JMA in 2019. Maeda et al. (2015) and Wang et al. (2017; 2018) used assimilation methods to estimate the complete tsunami wavefield from discrete seafloor pressure data without estimating the initial sea-surface height. Koshimura (2017) developed a system to immediately predict tsunami inundation and anticipated damage using a real-time solution of fault motion from land-based Global Navigation Satellite System data. While these require high-speed and large-scale tsunami computations, the development of computers and tsunami computation software is one of the factors that now makes such methods possible (e.g., Musa et al. 2015; Baba et al. 2016). Other methods have been proposed to predict tsunamis in real time, such as database search models (Yamamoto et al. 2016), regression models (Igarashi et al. 2016; Yoshikawa et al. 2019), and deep learning models (Fauzi and Mizutani 2020; Makinoshima et al. 2021).

The tsunami forecasting system must remain robust even in disasters in which the power supply and internet communications are lost. Therefore, the JMA duplicates the warning system in Tokyo and Osaka, and the system of Koshimura (2017) can prioritize the supercomputers of Tohoku and Osaka Universities in the event of a disaster. However, we should also develop alternative methods that do not require high-speed and large-scale computations as a contingency plan. The regression model (Igarashi et al. 2016; Yoshikawa et al. 2019), which uses the correlation between offshore and coastal tsunamis, is the most suitable for this purpose. Although the regression model requires computational resources to construct the training data, once the regression equation is constructed, it can predict tsunamis quickly using only a small number of observed values, such as the maximum tsunami amplitude. Because a personal computer is sufficient for tsunami prediction with a regression model, the prediction system is easy to multiplex. An earlier study (Igarashi et al. 2016) proposed a method for predicting coastal tsunami heights from the data of submarine cable systems using Gaussian process regression. However, the Gaussian process is less accurate in extrapolation. Therefore, Yoshikawa et al. (2019) proposed a regression method using a power law based on the relationship between offshore and coastal tsunamis. The power law regression showed almost the same performance as the Gaussian process regression in interpolation and performed better in extrapolation.

However, Yoshikawa et al. (2019) predicted the tsunami height at only one point on the coast and did not obtain the spatial distribution of the maximum tsunami flow depth. Emergency response after a tsunami disaster requires information on the flow depth distribution in the damaged area, in addition to the tsunami height along the coast. To obtain this distribution with regression models, the flow depth would have to be predicted at every point in the inundated area; however, the number of prediction points is enormous, and the processing time becomes too long. Hence, this study proposes a method to reduce the number of prediction points by pre-grouping the areas where the flow depths are always similar.

The analysis procedure was as follows. First, we calculated the tsunamis of 3480 cases in the Nankai Trough (Fujiwara et al. 2020) as training data to construct a regression model. Then, we applied cluster analysis to the flow depths of the training data to identify areas with similar flow depths for all tsunami events in the training data. Regression relationships were estimated using the conjugate gradient (CG) and multilayer perceptron (MLP) methods, in which the objective variables are the mean and standard deviation of the flow depth in the clustered areas, and the explanatory variables are the maximum ocean bottom pressure deviations during a tsunami at DONET stations. Finally, we used the constructed regression models to predict the tsunami flow depths calculated from hypothetical earthquake scenarios released by the Japanese government (Cabinet Office 2012) to evaluate the prediction accuracy.

Analysis methods

Training and test datasets

Fujiwara et al. (2020) proposed fault models with 3480 cases of interplate earthquakes in the study area (Fig. 1). We constructed the training data by calculating the tsunamis generated by all 3480 fault models. The seafloor displacement caused by the fault motion in each model was calculated assuming a semi-infinite homogeneous elastic body (Okada 1985). We estimated the initial tsunami water level using the vertical component of the seafloor displacement, the effect of tsunami excitation by horizontal displacement of the seafloor slope (Tanioka and Satake 1996), and the filter of the linear potential theory (Kajiura 1963). The numerical tsunami simulations used a rise time of 60 s for the initial water level. Tsunami propagation from the initial water level and run-up on land were computed by solving the nonlinear long-wave equations with the staggered-grid leapfrog finite-difference method (Baba et al. 2015; 2016). The topographic data used in the tsunami calculations were obtained from the local government in the study area, i.e., Tokushima Prefecture. Topographic nesting consisted of five layers (Fig. 1b). The grid intervals in the layers were 810, 270, 90, 30, and 10 m from the coarsest layer to the finest layer in the study area (Fig. 1c). The tide level when the tsunami occurred was assumed to be the mean tide level (0 m in Tokyo Peil). Coastal tsunami defense structures smaller than the grid intervals, such as breakwaters, were modeled as line structures in the calculations. When a tsunami overtopped a coastal structure, we treated the structure as collapsed and continued the calculation without that line structure. The integration time was 6 h, so that the maximum tsunami waves arrived in all the evaluation areas. To satisfy the stability condition of the computation, the time step was set to 0.1 s. Owing to the large computational load, we used a supercomputer (Earth Simulator, ES3) to perform the tsunami calculations. Each tsunami calculation took approximately 11 h using four ES3 nodes.
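The numbers quoted above can be checked with a little arithmetic. The sketch below is only illustrative: it assumes the common two-dimensional stability criterion dt <= dx / sqrt(2 g h) for leapfrog long-wave schemes (the text does not state which criterion was applied), and the function name max_stable_depth is hypothetical. The total node-hours figure treats the approximate 11 h per case as exact.

```python
G = 9.81  # gravitational acceleration (m/s^2)

def max_stable_depth(dx_m, dt_s):
    """Largest water depth (m) that satisfies dt <= dx / sqrt(2 g h).

    Assumed criterion; the actual condition used in the simulations
    is not given in the text.
    """
    return dx_m ** 2 / (2.0 * G * dt_s ** 2)

dt = 0.1                          # time step from the text (s)
grid_m = [810, 270, 90, 30, 10]   # nesting-layer grid intervals from the text (m)

# Depth limit implied by the time step for each nesting layer.
depth_limits = {dx: max_stable_depth(dx, dt) for dx in grid_m}

n_steps = round(6 * 3600 / dt)    # 6 h of simulated time
node_hours = 3480 * 11 * 4        # cases x ~11 h per case x 4 nodes

print("time steps per case:", n_steps)
print("approximate node-hours for the training set:", node_hours)
for dx, h in depth_limits.items():
    print(f"dx = {dx:4d} m -> stable up to h ~ {h:,.0f} m")
```

Under this assumed criterion the finest (10 m) layer is the binding constraint, which is consistent with a small time step being needed even though the coarse offshore layers would tolerate a much larger one.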

Fig. 1

a Regional map of the study area. The dotted lines are the plate boundaries proposed by Bird (2003). b Tsunami computational area. Black rectangles indicate nesting layers. Red stars are the locations of the seafloor pressure gauges of the DONET observatory. The coordinate system is Japan Plane Rectangular Coordinate System IV. c Tsunami prediction area. Colors indicate the clustered areas for the tsunami flow depth obtained using the k-means method. Terrestrial contours indicate elevation, with an interval of 50 m

In 2012, the Cabinet Office of Japan re-investigated geological and geophysical features and historical interplate earthquakes in the Nankai Trough to construct earthquake scenarios (fault slip distributions), which may occur in the Nankai Trough. The basic procedure for constructing the earthquake scenarios was as follows. First, the fault plane was divided into a tsunami fault zone shallower than a depth of 10 km, and a main fault zone deeper than 10 km. The seismic moment of the main fault zone was calculated using a scaling law from the average amount of stress drop and the area of the main fault zone. Many sub-faults were defined (of approximately 5 × 5 km) on the fault plane, whose slip amount was calculated using the seismic moment and the difference in the plate convergence rate. The slip angle on the sub-faults was assumed to be in the opposite direction to the plate convergence angle; thus, mainly thrust motions. In addition, large slip and super-large slip patches were introduced to account for earthquake slip heterogeneity. The large-slip patch had a slip amount twice the average slip, accounted for 20% of the total area of the fault, and was located somewhere in the shallow half of the fault plane. The super-large patch had a slip four times larger than the average slip and was located in the tsunami fault zone, along the trench axis neighboring the large slip patch. Eleven earthquake scenarios were created with a magnitude of 9.1 (hereafter referred to as M9 scenarios) by changing the location of the large slip and the super-large slip patches (Additional file 1: Figure S1). This study calculated tsunamis from the M9 scenarios outlined above and used them as test data (Additional file 1: Figure S2).

Cluster analysis for tsunami flow depth

To reduce the number of prediction points, we created clustered areas where the tsunami flow depth was always similar among the 3480 fault cases. Fourteen fault models that resulted in significant tsunami inundation in the study area were randomly selected from the training data, and the k-means method (Hartigan and Wong 1979) was applied to their flow depth distributions. Of the 3480 cases, the selected fault model identification numbers were 101, 315, 884, 1562, 1596, 1816, 1838, 2125, 2512, 2645, 2668, 2725, 2842, and 2850 in Fujiwara et al. (2020). Ideally, it would be better to conduct the clustering analysis using all 3480 fault cases, but this was not possible because of the amount of memory required for clustering analysis using a large number of points. We repeated the clustering analysis several times by changing the selected models and confirmed that the obtained cluster patterns were similar.

In the k-means method, the analyst must specify the number of clustered areas in advance. This study determined the number of clustered areas based on the variance in flow depth within each clustered area. Additionally, a small number of clustered areas is preferable because the purpose of this study is to predict tsunami flow depths at low computational cost. We repeated the cluster analysis while changing the number of clustered areas and evaluated the standard deviation of the tsunami flow depth data in each clustered area. For convenience, we adopted the criterion that the standard deviation of the flow depth data in each cluster must be less than 0.5 m with as few clusters as possible. This criterion led to 18 clustered areas as the optimal case.
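The selection rule described above can be sketched as a loop over candidate cluster counts. This is a toy illustration, not the authors' procedure: it swaps in plain Lloyd-style k-means with a deterministic start instead of the Hartigan and Wong (1979) algorithm, and it assumes that the within-cluster standard deviation is evaluated per scenario and that the largest value must fall below the threshold. All function names and the toy data are hypothetical.

```python
import math

def kmeans(points, k, iters=50):
    """Lloyd-style k-means on depth vectors (one vector per grid cell)."""
    # Deterministic start: spread centroids over cells ranked by mean depth.
    ranked = sorted(points, key=lambda p: sum(p) / len(p))
    centroids = [list(ranked[(2 * c + 1) * len(ranked) // (2 * k)]) for c in range(k)]
    labels = None
    for _ in range(iters):
        # Assign every cell to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centroid to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def worst_cluster_std(points, labels, k):
    """Largest per-scenario standard deviation of depth found in any cluster."""
    worst = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        for col in zip(*members):
            mean = sum(col) / len(col)
            worst = max(worst, math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)))
    return worst

def smallest_k(points, threshold=0.5, k_max=30):
    """Return the smallest cluster count whose worst std is below the threshold."""
    for k in range(1, min(k_max, len(points)) + 1):
        labels = kmeans(points, k)
        if worst_cluster_std(points, labels, k) < threshold:
            return k, labels
    raise ValueError("no cluster count up to k_max meets the threshold")

# Toy data: nine grid cells, flow depths (m) for two scenarios each.
cells = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3],
         [2.0, 2.2], [2.1, 1.9], [1.9, 2.0],
         [4.0, 4.1], [4.2, 3.9], [3.9, 4.0]]
k_best, cell_labels = smallest_k(cells)
print(k_best, cell_labels)
```

In the study, each grid cell would carry 14 depths (one per selected fault model) rather than two, and the same rule gave 18 clusters.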

The obtained cluster classification showed good correlation with the topography (Fig. 1c). The clustered areas 12 and 13 appear around the river in the northern part of the map, and the clustered area varies following elevation in the southern part of the map. Some clustered areas are not spatially continuous, but discrete, because this study aims to reduce the number of predicted points by using a small number of clustered areas. Figure 2 shows the frequency percentage in the flow depth data for each clustered area for M9 scenario No. 3 (Additional file 1: Figure S2) as an example.

Fig. 2

Histograms depicting the frequency percent of tsunami flow depth at each clustered area for the M9 scenario 3 (test data). Gray histograms were obtained by a forward tsunami calculation. Blue and red curves were predicted from seafloor pressure data using the CG and MLP methods, respectively

Regression models using a power law

Green's law, which states \(H{h}^{1/4}=const\), where \(H\) is the wave height and \(h\) is the water depth, can express the amplification of tsunami wave height under the linear tsunami theory. Hence, a linear multiple regression seemed reasonable for predicting coastal tsunami heights using data from multiple offshore tsunami stations. However, tsunamis contain strong nonlinear effects from advection terms and bottom friction near the coast, and these nonlinear effects are more likely to appear in large tsunamis. Yoshikawa et al. (2019) pointed out that multiple regressions with a power law are more suitable than linear multiple regressions. Hence, we also used multiple regressions with a power law for our tsunami prediction model. It should be noted that Green’s law correlates the tsunami height between coastal and offshore points. However, ocean bottom pressure gauges do not simply observe the tsunami heights for earthquakes at the gauges because the seafloor, seawater (tsunami generation), and the gauges move simultaneously owing to crustal displacement (Tsushima et al. 2012; Baba et al. 2014; Additional file 1: Figure S3). Hence, the pressure gauges observe the water pressure fluctuation (crustal movement + sea surface displacement). The water pressure remains almost unchanged during tsunami excitation under the assumption of hydrostatic pressure. The water pressure decreases with tsunami propagation, and the water pressure corresponding to the vertical component of the crustal movement decreases after a tsunami subsides. Therefore, this study used the absolute value of the maximum deviation of seafloor water pressure during a tsunami as the explanatory variable (\({\varvec{x}}\)). The tsunami prediction equation used in this study is as follows:

$$y_{i,j} = \sum\limits_{k = 1}^{n} {a_{i,k} x_{j,k}^{{b_{i,k} }} } , \tag{1}$$

where y is the objective variable, which is the mean and standard deviation of the tsunami flow depths in the clustered areas. In other words, we predicted the shape of the frequency percentage of the tsunami flow depths in the clusters shown in Fig. 2; \(n=51\) is the number of offshore observation points (Fig. 1b); \(k\) is the observation point number; \(i\) is the objective variable number. Because there are two variables (mean and standard deviation) for each cluster, the total length of \(i\) is 36 (18 clusters × 2 variables); \(j\) is the fault case number, whose total length is 3480; and a and b are the regression coefficients to be estimated.
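As a minimal sketch of how Eq. (1) is evaluated for a single event, the function below takes the coefficient tables a and b (one row per objective variable, one column per station) and a vector x of absolute maximum pressure deviations. The function name and the tiny two-station example are illustrative assumptions; in the study there are 36 rows and 51 columns.

```python
def predict_power_law(a, b, x):
    """Evaluate Eq. (1) for one event: y_i = sum_k a[i][k] * x[k] ** b[i][k]."""
    return [
        sum(a_ik * x_k ** b_ik for a_ik, b_ik, x_k in zip(a_i, b_i, x))
        for a_i, b_i in zip(a, b)
    ]

# Two objective variables and two stations, with made-up coefficients.
a = [[2.0, 1.0],
     [0.5, 0.0]]
b = [[1.0, 2.0],
     [1.0, 1.0]]
x = [3.0, 2.0]   # absolute maximum pressure deviations at the two stations

y = predict_power_law(a, b, x)
print(y)
```

With every exponent set to 1 the expression reduces to an ordinary linear multiple regression, which is the case the power law is meant to generalize.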

The regression coefficients (a and b) were estimated with the CG method (Fletcher and Reeves 1964) using all the training data from the 3480 cases. The CG method solves nonlinear problems by an iterative process. Starting from arbitrary initial values, predictions are made with the observational equation (i.e., Eq. (1)) and the prediction errors are evaluated. The values are then changed slightly in the direction that reduces the error. The iteration is repeated until the solution has converged sufficiently (i.e., the error is no longer reduced). Herein, we used the function optim of the statistical processing software R. The fitted model also included an intercept (c); the iteration started from initial values of 0 for a and c and 1 for b.
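The iteration described above can be sketched in a few lines. This is a hedged, stdlib-only illustration of a Fletcher-Reeves nonlinear conjugate gradient loop with a backtracking line search, applied to Eq. (1) for one objective variable. It is not the R optim routine used in the study, it omits the intercept c for brevity, and the function names, tolerances, and synthetic data are all assumptions. Convergence to the true coefficients is not guaranteed; the sketch only ensures that the error does not increase.

```python
import math

def predict(params, x_row):
    """Eq. (1) for one objective variable; params = [a_1..a_n, b_1..b_n]."""
    n = len(x_row)
    return sum(params[k] * x_row[k] ** params[n + k] for k in range(n))

def loss(params, xs, ys):
    """Mean squared error; overflowing or non-finite trial points count as infinite."""
    try:
        val = sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)
    except OverflowError:
        return math.inf
    return val if math.isfinite(val) else math.inf

def gradient(params, xs, ys):
    """Analytic gradient of the mean squared error (requires x > 0)."""
    n = len(xs[0])
    g = [0.0] * (2 * n)
    for x, y in zip(xs, ys):
        r = predict(params, x) - y
        for k in range(n):
            xb = x[k] ** params[n + k]
            g[k] += 2.0 * r * xb / len(ys)
            g[n + k] += 2.0 * r * params[k] * xb * math.log(x[k]) / len(ys)
    return g

def fit_cg(xs, ys, iters=200):
    n = len(xs[0])
    p = [0.0] * n + [1.0] * n            # start from a = 0, b = 1, as in the text
    g = gradient(p, xs, ys)
    d = [-gi for gi in g]
    for _ in range(iters):
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0.0:                 # not a descent direction: restart
            d = [-gi for gi in g]
            slope = -sum(gi * gi for gi in g)
        if slope > -1e-16:               # gradient has vanished
            break
        f0, t, accepted = loss(p, xs, ys), 1.0, None
        while t > 1e-12 and accepted is None:   # backtracking (Armijo) search
            trial = [pi + t * di for pi, di in zip(p, d)]
            if loss(trial, xs, ys) <= f0 + 1e-4 * t * slope:
                accepted = trial
            else:
                t *= 0.5
        if accepted is None:
            break
        p = accepted
        g_new = gradient(p, xs, ys)
        beta = sum(v * v for v in g_new) / sum(v * v for v in g)   # Fletcher-Reeves
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return p

# Synthetic single-station data generated from y = 2 * x ** 1.5.
xs = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
ys = [2.0 * x[0] ** 1.5 for x in xs]
start = [0.0, 1.0]
fitted = fit_cg(xs, ys)
print(loss(start, xs, ys), loss(fitted, xs, ys), fitted)
```

Note that with a = 0 the gradient with respect to b is zero, so the first step only moves a; the exponents begin to change from the second iteration onward.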

Multilayer perceptron

The regression analysis described above is a type of machine learning. In recent years, however, many research fields have used more advanced machine learning techniques. The field of tsunami prediction is no exception (Fauzi and Mizutani 2020; Makinoshima et al. 2021). The MLP method (e.g., Gardner and Dorling 1998) is a standard machine learning method that uses a mathematical model mimicking the neural network structure of the human brain. It consists of several layers: an input layer, multiple intermediate layers, and an output layer. Each layer has multiple nodes, and the value of a node is given by a superposition of all node values of the previous layer using the following equation:

$$N_{j}^{i} = \begin{cases} f\left( \sum\limits_{k} W_{j,k}^{i} N_{k}^{i - 1} + c_{j}^{i} \right) & \left( 1 \le i \le m \right) \\ \sum\limits_{k} W_{j,k}^{i} N_{k}^{i - 1} + c_{j}^{i} & \left( i = m + 1 \right) \end{cases} \tag{2}$$

where \({N}_{j}^{i}\) indicates the \(j\)th node on the \(i\)th layer, with \(i\) = 0 for the input layer, \(i\) = 1, …, \(m\) for the intermediate layers, and \(i\) = \(m\)+1 for the output layer; \(m\) is the number of intermediate layers; \(f\) is the activation function, which is not applied to the output layer (\(i\) = \(m\)+1); and \(k\) is the node number in the previous layer (i.e., layer \(i\) − 1). The number of nodes can differ among the layers. \({\varvec{W}}\) and c are the weight and bias, respectively, which are optimized to construct the prediction model. Input values are given to the nodes in the input layer, \({\varvec{W}}\) and c are initialized at random, a prediction is made using Eq. (2), and \({{\varvec{N}}}^{m+1}\) is compared with the true value to obtain the prediction error. Using the error backpropagation method, \({\varvec{W}}\) and c are updated in the direction that reduces the prediction error. The prediction is made again with the new \({\varvec{W}}\) and c, the error is re-evaluated, and \({\varvec{W}}\) and c are updated further by error backpropagation. This procedure is repeated to find the optimal \({\varvec{W}}\) and c.
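A forward pass through Eq. (2) can be written directly from the definitions. The sketch below is a plain-Python illustration with hand-set weights, assuming ReLU as the activation function f; the function names and the tiny network are hypothetical, and no training (backpropagation) is shown.

```python
def relu(v):
    """Rectified linear unit used here as the activation function f."""
    return v if v > 0.0 else 0.0

def forward(x, layers):
    """Evaluate Eq. (2).

    layers is a list of (W, c) pairs ordered from the first intermediate
    layer to the output layer; W holds one row of weights per node and
    c holds one bias per node.
    """
    values = list(x)                      # N^0: the input layer
    for idx, (W, c) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, values)) + bias
             for row, bias in zip(W, c)]
        is_output = idx == len(layers) - 1
        # The activation is skipped on the output layer (i = m + 1).
        values = z if is_output else [relu(v) for v in z]
    return values

# One intermediate layer with two nodes, then a single linear output node.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[2.0, 3.0]], [-5.0]),
]
output = forward([1.0, -2.0], layers)
print(output)
```

The negative input is clipped to zero by the intermediate layer, while the output node is left linear so that it can return any real value.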

In addition to the power law regression in Eq. (1) solved using the CG method, this study predicted the tsunami flow depth for the clustered areas using the MLP method with \({\varvec{y}}\) as the output layer (\({{\varvec{N}}}^{m+1}\)) and \({\varvec{x}}\) as the input layer (\({{\varvec{N}}}^{0}\)). We used the TensorFlow libraries (Abadi et al. 2016) with the ReLU activation function. We repeated preliminary experiments of the MLP with different numbers of intermediate layers and nodes to evaluate the prediction error. The larger the numbers of layers and nodes, the lower the prediction error. We adopted the numbers of layers and nodes beyond which further increases no longer reduced the prediction error. However, this approach could lead to overfitting. To avoid this, we used the Adam algorithm (Kingma and Ba 2015) to minimize a loss function consisting of the mean square error with an L2 regularization term. A cross-validation method was used to determine the hyperparameter value of the regularization term. Finally, this study used 9 intermediate layers (\(m\) = 9). The first intermediate layer (\(i\) = 1) had eight nodes (\(2^3\)), and the number of nodes in each following intermediate layer was set by successively increasing the exponent by one.
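To give a sense of the model size this implies, the sketch below lists the intermediate layer widths and counts the weights and biases. It rests on two assumptions that the text does not state explicitly: that the exponent rule means widths of 2^3, 2^4, and so on up to 2^11, and that a single network maps the 51 station inputs to all 36 objective variables. The function names are hypothetical.

```python
def layer_widths(n_hidden=9, first_exponent=3):
    """Node counts of the intermediate layers: 2^3, 2^4, ... (assumed reading)."""
    return [2 ** (first_exponent + i) for i in range(n_hidden)]

def count_parameters(n_in, hidden, n_out):
    """Total number of weights and biases in a fully connected network."""
    sizes = [n_in] + hidden + [n_out]
    return sum(prev * cur + cur for prev, cur in zip(sizes, sizes[1:]))

hidden = layer_widths()
n_params = count_parameters(51, hidden, 36)   # 51 stations in, 36 variables out
print(hidden)
print(n_params)
```

Under these assumptions the network has a few million parameters, which is still small enough to evaluate on a personal computer.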

Prediction of test dataset

Here, we demonstrate the performance of tsunami flow depth prediction using the test data of the 11 M9 scenarios (Cabinet Office 2012), which are Nankai Trough earthquake models different from the 3480 training cases. The absolute values of the maximum seafloor water pressure deviations at the 51 DONET stations from the M9 scenarios are the explanatory variables that were substituted into the prediction equations from the CG and MLP methods. Figure 3 shows scatter plots between the predictions using the two methods and the true values (i.e., forward calculation results from the 11 M9 scenarios). The MLP method (coefficient of determination, \(R^2\) = 0.977) was more accurate than the CG method (\(R^2\) = 0.938) in predicting the average flow depth. The \(R^2\) values for the prediction of the standard deviation of the clustered flow depth data were 0.880 and 0.958 for the CG and MLP methods, respectively.
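For reference, a coefficient of determination can be computed as follows. The sketch assumes the common definition of one minus the ratio of the residual sum of squares to the total sum of squares; the text does not say which definition was used, and the function name is hypothetical.

```python
def r_squared(true, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for t, p in zip(true, pred))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot

perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
print(perfect, mean_only)
```

A value of 1 means the predictions match the forward calculations exactly, while a value of 0 means they do no better than always predicting the mean of the true values.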

Fig. 3

Scatter diagrams between true values (forward calculations) and predicted values using (a) the CG method and (b) the MLP method. Red and blue circles represent the mean and standard deviation, respectively, of the flow depth in clustered areas

Figure 4 shows more quantitative comparisons of the prediction error using the root mean square of residual errors (RMSE) between the predicted and true value for each scenario. The CG method predicted the tsunamis based on the M9 scenarios with RMSE of between 0.39 and 1.75 m, while the MLP method predicted them with RMSE of between 0.34 and 1.08 m. The M9 scenarios 4 and 5 had lower prediction accuracy than the other scenarios. Figure 2 shows the normal distribution curves obtained by using the predicted results overlain on the true frequency percentages for the M9 scenario 3. Figure 5 compares the tsunami flow depth distribution calculated forwardly with one predicted using the CG and MLP methods.
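The two quantities used in this comparison are simple to compute. The sketch below shows an RMSE between predicted and true cluster values, and the normal density that a predicted mean and standard deviation define for one cluster (the kind of curve overlain on the histograms in Fig. 2). The function names and the numbers are illustrative assumptions, not values from the study.

```python
import math

def rmse(pred, true):
    """Root mean square of the residual errors between two equal-length lists."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def normal_pdf(depth, mean, std):
    """Normal density for a cluster described by a predicted mean and std."""
    z = (depth - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical mean flow depths (m) in three clusters for one scenario.
predicted = [1.0, 2.0, 3.0]
true_vals = [1.0, 2.0, 5.0]
error = rmse(predicted, true_vals)

# Density curve for a cluster with predicted mean 2.0 m and std 0.5 m.
curve = [(d / 10.0, normal_pdf(d / 10.0, 2.0, 0.5)) for d in range(0, 41)]
peak = max(curve, key=lambda item: item[1])
print(error, peak)
```

Describing each cluster by only a mean and a standard deviation keeps the output small, at the cost of assuming that the flow depths within a cluster are roughly normally distributed.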

Fig. 4

Line graphs of the RMSE between true (forward calculations) and values predicted using (a) the CG method and (b) the MLP method for the M9 scenarios 1‒11. Solid and dashed lines indicate mean and standard deviation, respectively

Fig. 5

Spatial map showing (a) tsunami flow depth distribution calculated forwardly from the M9 scenario 3; (b) and (c) are tsunami flow depth distributions predicted using the CG and MLP methods, respectively. Terrestrial contours indicate elevation, with an interval of 50 m


The CG method generates a prediction model using a linear summation of power law basis functions (Eq. (1)). Because the MLP method does not restrict the form of the basis function, its prediction is more accurate than that of the CG method. However, both methods showed poor predictive capability for M9 scenarios 4 and 5 (Fig. 4), which contain large slip and super-large slip patches off the coast of Shikoku (Additional file 1: Figure S1). This poor prediction may be related to insufficient training data because machine learning techniques generally require a large amount of training data. We increased the number of clustered areas from 18 to 30 to increase the amount of training data artificially and applied the same procedure. However, the prediction accuracies of the CG and MLP methods for M9 scenarios 4 and 5 did not improve (Fig. 4).

The tsunami heights of the M9 scenarios of the test data were comparable to the largest tsunami of the 3480 cases of training data. Giant tsunamis possess strong nonlinearity and more complex propagation resulting in a greater difficulty in prediction. We may have needed training data with tsunamis much larger than those produced by M9 scenarios. Therefore, we reanalyzed the training data that included larger tsunami cases using the CG and MLP methods. The larger tsunami cases were generated by multiplying the slip amount by a factor of 1.5 in 268 relatively large cases among the 3480 earthquake cases. We repeated the same procedure as for predictions of the M9 scenarios, but the accuracy did not improve in this test (Fig. 4).

The pattern of the clustered areas may be inappropriate for the prediction because the clustered areas were generated using the training data, which were different from the test data of the M9 scenarios. Therefore, we performed a cluster analysis on tsunami flow depth distributions in the test data (Additional file 1: Figure S4). We reconstructed the predictive models of the CG and MLP methods using the new cluster areas and predicted the tsunamis of the test data. However, the prediction accuracy did not improve from that using the original cluster areas (Fig. 4). From this trial, the effect of cluster classification appears to be small, at least for the datasets herein.

However, our cluster classification of tsunami flow depths requires improvement. The clustering analysis should have used all 3480 cases of the training data. However, owing to the memory limitations of the analysis computer, we could only perform the cluster classification based on a limited number of training datasets (14 cases) selected at random. Although we confirmed that the results changed little by using different selected sets of 14 cases, there remains room for improvement. There is also no clear criterion to determine the number of clusters. In this study, the number of clusters was set based on the data variance within the clusters. This criterion should include both the data variance and an operational perspective. For example, it may be better to classify the clusters according to the administrative area classification, such as postal codes, to provide easy-to-understand tsunami information.

Increasing the number of training datasets and changing the cluster classification did not improve the accuracy. One possibility to further improve the prediction accuracy is the use of different characteristics in addition to the tsunami amplitude. The tsunami arrival time contains information on the direction of tsunami propagation, i.e., the location of the tsunami source. By using the arrival time as an explanatory variable, we can include information on the tsunami source location, which may improve the prediction accuracy. Additionally, a scheme that learns the full pressure waveforms may be helpful for further development. These investigations will be the subject of subsequent research.


An earlier study (Yoshikawa et al. 2019) proposed a method to predict the maximum tsunami height at a coastal point using multiple regression of power law equations. Here, we extended the method to predict the tsunami flow depth distribution on land areas. The MLP method successfully predicted the tsunami flow depth distribution on land areas, and the RMSE between the predicted and true values of the average flow depth was estimated to be in the range of 0.34 to 1.08 m. The MLP method was more accurate than the CG method because the former does not need to define the form of the basis function.

Although further improvements are needed, the methods presented here can construct a light and robust prediction system that does not require fast computation or a large database. While large-scale tsunami prediction systems are currently becoming mainstream, it is beneficial to have such a stand-alone prediction system to mitigate unforeseen circumstances during a great disaster. Furthermore, tsunami disasters are a global problem, but only a few countries have access to high-speed real-time computers for tsunami early warnings.

Availability of data and materials

We used the tsunami software JAGURS, which is provided in an online repository. Owing to the large size of the training and test data, we cannot upload them to online repositories. Interested readers can contact us directly to share the training and test data. The M9 scenarios were downloaded from the following website:



Abbreviations

CG: Conjugate gradient

MLP: Multilayer perceptron

DONET: Dense Ocean floor Network system for Earthquakes and Tsunamis

S-net: Seafloor observation network for earthquakes and tsunamis along the Japan Trench

JMA: Japan Meteorological Agency

RMSE: Root mean square of residual errors




We thank two anonymous reviewers whose constructive comments improved the manuscript. We thank the associate editor, Dr. Tatsuhiko Saito, for editing. We thank Dr. H. Fujiwara for providing us with the 3480 fault models for the Nankai earthquakes. We thank Tokushima Prefecture for providing us with the dataset for the tsunami calculations. We used R statistical software and Python for data analysis and plotting. We also used GMT (Wessel et al. 2013) for data plotting. We conducted tsunami simulations on the Earth Simulator at the Japan Agency for Marine-Earth Science and Technology.


This work was supported by JSPS KAKENHI (Grant Numbers JP19H02409 and JP22H01742).

Author information

Authors and Affiliations



TB designed the study. MK performed tsunami calculations and regression analysis. All authors interpreted the results of the analyses. TB and MK wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Masato Kamiya.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Figure S1. Fault slip distributions of the eleven M9 earthquake scenarios. Figure S2. Tsunami flow depth distributions calculated from the eleven M9 earthquake scenarios in the tsunami prediction area. These data were used as test data. Terrestrial contours indicate elevation, with an interval of 50 m. Figure S3. Schematic showing ocean bottom pressure changes owing to crustal displacements and tsunamis. Figure S4. Clustered areas obtained by the k-means method for the tsunami flow depth predictions using (a) 14 models selected from the training dataset and (b) eleven M9 earthquake scenarios. Terrestrial contours indicate elevation, with an interval of 50 m.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Kamiya, M., Igarashi, Y., Okada, M. et al. Numerical experiments on tsunami flow depth prediction for clustered areas using regression and machine learning models. Earth Planets Space 74, 127 (2022).
