 Express Letter
 Open Access
Numerical experiments on tsunami flow depth prediction for clustered areas using regression and machine learning models
Earth, Planets and Space volume 74, Article number: 127 (2022)
Abstract
Emergency responses during a massive tsunami disaster require information on the flow depth on land for rescue operations. This study aims to predict the tsunami flow depth distribution in real time using regression and machine learning. Training data of 3480 earthquake-induced tsunamis in the Nankai Trough were constructed by numerical simulations. Initially, the k-means method was used to discriminate the areas with approximately the same flow depth. The number of clustered areas was 18, and the standard deviation of the flow depth data in a cluster was 0.46 m on average. The objective variables were the mean and standard deviation of the flow depth in the clustered areas. The explanatory variables were the maximum deviations of the water pressure at the seafloor observation points of the DONET observatory. We generated multiple regression equations for a power law using these datasets and the conjugate gradient method. Further, we employed the multilayer perceptron method, a machine learning technique, to evaluate the prediction performance. Both methods accurately predicted the tsunami flow depth calculated for 11 test earthquake scenarios from the Cabinet Office of the Government of Japan. The RMSE between the predicted and the true (via forward tsunami calculations) values of the mean flow depth ranged from 0.34 to 1.08 m. In addition to large-scale tsunami prediction systems, prediction methods with a robust and light computational load as used in this study are essential to prepare for unforeseen situations during large-scale earthquakes and tsunami disasters.
Introduction
An earthquake of magnitude 9.0 occurred in 2011 on the plate boundary at the Japan Trench (e.g., Ammon et al. 2011; Simons et al. 2011; Satake et al. 2013) and generated a gigantic tsunami. A tsunami warning was issued approximately 3 min after the earthquake was detected. The first tsunami warning for Iwate Prefecture predicted a tsunami of less than 3 m, but a detailed post-earthquake survey (Mori et al. 2012) for the area showed that the maximum tsunami run-up reached an elevation of 40.1 m. The second warning, issued approximately 30 min after the first, predicted a much higher tsunami, but some areas could not receive the second and subsequent tsunami warnings owing to power outages or communication losses caused by the strong shaking. The 2011 Tohoku earthquake caused widespread damage, with approximately 23,000 dead or missing (Fire and Disaster Management Agency 2021).
Since the 2011 Tohoku earthquake, much research and development has aimed at improving the accuracy of tsunami early warnings. One such development is the deployment of large-scale seismic and tsunami observation systems connected by submarine cables: the Dense Ocean floor Network system for Earthquakes and Tsunamis (DONET; Kaneda et al. 2015) and a seafloor observation network for earthquakes and tsunamis along the Japan Trench (S-net; Mochizuki et al. 2018). DONET and S-net transfer seafloor observation data to the Japan Meteorological Agency (JMA) for tsunami monitoring and early warning. An innovative algorithm called tsunami forecasting based on inversion for initial sea-surface height (tFISH; Tsushima et al. 2009; 2012) has improved tsunami prediction accuracy using seafloor pressure data. The tFISH algorithm was already in operation for the tsunami early warning of the JMA in 2019. Maeda et al. (2015) and Wang et al. (2017; 2018) used assimilation methods to estimate the complete tsunami wavefield from discrete seafloor pressure data without estimating the initial sea-surface height. Koshimura (2017) developed a system to immediately predict tsunami inundation and anticipated damage using a real-time solution of fault motion from land-based Global Navigation Satellite System data. These methods require high-speed and large-scale tsunami computations, and advances in computers and tsunami computation software are among the factors that now make them possible (e.g., Musa et al. 2015; Baba et al. 2016). Other methods have been proposed to predict tsunamis in real time, such as database search models (Yamamoto et al. 2016), regression models (Igarashi et al. 2016; Yoshikawa et al. 2019), and deep learning models (Fauzi and Mizutani 2020; Makinoshima et al. 2021).
The tsunami forecasting system must be robust even in disasters when power supply and internet communications are lost. Therefore, the JMA duplicates the warning system in Tokyo and Osaka, and the system of Koshimura (2017) can prioritize the supercomputers of Tohoku and Osaka Universities in the event of a disaster. However, we should also develop alternative methods that do not require high-speed and large-scale computations as a contingency plan. The regression model (Igarashi et al. 2016; Yoshikawa et al. 2019), which uses the correlation between offshore and coastal tsunamis, is the most suitable for this purpose. Although the regression model requires computational resources to construct training data, once the regression equation is constructed, it can predict tsunamis using only a small number of observation values, such as the maximum tsunami amplitude. The prediction of the regression model is quick. As a personal computer would be sufficiently practical for tsunami prediction using a regression model, it is easy to multiplex the prediction system. An earlier study (Igarashi et al. 2016) proposed a method for predicting coastal tsunami heights from the data of submarine cable systems using a Gaussian regression process. However, the Gaussian process is less accurate in extrapolation. Therefore, Yoshikawa et al. (2019) proposed a regression method using a power law based on offshore and coastal tsunami relationships. The power law regression showed almost the same performance as the Gaussian regression in the interpolation part and performed better in the extrapolation part.
However, Yoshikawa et al. (2019) predicted the tsunami height at only one point on the coast and did not obtain the spatial distribution of the maximum tsunami flow depth. Emergency response after a tsunami disaster requires information on the flow depth distribution in the damaged area, in addition to the tsunami height along the coast. Obtaining these data with regression models would require a prediction at every point in the inundated area, but the number of such points is enormous, and the processing time would be too long. Hence, this study proposes a method to reduce the number of predicted points by pre-grouping the areas where the flow depths are always similar.
The analysis procedure was as follows. First, we calculated the tsunamis of 3480 cases in the Nankai Trough (Fujiwara et al. 2020) for training data to construct a regression model. Then, we applied cluster analysis to the flow depths of the training data to identify areas with similar flow depths for all tsunami events in the training data. Regression relationships were estimated using the conjugate gradient (CG) and multilayer perceptron (MLP) methods, in which the objective variables are the average flow depths in the clustered areas, and the explanatory variables are the maximum ocean bottom pressure deviations during a tsunami at DONET stations. Finally, we used the constructed regression models to predict the tsunami flow depths calculated from hypothetical earthquake scenarios released by the Japanese government (Cabinet Office 2012) to evaluate the prediction accuracy.
Analysis methods
Training and test datasets
Fujiwara et al. (2020) proposed fault models with 3480 cases of interplate earthquakes in the study area (Fig. 1). We constructed the training data by calculating all the tsunamis generated by these 3480 fault models. The seafloor displacement caused by fault motion in each model was calculated assuming a semi-infinite homogeneous elastic body (Okada 1985). We estimated the initial tsunami water level using the vertical component of the seafloor displacement, the effect of tsunami excitation by horizontal displacement of the seafloor slope (Tanioka and Satake 1996), and the filter of the linear potential theory (Kajiura 1963). The rise time of the initial water level was set to 60 s in the numerical tsunami simulations. Tsunami propagation from the initial water level and tsunami run-up on land were estimated by solving the nonlinear long-wave equations with the staggered-grid leapfrog difference method (Baba et al. 2015; 2016). The topographic data used in the tsunami calculations were obtained from the local government in the study area, i.e., Tokushima Prefecture. Topographic nesting consisted of five layers (Fig. 1b). The grid intervals in the layers were 810, 270, 90, 30, and 10 m from the coarsest layer to the finest layer in the study area (Fig. 1c). The tide level when the tsunami occurred was assumed to be the mean tide level (0 m in Tokyo Peil). Coastal tsunami defense structures smaller than the grid intervals, such as breakwaters, were modeled as line structures in the calculations. When a tsunami overtopped the coastal structures, we considered them to be collapsed structures and continued the calculation by excluding the line structures. The integration time was 6 h, so that the maximum tsunami waves arrived in all the evaluation areas. To satisfy the stability condition of the computation, the computational time step width was set to 0.1 s. Owing to the large computational load, we used a supercomputer (Earth Simulator, ES3) to perform the tsunami calculations. Each tsunami calculation took approximately 11 h using four ES3 nodes.
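As a rough illustration of the stability condition mentioned above, the Courant number of each nested layer can be estimated from its grid interval and the 0.1 s time step. This is only a sketch: the maximum water depths per layer are hypothetical values chosen for illustration, not values from the study, and the one-dimensional long-wave criterion used here is a simplification of whatever criterion the solver actually applies.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)

def courant_number(dx_m, dt_s, max_depth_m):
    """Courant number for linear long waves: sqrt(g * h) * dt / dx."""
    wave_speed = math.sqrt(G * max_depth_m)
    return wave_speed * dt_s / dx_m

DT = 0.1  # time step used in the study (s)
GRID_INTERVALS = [810, 270, 90, 30, 10]  # nested grid intervals (m)
# Hypothetical maximum depths per layer (m); assumptions for illustration only.
ASSUMED_MAX_DEPTHS = [5000, 3000, 1000, 200, 50]

courant = {
    dx: courant_number(dx, DT, depth)
    for dx, depth in zip(GRID_INTERVALS, ASSUMED_MAX_DEPTHS)
}
# Number of time steps for 6 h of integration at 0.1 s per step.
n_steps = round(6 * 3600 / DT)
```

Under these assumed depths every layer stays well below a Courant number of 1, and the 6 h integration corresponds to 216,000 time steps per scenario, which gives a feel for why each case was run on a supercomputer.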
In 2012, the Cabinet Office of Japan reinvestigated geological and geophysical features and historical interplate earthquakes in the Nankai Trough to construct earthquake scenarios (fault slip distributions), which may occur in the Nankai Trough. The basic procedure for constructing the earthquake scenarios was as follows. First, the fault plane was divided into a tsunami fault zone shallower than a depth of 10 km, and a main fault zone deeper than 10 km. The seismic moment of the main fault zone was calculated using a scaling law from the average amount of stress drop and the area of the main fault zone. Many subfaults were defined (of approximately 5 × 5 km) on the fault plane, whose slip amount was calculated using the seismic moment and the difference in the plate convergence rate. The slip angle on the subfaults was assumed to be in the opposite direction to the plate convergence angle; thus, mainly thrust motions. In addition, large-slip and super-large-slip patches were introduced to account for earthquake slip heterogeneity. The large-slip patch had a slip amount twice the average slip, accounted for 20% of the total area of the fault, and was located somewhere in the shallow half of the fault plane. The super-large-slip patch had a slip four times larger than the average slip and was located in the tsunami fault zone, along the trench axis neighboring the large-slip patch. Eleven earthquake scenarios were created with a magnitude of 9.1 (hereafter referred to as M9 scenarios) by changing the location of the large-slip and the super-large-slip patches (Additional file 1: Figure S1). This study calculated tsunamis from the M9 scenarios outlined above and used them as test data (Additional file 1: Figure S2).
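For orientation, the seismic moment that corresponds to a magnitude 9.1 scenario can be sketched as below. The standard moment magnitude definition, Mw = (2/3)(log10 M0 − 9.1) with M0 in N·m, is assumed here; the Cabinet Office scaling law based on stress drop and fault area is not reproduced.

```python
import math

def moment_magnitude(seismic_moment_nm):
    """Moment magnitude from seismic moment in N*m (standard definition, assumed)."""
    return (2.0 / 3.0) * (math.log10(seismic_moment_nm) - 9.1)

def seismic_moment(mw):
    """Inverse relation: seismic moment in N*m for a given moment magnitude."""
    return 10.0 ** (1.5 * mw + 9.1)

m0_m9 = seismic_moment(9.1)  # roughly 5.6e22 N*m
```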
Cluster analysis for tsunami flow depth
To reduce the number of prediction points, we created clustered areas where the tsunami flow depth was always similar among the 3480 fault cases. Fourteen fault models that resulted in significant tsunami inundation in the study area were randomly selected from the training data, and the k-means method (Hartigan and Wong 1979) was applied to their flow depth distributions. Of the 3480 cases, the selected fault model identification numbers were 101, 315, 884, 1562, 1596, 1816, 1838, 2125, 2512, 2645, 2668, 2725, 2842, and 2850 in Fujiwara et al. (2020). Ideally, it would be better to conduct the clustering analysis using all 3480 fault cases, but this was not possible because of the amount of memory required for clustering analysis using a large number of points. We repeated the clustering analysis several times by changing the selected models and confirmed that the obtained cluster patterns were similar.
In the k-means method, the analyst should specify the number of clustered areas in advance. This study determined the number of clustered areas based on the variance in flow depth in the clustered area. Additionally, a small number of clustered areas is preferable because the purpose of this study is to predict tsunami flow depths with low computational cost. We repeated the cluster analysis by changing the number of clustered areas generated and evaluated the standard deviation of the tsunami flow depth data in each clustered area. For convenience, we adopted the criterion that the standard deviation of the flow depth data in each cluster must be less than 0.5 m with as small a number of clusters as possible. This criterion led to 18 clustered areas being the optimal case.
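The selection of the number of clusters described above can be sketched as a loop that increases k until the within-cluster spread falls below the 0.5 m threshold. This is a minimal illustration, not the implementation used in the study: it uses plain Lloyd iterations rather than the Hartigan-Wong algorithm, a deterministic initialization, synthetic flow depths, and it assumes that the spread is measured as the largest per-scenario standard deviation across the grid points of a cluster.

```python
import math
import statistics

def assign(points, centroids):
    """Index of the nearest centroid for every grid point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, n_iter=50):
    """Plain Lloyd k-means; each point is a list of flow depths (m), one per scenario."""
    ordered = sorted(points, key=sum)
    # Deterministic start: centroids spread evenly over points ranked by total depth.
    centroids = [list(ordered[(2 * c + 1) * len(ordered) // (2 * k)]) for c in range(k)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return assign(points, centroids)

def worst_cluster_std(points, labels, k):
    """Largest per-scenario standard deviation of flow depth within any cluster."""
    worst = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if len(members) > 1:
            for col in zip(*members):  # one column per tsunami scenario
                worst = max(worst, statistics.pstdev(col))
    return worst

def smallest_k(points, threshold=0.5, k_max=30):
    """Smallest k whose worst within-cluster standard deviation is below threshold."""
    for k in range(1, k_max + 1):
        labels = kmeans(points, k)
        if worst_cluster_std(points, labels, k) < threshold:
            return k, labels
    raise ValueError("no k up to k_max meets the threshold")

# Synthetic flow depths (m) at nine grid points for two scenarios (illustrative only).
POINTS = [[1.0, 1.2], [1.1, 1.3], [0.9, 1.1],
          [3.0, 3.4], [3.2, 3.5], [2.9, 3.3],
          [6.0, 6.6], [6.1, 6.8], [5.9, 6.5]]
k_best, labels_best = smallest_k(POINTS)
```

For this synthetic input the loop stops at three clusters, one per depth group.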
The obtained cluster classification showed good correlation with the topography (Fig. 1c). The clustered areas 12 and 13 appear around the river in the northern part of the map, and the clustered area varies following elevation in the southern part of the map. Some clustered areas are not spatially continuous, but discrete, because this study aims to reduce the number of predicted points by using a small number of clustered areas. Figure 2 shows the frequency percentage in the flow depth data for each clustered area for M9 scenario No. 3 (Additional file 1: Figure S2) as an example.
Regression models using a power law
Green's law, which states \(H h^{1/4} = \mathrm{const}\), where \(H\) is the wave height and \(h\) is the water depth, can express the amplification of tsunami wave height under the linear tsunami theory. Hence, a linear multiple regression seemed reasonable for predicting coastal tsunami heights using data from multiple offshore tsunami stations. However, tsunamis contain strong nonlinear effects from advection terms and bottom friction near the coast, and these nonlinear effects are more likely to appear in large tsunamis. Yoshikawa et al. (2019) pointed out that multiple regressions with a power law are more suitable than linear multiple regressions. Hence, we also used multiple regressions with a power law for our tsunami prediction model. It should be noted that Green's law correlates the tsunami height between coastal and offshore points. However, ocean bottom pressure gauges do not simply observe the tsunami heights for earthquakes at the gauges because the seafloor, seawater (tsunami generation), and the gauges move simultaneously owing to crustal displacement (Tsushima et al. 2012; Baba et al. 2014; Additional file 1: Figure S3). Hence, the pressure gauges observe the water pressure fluctuation (crustal movement + sea surface displacement). The water pressure remains almost unchanged during tsunami excitation under the assumption of hydrostatic pressure. The water pressure decreases with tsunami propagation, and the water pressure corresponding to the vertical component of the crustal movement decreases after a tsunami subsides. Therefore, this study used the absolute value of the maximum deviation of seafloor water pressure during a tsunami as the explanatory variable (\(\boldsymbol{x}\)). The tsunami prediction equation used in this study is as follows:
\[y_{ij}=\sum_{k=1}^{n} a_{ik}\,x_{kj}^{\,b_{ik}} \qquad (1)\]
where \(y\) is the objective variable, which is the mean and standard deviation of the tsunami flow depths in the clustered areas. In other words, we predicted the shape of the frequency percentage of the tsunami flow depths in the clusters shown in Fig. 2; \(n=51\) is the number of offshore observation points (Fig. 1b); \(k\) is the observation point number; \(i\) is the objective variable number. Because there are two variables (mean and standard deviation) for each cluster, the total length of \(i\) is 36 (18 clusters × 2 variables); \(j\) is the fault case number, whose total length is 3480; and \(a\) and \(b\) are the regression coefficients to be estimated.
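A minimal sketch of how the explanatory variables and a single power law prediction in the sense of Eq. (1) could be evaluated is given below. The pressure records, their units, the use of the first sample as the pre-event reference level, and the coefficient values are all illustrative assumptions and are not taken from the study.

```python
def max_abs_deviation(pressure_series, reference=None):
    """Largest absolute deviation of a pressure record from its reference level."""
    ref = pressure_series[0] if reference is None else reference
    return max(abs(p - ref) for p in pressure_series)

def predict_power_law(x, a, b):
    """One objective variable: sum over stations of a_k * x_k ** b_k."""
    return sum(a_k * x_k ** b_k for x_k, a_k, b_k in zip(x, a, b))

# Two hypothetical station records (arbitrary pressure units).
records = [[0.0, 5.0, -12.0, 3.0], [0.0, -2.0, 4.0, 1.0]]
x = [max_abs_deviation(r) for r in records]   # one value per station
a = [0.5, 0.25]                               # hypothetical coefficients
b = [1.0, 0.5]
y_pred = predict_power_law(x, a, b)
```

With 51 stations and 36 objective variables, the same evaluation is repeated with one pair of coefficient vectors per objective variable, which is why a prediction needs only a handful of arithmetic operations once the coefficients are known.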
The CG method (Fletcher and Reeves 1964) estimated the regression coefficients (a and b) using all the training data from 3480 cases. The CG method solves nonlinear problems by employing an iterative process. Starting from arbitrary initial values, the observational equations (i.e., Eq. (1)) are used to make predictions, and the prediction errors are evaluated. The values are then slightly changed in the direction of error reduction. The iteration process is repeated until the solution has sufficiently converged (the error is not reduced any further). Herein, we used the function optim of the statistical processing software R. The fit also included an intercept (c); the initial values for the iteration were 0 for a and c, and 1 for b.
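The iterative procedure can be illustrated with a small pure-Python Fletcher-Reeves loop for a single station, fitting y = a·x^b + c from the initial values a = 0, b = 1, c = 0. This is a sketch under stated assumptions and not the R optim call used in the study: the data are synthetic, the backtracking line search and the restart rule are simple choices made for the illustration, and all inputs are assumed to be strictly positive.

```python
import math

def loss_and_grad(params, xs, ys):
    """Mean squared error of y = a * x**b + c and its gradient w.r.t. (a, b, c)."""
    a, b, c = params
    n = len(xs)
    loss, ga, gb, gc = 0.0, 0.0, 0.0, 0.0
    for x, y in zip(xs, ys):
        try:
            xb = x ** b
        except OverflowError:  # trial step far too long: treat as infinitely bad
            return math.inf, [0.0, 0.0, 0.0]
        r = a * xb + c - y
        loss += r * r / n
        ga += 2 * r * xb / n
        gb += 2 * r * a * xb * math.log(x) / n
        gc += 2 * r / n
    return loss, [ga, gb, gc]

def fletcher_reeves(xs, ys, start=(0.0, 1.0, 0.0), n_iter=200):
    """Nonlinear conjugate gradient (Fletcher-Reeves) with a backtracking line search."""
    p = list(start)
    loss, g = loss_and_grad(p, xs, ys)
    d = [-gi for gi in g]
    for _ in range(n_iter):
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0:  # not a descent direction: restart with steepest descent
            d = [-gi for gi in g]
            slope = -sum(gi * gi for gi in g)
        if slope == 0:
            break  # zero gradient: stationary point reached
        step = 1.0
        while True:
            trial = [pi + step * di for pi, di in zip(p, d)]
            new_loss, new_g = loss_and_grad(trial, xs, ys)
            if new_loss <= loss + 1e-4 * step * slope or step < 1e-12:
                break
            step *= 0.5
        if new_loss >= loss:
            break  # the error is not reduced any further
        beta = sum(v * v for v in new_g) / sum(v * v for v in g)
        p, loss, g = trial, new_loss, new_g
        d = [-gi + beta * di for gi, di in zip(g, d)]
    return p, loss

# Synthetic training pairs generated from y = 2 * x**1.5 + 0.5 (illustrative only).
XS = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
YS = [2.0 * x ** 1.5 + 0.5 for x in XS]
initial_loss, _ = loss_and_grad([0.0, 1.0, 0.0], XS, YS)
fitted_params, final_loss = fletcher_reeves(XS, YS)
```

Starting from a = 0 means the gradient with respect to b is initially zero, so the first step only moves a and c; b begins to change once a is nonzero.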
Multilayer perceptron
The regression analysis described above is a type of machine learning. In recent years, however, many research fields have used more advanced machine learning techniques. The field of tsunami prediction is no exception (Fauzi and Mizutani 2020; Makinoshima et al. 2021). The MLP method (e.g., Gardner and Dorling 1998) is a standard machine learning method that uses a mathematical model that mimics the neuron network structure in the human brain. It consists of several layers, which include an input layer, intermediate multiple layers, and an output layer. A layer has multiple nodes, and a node has a value that is given by a superposition of all node values of the previous layer using the following equation:
\[N_{j}^{i}=f\left(\sum_{k} W_{jk}^{i}\,N_{k}^{i-1}+c_{j}^{i}\right) \qquad (2)\]
where \(N_{j}^{i}\) indicates the \(j\)th node on the \(i\)th layer, with \(i = 0\) for the input layer, \(i = 1, \ldots, m\) for the intermediate layers, and \(i = m+1\) for the output layer; \(m\) is the number of intermediate layers; \(f\) is the activation function, which is not applied for the output layer (\(i = m+1\)); and \(k\) is the node number of the previous layer (i.e., at \(i-1\)). The number of nodes can differ among the layers. \(\boldsymbol{W}\) and \(c\) are the weight and bias, respectively, which are optimized to construct the prediction model. Input values are given to nodes in the input layer, \(\boldsymbol{W}\) and \(c\) are initialized at random, prediction is performed using Eq. (2), and \(\boldsymbol{N}^{m+1}\) is compared with the true value to obtain the prediction error. Using an error backpropagation method, \(\boldsymbol{W}\) and \(c\) are updated in the direction that reduces the prediction error. The prediction is performed again with the new \(\boldsymbol{W}\) and \(c\), the error is reevaluated, and \(\boldsymbol{W}\) and \(c\) are further updated using the error backpropagation method. This procedure is repeated to find the optimal \(\boldsymbol{W}\) and \(c\).
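The forward pass of Eq. (2) can be sketched in a few lines of plain Python. The weights and biases below are arbitrary illustrative numbers, ReLU is assumed as the activation function, and training by backpropagation is not shown.

```python
def relu(v):
    """Rectified linear unit activation."""
    return max(0.0, v)

def dense(inputs, weights, biases, activation=None):
    """One layer: N_j = f(sum_k W[j][k] * N_prev[k] + c[j])."""
    out = []
    for w_row, c_j in zip(weights, biases):
        s = sum(w * n for w, n in zip(w_row, inputs)) + c_j
        out.append(activation(s) if activation else s)
    return out

def forward(x, layers):
    """layers is a list of (weights, biases); no activation on the output layer."""
    values = x
    for idx, (w, c) in enumerate(layers):
        is_output = idx == len(layers) - 1
        values = dense(values, w, c, None if is_output else relu)
    return values

# One intermediate layer with two nodes, then a single output node (toy values).
toy_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[2.0, 1.0]], [-0.5]),
]
toy_output = forward([1.0, 2.0], toy_layers)
```

In this toy case the first intermediate node is clipped to zero by ReLU, so only the second node contributes to the output.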
In addition to the power law regression in Eq. (1) solved using the CG method, this study predicted the tsunami flow depth for the clustered areas using the MLP method with \(\boldsymbol{y}\) as the output layer (\(\boldsymbol{N}^{m+1}\)) and \(\boldsymbol{x}\) as the input layer (\(\boldsymbol{N}^{0}\)). We used the TensorFlow libraries (Abadi et al. 2016) with the ReLU activation function. We repeated preliminary experiments of the MLP with different numbers of intermediate layers and nodes to evaluate the prediction error. The larger the number of layers and nodes, the lower the prediction error. We adopted the numbers of layers and nodes beyond which the prediction error no longer decreased. However, this approach could lead to overfitting. To avoid this, we used the Adam algorithm (Kingma and Ba 2015) to minimize a mean-square-error loss function with an L2 regularization term. A cross-validation method was used to determine the hyperparameter value of the regularization term. Finally, this study used nine intermediate layers (\(m = 9\)). The first intermediate layer (\(i = 1\)) had eight nodes (\(2^{3}\)), and the number of nodes in each following intermediate layer was set by successively increasing the exponent by one.
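The layer widths implied by this description can be written out explicitly. The sketch below assumes a single fully connected network with the 51 pressure values as input and all 36 objective variables as output, which is one reading of the text; the parameter count is derived from that assumption and is not a figure reported in the study.

```python
N_INPUT = 51          # DONET stations (explanatory variables)
N_OUTPUT = 36         # 18 clusters x (mean, standard deviation)
N_HIDDEN_LAYERS = 9   # intermediate layers

# First intermediate layer has 2**3 nodes; the exponent grows by one per layer.
hidden_widths = [2 ** (3 + i) for i in range(N_HIDDEN_LAYERS)]
widths = [N_INPUT] + hidden_widths + [N_OUTPUT]

# Weights plus biases of every fully connected layer.
n_parameters = sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))
```

The widths double from 8 up to 2048 nodes, so most of the parameters sit in the last two intermediate layers.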
Prediction of test dataset
Here, we demonstrate the performance of tsunami flow depth prediction using the test data of the 11 M9 scenarios (Cabinet Office 2012), which are Nankai Trough earthquake models different from the 3480 training cases. The absolute values of the seafloor water pressure deviations at the 51 DONET stations from the M9 scenarios were the explanatory variables substituted into the prediction equations from the CG and MLP methods. Figure 3 shows scatter plots between the predictions using the two methods and the true values (i.e., forward calculation results from the 11 M9 scenarios). The MLP method (coefficient of determination, \(R^{2} = 0.977\)) was more accurate than the CG method (\(R^{2} = 0.938\)) in predicting the average flow depth. The \(R^{2}\) values for the prediction of the standard deviation of the clustered flow depth data were 0.880 and 0.958 for the CG and MLP methods, respectively.
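The two accuracy measures used in this evaluation can be computed as sketched below. The usual definition of the coefficient of determination, 1 − SS_res/SS_tot, is assumed, since the exact definition used in the study is not stated; the numbers are toy values.

```python
import math

def rmse(pred, true):
    """Root mean square of the residual errors."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def r_squared(pred, true):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, true))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot

# Toy mean flow depths (m) for four clustered areas.
true_depths = [1.0, 2.0, 3.0, 4.0]
pred_depths = [1.5, 2.0, 2.5, 4.0]
toy_rmse = rmse(pred_depths, true_depths)
toy_r2 = r_squared(pred_depths, true_depths)
```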
Figure 4 shows more quantitative comparisons of the prediction error using the root mean square of residual errors (RMSE) between the predicted and true values for each scenario. The CG method predicted the tsunamis based on the M9 scenarios with RMSE between 0.39 and 1.75 m, while the MLP method predicted them with RMSE between 0.34 and 1.08 m. The M9 scenarios 4 and 5 had lower prediction accuracy than the other scenarios. Figure 2 shows the normal distribution curves obtained from the predicted results overlain on the true frequency percentages for the M9 scenario 3. Figure 5 compares the tsunami flow depth distribution from the forward calculation with those predicted using the CG and MLP methods.
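A predicted mean and standard deviation can be turned into frequency percentages per flow depth bin, which is one way to draw a normal curve over a histogram such as Fig. 2. The bin edges and the predicted values below are illustrative assumptions, and how the curves in the figure were actually normalized is not stated in the text.

```python
import math

def normal_cdf(z, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((z - mean) / (std * math.sqrt(2.0))))

def frequency_percent(bin_edges, mean, std):
    """Percentage of a normal distribution falling in each [lo, hi) bin."""
    return [100.0 * (normal_cdf(hi, mean, std) - normal_cdf(lo, mean, std))
            for lo, hi in zip(bin_edges, bin_edges[1:])]

# Hypothetical prediction for one clustered area: mean 2.0 m, std 0.5 m.
edges = [0.5 * i for i in range(9)]           # 0.0, 0.5, ..., 4.0 m
percentages = frequency_percent(edges, 2.0, 0.5)
```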
Discussion
The CG method generates a prediction model using a linear summation of power law basis functions (Eq. (1)). Because the MLP method does not need to restrict the form of the basis function, its prediction is more accurate than that of the CG method. However, both methods showed poor predictive capability for M9 scenarios 4 and 5 (Fig. 4), which contain large-slip and super-large-slip patches off the coast of Shikoku (Additional file 1: Figure S1). This poor prediction may be related to insufficient training data because machine learning techniques generally require a large amount of training data. We increased the clustered areas from 18 to 30 to pseudo-increase the training data and applied the same procedure. However, the prediction accuracies of the CG and MLP methods for M9 scenarios 4 and 5 did not improve (Fig. 4).
The tsunami heights of the M9 scenarios of the test data were comparable to the largest tsunami of the 3480 cases of training data. Giant tsunamis possess strong nonlinearity and more complex propagation resulting in a greater difficulty in prediction. We may have needed training data with tsunamis much larger than those produced by M9 scenarios. Therefore, we reanalyzed the training data that included larger tsunami cases using the CG and MLP methods. The larger tsunami cases were generated by multiplying the slip amount by a factor of 1.5 in 268 relatively large cases among the 3480 earthquake cases. We repeated the same procedure as for predictions of the M9 scenarios, but the accuracy did not improve in this test (Fig. 4).
The pattern of the clustered areas may be inappropriate for the prediction because the clustered areas were generated using the training data, which were different from the test data of the M9 scenarios. Therefore, we performed a cluster analysis on tsunami flow depth distributions in the test data (Additional file 1: Figure S4). We reconstructed the predictive models of the CG and MLP methods using the new cluster areas and predicted the tsunamis of the test data. However, the prediction accuracy did not improve from that using the original cluster areas (Fig. 4). From this trial, the effect of cluster classification appears to be small, at least for the datasets herein.
However, our cluster classification of tsunami flow depths requires improvement. The clustering analysis should have used all 3480 cases of the training data. However, owing to the memory limitations of the analysis computer, we could only perform the cluster classification based on a limited number of training datasets (14 cases) selected at random. Although we confirmed that the results changed little by using different selected sets of 14 cases, there remains room for improvement. There is also no clear criterion to determine the number of clusters. In this study, the number of clusters was set based on the data variance within the clusters. This criterion should include both the data variance and an operational perspective. For example, it may be better to classify the clusters according to the administrative area classification, such as postal codes, to provide easy-to-understand tsunami information.
Increasing the number of training datasets and changing the cluster classification did not improve the accuracy. One possibility to further improve the prediction accuracy is the use of different characteristics in addition to the tsunami amplitude. The tsunami arrival time contains information on the direction of tsunami propagation, i.e., the location of the tsunami source. By using the arrival time as an explanatory variable, we can include information on the tsunami source location, which may improve the prediction accuracy. Additionally, a scheme that learns the full pressure waveforms may be helpful for further development. These investigations will be the subject of subsequent research.
Conclusion
An earlier study (Yoshikawa et al. 2019) proposed a method to predict the maximum tsunami height at a coastal point using multiple regression of power law equations. Here, we extended the method to predict the tsunami flow depth distribution on land areas. The MLP method successfully predicted the tsunami flow depth distribution on land areas, and the RMSE between the predicted and true values of the average flow depth was estimated to be in the range of 0.34 to 1.08 m. The MLP method was more accurate than the CG method because the former does not need to define the form of the basis function.
Although further improvements are needed, the methods presented here can construct a light and robust prediction system that does not require fast computation or a large database. While large-scale tsunami prediction systems are currently becoming mainstream, it is beneficial to have such a standalone prediction system to mitigate unforeseen circumstances during a great disaster. Furthermore, tsunami disasters are a global problem, but only a few countries have access to high-speed real-time computers for tsunami early warnings.
Availability of data and materials
We used the tsunami software JAGURS, provided in an online repository at http://dx.doi.org/10.5281/zenodo.3737816. Owing to the large size of the training and test data, we cannot upload them on online repositories. Interested readers can contact us directly to share the training and test data. The M9 scenarios were downloaded from the following website: https://www.geospatial.jp/gp_front.
Abbreviations
CG: Conjugate gradient
MLP: Multilayer perceptron
DONET: Dense Ocean floor Network system for Earthquakes and Tsunamis
S-net: Seafloor observation network for earthquakes and tsunamis along the Japan Trench
JMA: Japan Meteorological Agency
RMSE: Root mean square of residual errors
References
Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) TensorFlow: a system for large-scale machine learning. In: 12th USENIX symposium on operating systems design and implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016, pp 265–283. https://www.usenix.org/system/files/conference/osdi16/osdi16abadi.pdf
Ammon CJ, Lay T, Kanamori H, Cleveland M (2011) A rupture model of the great 2011 Tohoku earthquake. Earth Planet Space 63:693–696. https://doi.org/10.5047/eps.2011.05.015
Baba T, Takahashi N, Kaneda Y (2014) Near-field tsunami amplification factors in the Kii Peninsula, Japan for dense ocean-floor network for earthquakes and tsunamis (DONET). Mar Geophys Res 35:319–325. https://doi.org/10.1007/s1100101391891
Baba T, Takahashi N, Kaneda Y, Ando K, Matsuoka D, Kato T (2015) Parallel implementation of dispersive tsunami wave modeling with a nesting algorithm for the 2011 Tohoku tsunami. Pure Appl Geophys 172:3455–3472. https://doi.org/10.1007/s0002401510492
Baba T, Ando K, Matsuoka D, Hyodo M, Hori T, Takahashi N, Obayashi R, Imato Y, Kitamura D, Uehara H, Kato T, Saka R (2016) Large-scale, high-speed tsunami prediction for the great Nankai trough earthquake on the K computer. Int J High Perform Comput Appl 30:71–84. https://doi.org/10.1177/1094342015584090
Bird P (2003) An updated digital model of plate boundaries. Geochem Geophys Geosyst 4(3):1027. https://doi.org/10.1029/2001GC000252
Cabinet Office, Government of Japan (2012) Massive earthquake model review meeting of the Nankai trough. http://www.bousai.go.jp/jishin/nankai/model/index.html (in Japanese). Accessed 25 Jan 2022
Fauzi A, Mizutani N (2020) Machine learning algorithms for real-time tsunami inundation forecasting: a case study in Nankai region. Pure Appl Geophys 177:1437–1450. https://doi.org/10.1007/s00024019023644
Fire and Disaster Management Agency (2021) The 2011 off the Pacific coast of Tohoku Earthquake (Report 161). https://www.fdma.go.jp/disaster/higashinihon/items/161.pdf (in Japanese). Accessed 25 Jan 2022
Fletcher R, Reeves CM (1964) Function minimization by conjugate gradients. Computer J 7:149–154. https://doi.org/10.1093/comjnl/7.2.149
Fujiwara H, Hirata K, Nakamura H et al (2020) Probabilistic tsunami hazard assessment for earthquakes occurring along the Nankai trough—volume 1 part I–. Technical Note of the National Research Institute for Earth Science and Disaster Resilience. 439. https://dilopac.bosai.go.jp/publication/nied_tech_note/pdf/n439_01m_1.pdf
Gardner MW, Dorling SR (1998) Artificial neural networks (the multilayer perceptron)—A review of applications in the atmospheric sciences. Atmos Environ 32:2627–2636. https://doi.org/10.1016/S13522310(97)004470
Hartigan JA, Wong MA (1979) Algorithm AS 136: a K-means clustering algorithm. Appl Stat 28:100–108. https://doi.org/10.2307/2346830
Igarashi Y, Hori T, Murata S, Baba T, Okada M (2016) Maximum tsunami height prediction using pressure gauge data by a Gaussian process at Owase in the Kii Peninsula Japan. Mar Geophys Res 37:361–370. https://doi.org/10.1007/s110010169286z
Kajiura K (1963) The leading wave of a tsunami. Bull Earthquake Res Inst. 41:535–571. https://repository.dl.itc.utokyo.ac.jp/record/33711/files/ji0413004.pdf
Kaneda Y, Takanashi N, Baba T, Kawaguchi K, Araki E, Matsumoto H, Nakamura T, Kamiya S, Ariyoshi K, Hori T, Hyodo M, Nakano M (2015) Advanced real time monitoring system and simulation researches for earthquakes and tsunamis in Japan. Adv Nat Tech Haz Res 44:179–189. https://doi.org/10.1007/9783319102023_12
Kingma DP, Ba JL (2015) Adam: a method for stochastic optimization. The 3rd International Conference Learn Represent. p 10, San Diego, 27 Feb 2015. https://doi.org/10.48550/arXiv.1412.6980
Koshimura S (2017) Fusion of real-time disaster simulation and big data assimilation: recent progress. J Disas Res 12:226–232. https://doi.org/10.20965/jdr.2017.p0226
Maeda T, Obara K, Shinohara M, Kanazawa T, Uehira K (2015) Successive estimation of a tsunami wavefield without earthquake source data: a data assimilation approach toward real-time tsunami forecasting. Geophys Res Lett 42:7923–7932. https://doi.org/10.1002/2015GL065588
Makinoshima F, Oishi Y, Yamazaki T, Furumura T, Imamura F (2021) Early forecasting of tsunami inundation from tsunami and geodetic observation data with convolutional neural networks. Nat Commun 12:2253. https://doi.org/10.1038/s41467021223480
Mochizuki M, Uehira K, Kanazawa T, Kunugi T, Shiomi K, Aoi S, Matsumoto T, Takahashi N, Chikasada N, Nakamura T, Sekiguchi S, Shinohara M (2018) S-net project: performance of a large-scale seafloor observation network for preventing and reducing seismic and tsunami disasters. 2018 OCEANS MTS/IEEE Kobe Techno-Oceans, OCEANS Kobe 2018. https://doi.org/10.1109/OCEANSKOBE.2018.8558823
Mori N, Takahashi T (2012) Nationwide post event survey and analysis of the 2011 Tohoku earthquake tsunami. Coast Eng J 54:1–27. https://doi.org/10.1142/S0578563412500015
Musa A, Matsuoka H, Watanabe O, Murashima Y, Koshimura S, Hino R, Ohta Y, Kobayashi H (2015) A real-time tsunami inundation forecast system for tsunami disaster prevention and mitigation. The international conference for high performance computing, networking, storage and analysis (SC15), Austin, Texas. 15–20
Okada Y (1985) Surface deformation due to shear and tensile faults in a half-space. Bull Seismol Soc Am 75:1135–1154. https://doi.org/10.1785/BSSA0750041135
Satake K, Fujii Y, Harada T, Namegaya Y (2013) Time and space distribution of coseismic slip of the 2011 Tohoku earthquake as inferred from tsunami waveform data. Bull Seismol Soc Am 103:1473–1492. https://doi.org/10.1785/0120120122
Simons M, Minson SE, Sladen A, Ortega F, Jiang J, Owen SE, Meng L, Ampuero JP, Wei S, Chu R, Helmberger DV, Kanamori H (2011) The 2011 magnitude 9.0 Tohoku-Oki earthquake: mosaicking the megathrust from seconds to centuries. Science 332:1421–1425. https://doi.org/10.1126/science.1206731
Tanioka Y, Satake K (1996) Tsunami generation by horizontal displacement of ocean bottom. Geophys Res Lett 23:861–864. https://doi.org/10.1029/96GL00736
Tsushima H, Hino R, Fujimoto H, Tanioka Y, Imamura F (2009) Near-field tsunami forecasting from cabled ocean bottom pressure data. J Geophys Res 114:B06309. https://doi.org/10.1029/2008JB005988
Tsushima H, Hino R, Tanioka Y, Imamura F, Fujimoto H (2012) Tsunami waveform inversion incorporating permanent seafloor deformation and its application to tsunami forecasting. J Geophys Res 117:B03311. https://doi.org/10.1029/2011JB008877
Wang Y, Satake K, Maeda T, Gusman AR (2017) Green’s function-based tsunami data assimilation: a fast data assimilation approach toward tsunami early warning. Geophys Res Lett 44:10282–10289. https://doi.org/10.1002/2017GL075307
Wang Y, Satake K, Maeda T, Gusman AR (2018) Data assimilation with dispersive tsunami model: a test for the Nankai trough. Earth Planets Space 70:131. https://doi.org/10.1186/s4062301809056
Wessel P, Smith WHF, Scharroo R, Luis J, Wobbe F (2013) Generic mapping tools: improved version released. EOS Trans Am Geophys Union 94(45):409–410. https://doi.org/10.1002/2013EO450001
Yamamoto N, Aoi S, Hirata K, Suzuki W, Kunugi T, Nakamura H (2016) Multi-index method using offshore ocean-bottom pressure data for real-time tsunami forecast. Earth Planets Space 68:128. https://doi.org/10.1186/s4062301605007
Yoshikawa M, Igarashi Y, Murata S, Baba T, Hori T, Okada M (2019) A nonlinear parametric model based on a power law relationship for predicting the coastal tsunami height. Mar Geophys Res 40:467–477. https://doi.org/10.1007/s11001019093884
Acknowledgements
We thank two anonymous reviewers, whose constructive comments improved the manuscript, and the associate editor, Dr. Tatsuhiko Saito, for editing. We thank Dr. H. Fujiwara for providing the 3480 fault models for the Nankai earthquakes, and Tokushima Prefecture for providing the dataset for the tsunami calculations. We used R statistical software and Python for data analysis and plotting, and GMT (Wessel et al. 2013) for data plotting. The tsunami simulations were run on the Earth Simulator at the Japan Agency for Marine-Earth Science and Technology.
Funding
This work was supported by JSPS KAKENHI (Grant Numbers JP19H02409 and JP22H01742).
Author information
Contributions
TB designed the study. MK performed tsunami calculations and regression analysis. All authors interpreted the results of the analyses. TB and MK wrote the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Figure S1. Fault slip distributions of the eleven M9 earthquake scenarios. Figure S2. Tsunami flow depth distributions calculated from the eleven M9 earthquake scenarios in the tsunami prediction area. These data were used as test data. Terrestrial contours indicate elevation, with an interval of 50 m. Figure S3. Schematic showing ocean bottom pressure changes owing to crustal displacements and tsunamis. Figure S4. Clustered areas obtained by the k-means method for the tsunami flow depth predictions using (a) 14 models selected from the training dataset and (b) eleven M9 earthquake scenarios. Terrestrial contours indicate elevation, with an interval of 50 m.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kamiya, M., Igarashi, Y., Okada, M. et al. Numerical experiments on tsunami flow depth prediction for clustered areas using regression and machine learning models. Earth Planets Space 74, 127 (2022). https://doi.org/10.1186/s40623022016809
DOI: https://doi.org/10.1186/s40623022016809
Keywords
 Tsunami prediction
 Regression
 Power law
 Multilayer perceptron