### Training and test datasets

Fujiwara et al. (2020) proposed fault models with 3480 cases of interplate earthquakes in the study area (Fig. 1). We constructed the training data by calculating all the tsunamis generated by these 3480 fault models. Fault motion in each model caused seafloor displacement assuming a semi-infinite homogeneous elastic body (Okada 1985). We estimated the initial tsunami water level using the vertical component of the seafloor displacement, the effect of tsunami excitation by horizontal displacement of the seafloor slope (Tanioka and Satake 1996), and the filter of the linear potential theory (Kajiura 1963). Numerical tsunami simulations used the rise time of the initial water level at 60 s. The nonlinear long-wave equations solved by the staggered grid leapfrog difference method (Baba et al. 2015; 2016) estimated the tsunami propagation from the initial water level and tsunami run-up on land. The topographic data used in the tsunami calculations were obtained from the local government in the study area, i.e., Tokushima Prefecture. Topographic nesting consisted of five layers (Fig. 1b). The grid intervals in the layers were 810, 270, 90, 30, and 10 m from the coarsest layer to the finest layer in the study area (Fig. 1c). The tide level when the tsunami occurred was assumed to be the mean tide level (0 m in Tokyo Peil). Coastal tsunami defense structures smaller than the grid intervals, such as breakwaters, were modeled as line structures in the calculations. When a tsunami overtopped the coastal structures, we considered them to be collapsed structures and continued the calculation by excluding the line structures. The integral time was 6 h, so that the maximum tsunami waves arrived in all the evaluation areas. To satisfy the stability condition of the computation, the computational time step width was set to 0.1 s. Owing to the large computational load, we used a supercomputer (Earth Simulator, ES3) to perform the tsunami calculations. Each tsunami calculation took approximately 11 h using four ES3 nodes.

In 2012, the Cabinet Office of Japan re-investigated geological and geophysical features and historical interplate earthquakes in the Nankai Trough to construct earthquake scenarios (fault slip distributions), which may occur in the Nankai Trough. The basic procedure for constructing the earthquake scenarios was as follows. First, the fault plane was divided into a tsunami fault zone shallower than a depth of 10 km, and a main fault zone deeper than 10 km. The seismic moment of the main fault zone was calculated using a scaling law from the average amount of stress drop and the area of the main fault zone. Many sub-faults were defined (of approximately 5 × 5 km) on the fault plane, whose slip amount was calculated using the seismic moment and the difference in the plate convergence rate. The slip angle on the sub-faults was assumed to be in the opposite direction to the plate convergence angle; thus, mainly thrust motions. In addition, large slip and super-large slip patches were introduced to account for earthquake slip heterogeneity. The large-slip patch had a slip amount twice the average slip, accounted for 20% of the total area of the fault, and was located somewhere in the shallow half of the fault plane. The super-large patch had a slip four times larger than the average slip and was located in the tsunami fault zone, along the trench axis neighboring the large slip patch. Eleven earthquake scenarios were created with a magnitude of 9.1 (hereafter referred to as M9 scenarios) by changing the location of the large slip and the super-large slip patches (Additional file 1: Figure S1). This study calculated tsunamis from the M9 scenarios outlined above and used them as test data (Additional file 1: Figure S2).

### Cluster analysis for tsunami flow depth

To reduce the number of prediction points, we created clustered areas where the tsunami flow depth was always similar among the 3480 fault cases. Fourteen fault models that resulted in significant tsunami inundation in the study area were randomly selected from the training data, and the k-means method (Hartigan and Wong 1979) was applied to their flow depth distributions. Of the 3480 cases, the selected fault model identification numbers were 101, 315, 884, 1562, 1596, 1816, 1838, 2125, 2512, 2645, 2668, 2725, 2842, and 2850 in Fujiwara et al. (2020). Ideally, it would be better to conduct the clustering analysis using all 3480 fault cases, but this was not possible because of the amount of memory required for clustering analysis using a large number of points. We repeated the clustering analysis several times by changing the selected models and confirmed that the obtained cluster patterns were similar.

In the k-means method, the analyst should specify the number of clustered areas in advance. This study determined the number of clustered areas based on the variance in flow depth in the clustered area. Additionally, a small number of clustered areas is preferable because the purpose of this study is to predict tsunami flow depths with low computational cost. We repeated the cluster analysis by changing the number of clustered areas generated and evaluated the standard deviation of the tsunami flow depth data in each clustered area. For convenience, we assumed a criterion that standard deviation of the flow depth data in each cluster must be less than 0.5 m with as small a number of clusters as possible. This criterion led to 18 clustered areas being the optimal case.

The obtained cluster classification showed good correlation with the topography (Fig. 1c). The clustered areas 12 and 13 appear around the river in the northern part of the map, and the clustered area varies following elevation in the southern part of the map. Some clustered areas are not spatially continuous, but discrete, because this study aims to reduce the number of predicted points by using a small number of clustered areas. Figure 2 shows the frequency percentage in the flow depth data for each clustered area for M9 scenario No. 3 (Additional file 1: Figure S2) as an example.

### Regression models using a power law

Green's law, which states \(H{h}^{1/4}=const\), where \(H\) is the wave height and \(h\) is the water depth, can express the amplification of tsunami wave height under the linear tsunami theory. Hence, a linear multiple regression seemed reasonable for predicting coastal tsunami heights using data from multiple offshore tsunami stations. However, tsunamis contain strong nonlinear effects from advection terms and bottom friction near the coast, and these nonlinear effects are more likely to appear in large tsunamis. Yoshikawa et al. (2019) pointed out that multiple regressions with a power law are more suitable than linear multiple regressions. Hence, we also used multiple regressions with a power law for our tsunami prediction model. It should be noted that Green’s law correlates the tsunami height between coastal and offshore points. However, ocean bottom pressure gauges do not simply observe the tsunami heights for earthquakes at the gauges because the seafloor, seawater (tsunami generation), and the gauges move simultaneously owing to crustal displacement (Tsushima et al. 2012; Baba et al. 2014; Additional file 1: Figure S3). Hence, the pressure gauges observe the water pressure fluctuation (crustal movement + sea surface displacement). The water pressure remains almost unchanged during tsunami excitation under the assumption of hydrostatic pressure. The water pressure decreases with tsunami propagation, and the water pressure corresponding to the vertical component of the crustal movement decreases after a tsunami subsides. Therefore, this study used the absolute value of the maximum deviation of seafloor water pressure during a tsunami as the explanatory variable (\({\varvec{x}}\)). The tsunami prediction equation used in this study is as follows:

$$y_{i,j} = \sum\limits_{k = 1}^{n} {a_{i,k} x_{j,k}^{{b_{i,k} }} } ,$$

(1)

where *y* is the objective variable, which is the mean and standard deviation of the tsunami flow depths in the clustered areas. In other words, we predicted the shape of the frequency percentage of the tsunami flow depths in the clusters shown in Fig. 2; \(n=51\) is the number of offshore observation points (Fig. 1b); \(k\) is the observation point number; \(i\) is the objective variable number. Because there are two variables (mean and standard deviation) for each cluster, the total length of \(i\) is 36 (18 clusters × 2 variables); \(j\) is the fault case number, whose total length is 3480; and *a* and *b* are the regression coefficients to be estimated.

The CG method (Fletcher and Reeves 1964) estimated the regression coefficients (*a* and *b*) using all the training data from 3480 cases. The CG method solves nonlinear problems by employing an iterative process. Starting from arbitrary initial values, observational equations (thus, Eq. (1)) perform the predictions and the prediction errors are evaluated. The initial values are then slightly changed in the direction of error reduction. The iteration process is repeated until the solution is sufficiently convergent (error is not reduced any further). Herein, we used the function *optim* of the statistical processing software *R*. The CG method included the intercept (c), and gave 0 to *a* and c, and 1 to *b* as the initial values for iteration.

### Multilayer perceptron

The regression analysis described above is a type of machine learning. In recent years, however, many research fields have used more advanced machine learning techniques. The field of tsunami prediction is no exception (Fauzi and Mizutani 2020; Makinoshima et al. 2021). The MLP method (e.g., Gardner and Dorling 1998) is a standard machine learning method that uses a mathematical model that mimics the neuron network structure in the human brain. It consists of several layers, which include an input layer, intermediate multiple layers, and an output layer. A layer has multiple nodes, and a node has a value that is given by a superposition of all node values of the previous layer using the following equation:

$$N_{j}^{i} = \left\{ \begin{gathered} f\left( {\sum\limits_{k} {W_{j,k}^{i} N_{k}^{i - 1} + c_{j}^{i} } } \right)\quad \quad \left( {1 \le i \le m} \right) \hfill \\ \sum\limits_{k} {W_{j,k}^{i} N_{k}^{i - 1} + c_{j}^{i} \quad \quad \left( {i = m + 1} \right)} \hfill \\ \end{gathered} \right.,$$

(2)

where \({N}_{j}^{i}\) indicates the \(j\)th node on the \(i\)th layer and \(i\)= 0 for the input layer, \(i\) = 1∼\(m\) for the intermediate layers, and \(i\) = \(m\)+1 for the output layer; \(m\) is the number of intermediate layers; \(f\) is the activation function, but which is not applied for the output layer (\(i\) = \(m\)+1); \(k\) is the node number of the previous layer (i.e., at *i* − 1). The number of nodes can differ among the layers. \({\varvec{W}}\) and *c* are the weight and bias, respectively, which are optimized to construct the prediction model. Input values are given to nodes in the input layer, \({\varvec{W}}\) and *c* are initialized at random, prediction is performed using Eq. (2), and \({{\varvec{N}}}^{m+1}\) is compared with the true value to obtain the prediction error. Using an error backpropagation method, \({\varvec{W}}\) and *c* are updated in the direction that reduces the prediction error. The prediction is performed again with the new \({\varvec{W}}\) and *c*, the error is re-evaluated, and \({\varvec{W}}\) and *c* are further updated using the error backpropagation method. This procedure repeats to find the optimal \({\varvec{W}}\) and *c*.

In addition to the power law regression in Eq. (1) solved using the CG method, this study predicted the tsunami flow depth for the clustered area using the MLP method with \({\varvec{y}}\) as the output layer (\({{\varvec{N}}}^{m+1}\)) and \({\varvec{x}}\) as the input layer (\({{\varvec{N}}}^{0}\)). We used *Tensorflow* libraries (Abadi et al. 2016) with the ReLU activation function. We repeated preliminary experiments of the MLP with different numbers of intermediate layers and nodes to evaluate the prediction error. The larger the number of layers and nodes, the lower the prediction error. We used the numbers of layers and nodes with which the prediction error did not decrease with further increases in the numbers. However, this method could lead to overfitting. For avoiding this, we used the Adam algorithm (Kingma and Ba 2015) to minimize the loss function with an L2 regularization term of the mean square error. A cross-validation method was used to determine the hyperparameter value of the regularization term. Finally, this study used 9 intermediate layers (\(m\)). The first intermediate layer (\(i\) = 1) had eight nodes (2^{3}), and the number of nodes in the following intermediate layers was set by successively increasing the exponent by one.

### Prediction of test dataset

Here, we demonstrate the performance of tsunami flow depth prediction using the test data of the 11 M9 scenarios (Cabinet Office 2012), which are different Nankai Trough earthquake models from the training of 3480 cases. The absolute values of the seafloor water pressure deviations at 51 DONET stations from the M9 scenarios represent explanatory variables that were substituted into the prediction equations from the CG and MLP methods. Figure 3 shows scatter plots between the predictions using the two methods and the true values (i.e., forward calculation results from the 11 M9 scenarios). The MLP method (coefficient of determination, *R*^{2} = 0.977) was more accurate than the CG method (*R*^{2} = 0.938) in predicting the average flow depth. The *R*^{2} of the prediction of the standard deviation of the clustered flow depth data were calculated to be 0.880 and 0.958 for the CG and MLP methods, respectively.

Figure 4 shows more quantitative comparisons of the prediction error using the root mean square of residual errors (RMSE) between the predicted and true value for each scenario. The CG method predicted the tsunamis based on the M9 scenarios with RMSE of between 0.39 and 1.75 m, while the MLP method predicted them with RMSE of between 0.34 and 1.08 m. The M9 scenarios 4 and 5 had lower prediction accuracy than the other scenarios. Figure 2 shows the normal distribution curves obtained by using the predicted results overlain on the true frequency percentages for the M9 scenario 3. Figure 5 compares the tsunami flow depth distribution calculated forwardly with one predicted using the CG and MLP methods.