Rapid and quantitative uncertainty estimation of coseismic slip distribution for large interplate earthquakes using real-time GNSS data and its application to tsunami inundation prediction

This study proposes a new method for the uncertainty estimation of coseismic slip distribution on the plate interface deduced from real-time global navigation satellite system (GNSS) data and explores its application for tsunami inundation prediction. Jointly developed by the Geospatial Information Authority of Japan and Tohoku University, REGARD (REal-time GEONET Analysis system for Rapid Deformation monitoring) estimates coseismic fault models (a single rectangular fault model and slip distribution model) in real time to support tsunami prediction. The estimated results are adopted as part of the Disaster Information System, which is used by the Cabinet Office of the Government of Japan to assess tsunami inundation and damage. However, the REGARD system currently struggles to estimate the quantitative uncertainty of the estimated result, although the obtained result should contain both observation and modeling errors caused by the model settings. Understanding such quantitative uncertainties based on the input data is essential for utilizing this resource for disaster response. We developed an algorithm that estimates the coseismic slip distribution and its uncertainties using Markov chain Monte Carlo methods. We focused on the Nankai Trough of southwest Japan, where megathrust earthquakes have repeatedly occurred, and used simulation data to assume a Hoei-type earthquake. We divided the 2951 rectangular subfaults on the plate interface and designed a multistage sampling flow with stepwise perturbation groups. As a result, we successfully estimated the slip distribution and its uncertainty at the 95% confidence interval of the posterior probability density function. Furthermore, we developed a new visualization procedure that shows the risk of tsunami inundation and the probability on a map. Under the algorithm, we regarded the Markov chain Monte Carlo samples as individual fault models and clustered them using the k-means approach to obtain different tsunami source scenarios. We then calculated the parallel tsunami inundations and integrated the results on the map. This map, which expresses the uncertainties of tsunami inundation caused by uncertainties in the coseismic fault estimation, offers quantitative and real time insights into possible worst-case scenarios.


Introduction
A rapid understanding of the magnitude and fault areas of large earthquakes is crucial for disaster response. Numerous studies have shown the advantages of the high-rate (typically 1 Hz) global navigation satellite system (GNSS) as a broadband sensor that can directly measure displacement without saturation (e.g., Larson et al. 2003;Ohta et al. 2006;Larson 2009). The use of onshore high-rate GNSS data enable the rapid estimation of offshore finite fault models that are expected to be used for tsunami prediction (e.g., Blewitt et al. 2006Blewitt et al. 2009Ohta et al. 2012;Melgar et al. 2012;Ohta 2016). The Geospatial Information Authority of Japan (GSI) operates a nationwide continuous GNSS network called GEONET. After the massive tsunami caused by the 2011 Tohoku-Oki earthquake, the GSI and Tohoku University jointly developed the REGARD system (REal-time GEONET Analysis system for Rapid Deformation monitoring, Kawamoto et al. 2016Kawamoto et al. , 2017, which consists of real-time GNSS positioning, automatic detection of coseismic displacement, and quasi-real-time finite fault model inversion routines. The finite fault models estimated by REGARD are helpful for obtaining the initial sea surface distribution for tsunami forecasting. The consortium led by Tohoku University has developed a real-time damage estimation system for tsunami inundation using the REGARD fault model as an initial tsunami source model (Musa et al. 2018;Ohta et al. 2018). This system is expected to be used for the initial response of the government when a disaster occurs.
For the 2016 Kumamoto earthquake (M w 7.0), REGARD successfully estimated a single rectangular fault model automatically in real time ). However, estimation using only onshore GNSS data does not necessarily provide accurate estimates of offshore fault areas. From this point of view, in the single rectangular fault model, Ohno et al. (2021) developed a new method to quantitatively evaluate the uncertainties of the estimated results using the Bayesian inversion approach. This study showed a tradeoff between the fault area and slip amount, especially for offshore earthquakes, which affects the initial wave field of the tsunami. In the slip distribution model of REGARD, quantitative evaluation of the uncertainty is also essential because of the effects of station arrangement, modeling errors, and constraints on slip smoothing. For example, in the case of the 2011 Tohoku-Oki earthquake, it was suggested that there was a large slip around the trench axis based on seafloor geodetic data (e.g., Iinuma et al. 2012), although such slip was not significantly suggested using only onshore GNSS data (e.g., Melgar and Bock 2015).

Graphical Abstract
The quantification of the uncertainty of fault models enables us to consider the accuracy of tsunami predictions calculated based on them. With the underestimation of tsunami wave height in the 2011 Tohoku-Oki earthquake as a lesson, the danger of announcing tsunami predictions as a single value has been proposed. The Japan Meteorological Agency (JMA) introduced the announcement of tsunami warnings with more qualitative expressions (JMA 2012). In addition, several studies have shown a method for adding uncertainty information to tsunami prediction (e.g., Sraj et al. 2014;Fukutani et al. 2015;Takagawa and Tomita 2015;Dettmer et al. 2016;Goda and Song 2016;Gibbons et al. 2020;Goda et al. 2020;Mulia et al. 2020;Giles et al. 2021). Dettmer et al. (2016) evaluated the uncertainty of the initial sea surface displacement in the 2011 Tohoku-Oki earthquake deduced from tsunami waveforms using a trans-dimensional algorithm based on wavelet decomposition of the displacement field. Goda et al. (2020) presented an extensive tsunami hazard assessment for Nankai-Tonankai trough events using 1000 kinematic earthquake rupture models for Monte Carlo tsunami simulations. Giles et al. (2021) proposed a workflow that integrates the entire chain of components from the tsunami source to quantify hazard uncertainties by approximating the functionally complex and computationally expensive high-resolution tsunami simulations with a simple and cheap statistical emulator. On the other hand, in the damage estimation system due to tsunami inundation, it is currently difficult to evaluate the impact of the uncertainty of fault models on tsunami prediction.
With this background, the purpose of this study was to add uncertainty information to tsunami inundation estimation by quantifying the uncertainty of the coseismic slip distribution model estimated from onshore real-time GNSS data. We propose a "stepwise partitioning algorithm, " for estimating coseismic slip distribution using Markov chain Monte Carlo methods (MCMC). Furthermore, we propose a "real-time tsunami inundation risk map, " which visualizes the probability of risk of tsunami inundation based on the uncertainty of the estimated fault model. We verified the performance of this approach by applying it to simulated data in the Nankai Trough area.  Okada (1992). The width of rectangular subfaults is approximately 8 km Ohno et al. Earth, Planets and Space (2022) 74:24 Methods

Stepwise partitioning algorithm
Conventionally, the slip distribution model is estimated with a hyper-parameter governing the intensity of slip smoothing constraints determined with an index, such as ABIC (Akaike's Bayesian Information Criterion, Akaike 1980;e.g., Fukahata 2009). The error matrix obtained in this case is essential for interpreting the uncertainty of slip amount at each subfault. These methods are highly relevant in terms of obtaining a single most likely solution and its error. In other words, with these approaches, it is difficult to obtain multiple models that can explain the observed data, and the discussion of the errors remains a challenge. It is important to present several possible models with errors, while satisfying the observed data, to treat the obtained results as disaster information. Recently, many studies have adopted the Bayesian approach to geodetic studies because of its advantages in quantifying uncertainty Dettmer et al. 2014;Minson et al. 2014aMinson et al. 2014bJiang and Simons 2016;Ohno and Ohta 2018;Ohno et al. 2021). Owing to its sampling process, MCMC can generate a large number of model samples and quantify the uncertainty as probability density function (PDF), which is the ensemble of those samples. In this study, we regarded the diversity of samples as individual slip distribution models that can explain the observation data to evaluate the uncertainties of tsunami inundation caused by uncertainties in the estimation of coseismic slip distribution. Furthermore, to get the diversity of samples efficiently in real-time, essentially similar to smoothing constraint, but we tried a different approach to regularization. We adopted simulated data based on the assumed Mw 8.75 1707 Hoei earthquake (Furumura et al. 2011;Todoriki et al. 2013;Inoue et al. 2016;Kawamoto et al. 2017). The assumed slip reached from Suruga Bay to the westernmost part of Shikoku (Fig. 1). Let θ be a model parameter vector that contains slip amounts on subfaults along the plate boundary in the Nankai Trough, and d be a permanent displacement data vector with three components (horizontal and vertical components) based on real-time GNSS observations. We used the same method as Ohno et al. (2021) for MCMC sampling. Bayesian estimation of the unknown model parameters conditional on the observations is based on Bayes' theorem: where p(d) is a PDF of the observations, p(θ ) is a prior PDF of the model parameters, p(d|θ) is a likelihood function, and p(θ |d) is a posterior PDF of the model parameters. Note that p(d) is constant, because observations are fixed values and, hence, the posterior PDF is proportional to the product of the prior PDF and likelihood (1) p(θ |d) = p(d|θ )p(θ ) p(d)

Fig. 2 "
Stepwise partitioning algorithm" overview, depicting transition from stage n to stage n + 1 ( n = 1, 2, 3 ). Colors indicate the perturbation groups of θ . Perturbation groups that differ from stage to stage enable search from the outline to the details without smoothing constraints. The number of perturbations θ max that differ for each perturbation group enables a flexible parameter search with large estimation uncertainty function. The likelihood function measures the degree of fit between the observed data d, and calculated data d(θ) . The residuals are given by r(θ) = d(θ) − d . When the number of observation stations is N , the dimension of dis3N . Assuming that the estimation error follows a normal distribution function, the likelihood function is defined as follows: where σ ij (i = E(East), N (North), U (UpDown)) (which is identical at every GNSS station) is an event-dependent hyper parameter that includes modeling and observation errors. This study assumed the hyper parameters for each horizontal and vertical components as follows: where e ij is the steady noise of real-time GNSS observations. We used 0.1d ij as the proxy for the modeling and observation errors and assumed σ ij on each component at each station is whichever of those two error sources is larger. By assuming that σ ij depends on the displacement while keeping the real-time GNSS steady noise as a lower bound, we can prevent the model from overfitting the data. We used the Metropolis-Hasting method as the MCMC sampler (hereafter, M-H method; Metropolis et al. 1953;Hastings 1970). We adopted parallel tempering with eight parallel chains to improve the search efficiency (Geyer 1991;Jasra et al. 2007). For simplicity of the real-time analysis, we fixed rake angles to 90°; as such, θ contains only slip amounts, the element number of which is the same as that of the subfaults. In addition, we assumed the slip amounts to be non-negative, which is equivalent to assuming that the prior PDF formed a uniform distribution U(0, ∞) for θ . Incorporating nonnegative slip constraint into the inversion as a search range setting is one of the advantages of the MCMC sampling process over conventional methods. There were 2951 subfaults on the plate boundary. We adopted the analytical solution of Okada (1992) as the Green's function for rectangular subfaults, where the fault width are approximately 8 km.
In the M-H method, we propose new transition candidates by applying random perturbations θ with uniform distribution between − θ max and + θ max . However, applying MCMC to such a problem with many unknown parameters takes a long time to converge. Therefore, in addition to parallel tempering, we improved the search efficiency using a multistage approach with stepwise partitioning of perturbation groups. Figure 2 (2) shows an overview of the "stepwise partitioning algorithm". The Markov chain was divided into four stages (number of steps in each chain): stage 1 ( 3 × 10 6 steps); stage 2 ( 3 × 10 6 steps); stage 3 ( 3 × 10 6 steps); and stage 4 ( 3 × 10 6 steps). In each stage, we set spatial perturbation groups in advance, because grouping θ around subfaults promoted convergence. Thus, we applied the same perturbation θ to the same-colored subfaults, as shown in Fig. 2; we refer to these pre-determined areas with the same θ as "perturbation groups". The number of perturbation groups are 80, 185, 388, and 1451 for stage 1, 2, 3, and 4, respectively. Increasing the number of perturbation groups as the stage progresses allows us to estimate the rough-to-detailed features. The utilization of such a grouping is equivalent to reducing the number of unknown parameters. It should be noted that we did not change the shapes of the 2951 rectangular subfaults; that is, only the perturbation groups were changed. Thus, the analytical displacements in each step were calculated for 2951 background subfaults using a constant each subfault's Green's function.
As shown in Fig. 2, in the transition between stages, we used the median of the posterior PDFs of the previous stage as the initial value for the next stage and used the 95% confidence interval (CI) as the amount of perturbation θ max for the next stage. By inheriting the uncertainty as the transition amount, we searched for a broader slip amount with more uncertainty in a limited number of samples. Therefore, in the initial 10% of samples for each stage, we adjusted θ max by equal multiplication so that the acceptance rate would be 20-40% (e.g., Roberts and Rosenthal 2001), and the samples under adjustment were discarded during PDF generation as burn-in. In addition to the model parameters, we also calculated Variance Reduction (VR) for evaluation of the estimated model: In the later stage, the partitioning may be excessive for the sensitivity of the data and earthquake magnitude. We calculated the AIC (Akaike's Information Criterion, Akaike 1973) using the following equation to determine the optimum stage: where L indicates the maximum likelihood, and M indicates the number of unknown parameters, which is the number of perturbation groups in each stage.
The approach presented above is similar to the transdimensional approach (e.g., Dettmer et al. 2014), but our algorithm was designed to force change in one direction to simplify the problem for real-time purposes.
Moreover, our procedure is predefined by a set of perturbation groups with different spatial resolutions. Then, an optimum stage is selected based on AIC corresponding to a regularization process similar to smoothing constraints with discrete search (i.e., 4 stages) of regularization parameters. We designed this approach to automatically generate various models with the appropriate spatial resolution for the magnitude of the target earthquake without smoothing constraints between adjacent subfaults. The subsequent tsunami calculation process ("Real-time (ii) Classify them into small clusters (K) by the k-means approach using θ as the feature value. (iii) Generate K representative slip distributions (using the median value in each subfault), and using them as inputs, calculate K individual tsunami inundations. (iv) Count the number of inundations on the map tsunami inundation risk map" section) utilizes samples from one selected stage rather than multiple stages. In each stage with the constant spatial resolution, we can benefit from sampling without smoothing constraints as a diversity of multiple samples.
The computation time of this approach is within 30 min for each stage ( 3 × 10 6 steps in each chain) using an SX-Aurora TSUBASA Type20B processor with eight parallel chains in the case of 642 GNSS stations. In this study, we decided the number of samples with the highest priority to obtain a sufficient number to evaluate the differences between the stages. For real-time utilization, the convergence judge (e.g., Gelman 1996) should be introduced and must move to the next stage automatically. According to the estimated results of this paper, the number of chains in each stage seems to converge at about 10% of the number of advance settings. Based on this, if we could design convergence judges and AIC staging decisions properly, it is possible to complete the calculation up to stage 2 in approximately 6 min, which is a computation time that may be used in real time in the future.

Real-time tsunami inundation risk map
In general, because the tsunami inundation calculation is a nonlinear problem, it is difficult to evaluate the uncertainty of the tsunami inundation based on a single coseismic fault slip model and variation index, such as standard deviation of slip amounts. Therefore, to evaluate this uncertainty, we need to prepare multiple coseismic fault slip models that explain the data well. However, it is not realistic to calculate tsunami inundation for all MCMC samples, even if calculation speed were to improve in the future. Therefore, our algorithm aims to efficiently classify MCMC samples and integrate multiple tsunami inundation scenarios to probabilistically present tsunami inundation risk. Here, assuming that multiple calculations of 10 2 orders will be possible within a time of several minutes in the future, we added "real-time" to the flow name in this study. Figure 3 shows a flow chart of the real-time tsunami inundation risk map process.
i. Obtain sufficient MCMC samples to evaluate the uncertainty of the slip distribution model ("Stepwise partitioning algorithm" section). Samples from which VR was extracted are equivalent to a representative value based on the posterior PDF.
ii. Classify these samples into small clusters (K) using the k-means approach using θ as the feature value. iii. Generate K representative slip distributions (using the median value of each subfault), and use them as inputs to calculate individual K tsunami inundations. iv. Count the number of inundations on the map. The resulting map shows the possible tsunami arrival rate in each computation grid based on the uncertainty of fault slip estimated from real-time GNSS data.
The main feature of our algorithm is the clustering of MCMC samples, whereas generally utilize posterior PDFs. As discussed, the nonlinearity of tsunami inundation calculation is a problem, so we need to obtain sufficient fault models to evaluate the tsunami risk. In contrast, samples using the M-H method are not necessarily independent, because the M-H method proposes samples by perturbing the previous one. We adopted the k-means method (Steinhaus, 1956) to classify the samples into a predetermined number, with the number of clusters K determined by considering the computational cost of tsunami inundation, the availability of denominators for probability display, and the maintenance of the diversity of MCMC samples. The feature value used for clustering is the slip vector θ, because the clustering phase does not depend on the specific target region of the tsunami calculation. Samples with the same VR were used to ensure fairness in terms of reproducibility of the observed data.
Here, the target of the tsunami inundation calculation was a 1563 km 2 area between Tosa City and Aki City in Kochi Prefecture (red box in Additional file 1: Figure S1). Additional file 1: Table S1 lists the conditions of the tsunami inundation calculation. The grid size in the target area was 30 × 30 m, and the number of grids was 1,736,388 (2,082 grids along the east-west direction and 834 grids along the north-south direction). Tsunami propagation and inundation were calculated using the TUNAMI code (Tohoku University's numerical analysis model for investigating tsunami), which numerically solves the nonlinear shallow water equation using the staggered leap-flog finite difference method (Imamura 1995). The tsunami simulation was carried out from the time of the earthquake to 6 h later. The fracture propagation of the earthquake was not considered, and the slip was assumed to be instantaneous. We calculated seafloor displacement using the analytical result of Okada (1992), and the uplift of the seawater due to the horizontal movement of the seafloor was considered (Tanioka and Satake 1996). Coastal structures, such as breakwaters, were input as line data every 30 m and were not breached by earthquakes or tsunamis; buildings were not considered. We used the maximum inundation depth up to 6 h after each calculation grid as the inundated depth.

Uncertainty of slip distribution
We applied the stepwise partitioning algorithm ("Stepwise partitioning algorithm" section) to the simulated data-that is, displacement data at 642 GEONET stations calculated using the analytical result of Okada (1992) with Gaussian noise (standard deviation of 2 cm horizontally and 5 cm vertically). Figure 4 shows the estimation results for four stages: median of the posterior PDFs and the 95% CI which was calculated from the difference between samples located at 2.5% and those at 97.5%. For all stages, the estimated fault model based on the median value well reproduced the observed data. However, the 95% CI was large, especially near the trough axis, and increased from near the land toward the offshore. This suggests that slip was well estimated in regions with large slip, while that offshore had low estimated resolution.
As the number of perturbation groups increased with the progress of the stage, the obtained slip distribution . We divided the Markov chain into four stages (number of steps in each chain): stage 1 ( 3 × 10 6 steps); stage 2 ( 3 × 10 6 steps); stage 3 ( 3 × 10 6 steps); and stage 4 ( 3 × 10 6 steps). a Markov chains of Mw, variance reduction (VR), and Akaike's Information Criterion (AIC). In dotted lines, θ max is adjusted for 5 × 10 4 steps as burn-in, which samples are not shown in this figure. b Posterior PDFs of Mw and VR. The range on the horizontal axis is same with the range on the vertical axis of (a). Inserted values indicate mean, median, and mode (from top to bottom). Inserted vertical red lines indicate the Mw and VR values from the median models in Fig. 4 consisted of slips with shorter wavelengths (Fig. 4). In particular, in stage 4, a large amount of local slip occurred on certain small perturbation groups. Figure 5 shows the Markov chains and PDFs of Mw and VR for all samples except the burn-in. The VR increased with an increase in the model's freedom, which shows that the four-stage sampling successfully estimated models with high VR values without smoothing constraints. However, the amount of increase was small from stage 3 to stage 4. These results suggest that the spatial scale of the slip depended on that of the perturbation groups, because they were estimated without the smoothing constraint. The bottom chains in Fig. 5a show the calculated AIC values for each sample using Eq. (5). The mean values (± standard deviation) of AIC in each stage were 5502 ± 35, 5384 ± 47, 5455 ± 66, and 7572 ± 78, respectively. The AIC values of stage 2 and stage 3 were lower than those of stage 4, which objectively suggests the possibility of over-division. Based on this, we mainly use the samples of stage 2 for further analysis in "Uncertainty of tsunami inundation" section. Comparing the input (Fig. 1), the median slip is located more landward at all stages (Fig. 4). The Mw and VR values calculated based on the median models (inset in Fig. 4; vertical red lines in Fig. 5b) were differed slightly from those calculated based on each sample (PDFs in Fig. 5b). These indicate that the model obtained for each sample does not completely agree with the slip distribution by representative values because of the diversity of the slip in the near-trench area. Figure 6 shows eight examples of posterior PDFs from stage 2. The point A, B and F are the area off Cape Ashizuri, where the estimated slip was large and arranged in the fault dip direction. The 95% CI at point A (close to land) was small, while the value at point B (farther offshore) was large. At point F, located further offshore, the search extends to approximately 10 m, while the maximum frequency value remains at zero. These tendencies were also observed for points C and G. In the case of point E, which corresponds to a region, where the assumed slip gradually decays, the PDF includes the assumed slip (red line in Fig. 6). This diversity of offshore slip suggests a tradeoff of slip in the fault dip direction. The diversity is more pronounced owing to the lack of real-time seafloor observation sites. Ideally, modeling should be done according to spatial resolution (we discuss in "Optimization of perturbation groups for each stage and background subfaults" section). As shown above, because the tradeoff between model parameters and the posterior PDF was not necessarily normally distributed, this study used the median as the representative value (cf. mean and mode). These results, focused on one stage, show that the MCMC can estimate significant slip distributions

Uncertainty of tsunami inundation
We applied the "real-time tsunami inundation risk map ("Real-time tsunami inundation risk map" section)" to the MCMC samples of stage 2, where had the lowest AIC (see "Uncertainty of slip distribution" section). First, among the 3 × 10 6 MCMC samples of stage 2, we extracted 319,184 samples with VR of 99.52%, which is the representative value of the PDF (Fig. 3i). We then classified the extracted samples into 100 clusters using the k-means method. In each cluster, we generated one scenario using the median values for each subfault belonging to the cluster (Fig. 3ii). Using these scenarios as inputs, we calculated 100 tsunami inundation calculations for the area of Kochi Prefecture (Fig. 3iii). Finally, we counted the number of inundated points on each grid on the map (Fig. 3iv). The threshold of the inundated point was 1 cm or more during the 6 h since the earthquake. We did not apply any weights to obtain the frequencies. Figure 7a shows the generated real-time tsunami inundation risk map. The envelope of the maximum inundation area and the arrival probability (risk level) are shown. The area where the risk of tsunami inundation is extremely high in all 100 cases (pink coloring) is similar to the inundation area due to the assumed slip (Fig. 8c). The areas with the highest probability of tsunami inundation are the Uranouchi Bay, Monobe River, and Yasu River. In Fig. 8 Slip distribution, seafloor vertical deformation (initial wave field), and tsunami inundation scenarios (K = 100). a Case of flooding across the largest area. b Case of flooding across the second largest area. c The case of the input slip model that was used to generate the synthetic observations (Fig. 1). Left: Slip distribution (tsunami scenario), for which the slip amount of each subfault is the median of the cluster. Inserted values indicate the Mw and frequency of the cluster and variance reduction (VR) with horizontal and vertical components. Black rectangles indicate the area of seafloor vertical deformation shown in the middle panels. Middle: seafloor vertical deformation calculated from the left slip distribution using Okada (1992). Inserted values indicate the maximum and minimum vertical deformation. Black rectangles indicate the area shown in the tsunami inundation map to the right. Right: Tsunami inundation, where colors indicate tsunami height. Inserted values indicate the inundation area addition, the southern half of Kochi Airport also has a high risk of inundation. These results show that tsunami inundation is strongly related to land topography and rivers. Tsunamis would likely run northward along large rivers (e.g., Niyodo River, Monobe River), and extend their inundation area around rivers and waterways (e.g., Shimoda River). The limit of tsunami run-up on land was defined in some cases by the fact that the kinetic energy was zero, and in other cases, because it could not overcome the difference in elevation of the terrain (rivers, channels, and embankments). For comparison, Fig. 7b shows an elevation map of the same area. High inundation risk is generally distributed in areas with elevations of 10 m or less. On the other hand, tsunami inundation does not spread to Urado Bay, despite the low elevation, owing to the structural conditions in the tsunami calculation. The elevation of area A in Fig. 7b varies gently (from 0 to 8 m), and Fig. 7a shows a clear gradation of 0-100%. Area B has an embankment of more than 1 m and this becomes the inundation boundary. Area C has an elevation of more than 10 m parallel to the coast; in about 50% of the scenarios inundation proceeded from the east side, and in a few scenarios, inundation overcame from the south to the north. Therefore, the results of inundation calculations vary depending on small differences in topographical data and arrival tsunami scenarios. Fig. 9 Clustering using k-means with the slip vector as the feature value. A Elbow method when the number of clusters is changed. b Frequency distribution of each cluster when K = 100 and K = 300. Red and light blue bins indicate clusters with inundation areas of 60 km 2 or more, and 40 to 60 km 2 , respectively. c Correlation between the number of models included in each cluster and the variation of the slip distribution, which is the scalar value obtained by adding the slip ranges of all perturbation faults for each cluster. d Correlation between Mw and variance reduction (VR) of the median model (tsunami scenario) of each cluster. The dotted black line indicates the original VR of all clustered samples Figure 7a shows that there were few areas, where the probability of inundation was 50-90%. This indicates that the tendency of inundation is different between always inundated and rarely inundated areas. Figure 8 shows the cases with large inundation areas. The inundation area of the largest two cases was more than 40 km 2 , which is clearly larger than the assumed slip (16.784 km 2 ). In both cases, the initial wave fields had large uplift and subsidence in the offshore area of Kochi Prefecture. If these two cases were removed from the count, the appearance of the map changed (Additional file 1: Figure S2). In addition, samples with large vertical seafloor deformation for the entire slip area did not necessarily have large tsunami inundation areas (Additional file 1: Figure S3), because tsunami inundation was evaluated on a local scale.
It is also important to consider the method of editing the real-time tsunami inundation risk maps. Additional file 1: Figure S4 shows editing examples, where (a) the map is weighted by the label frequency distribution (Fig. 9b), and where (b) the threshold value of the inundation depth considered to be inundation was changed from 1 cm to 1 m. In Additional file 1: Figure  S4(a), there was almost no change in the frequency tendency of the inundation area, because there was no strong correlation between the frequency and the inundation area. However, as shown in Additional file 1: Figure S4(b), there was a decrease in the inundation probabilities, because inundation of less than 1 m was not counted.
These real-time tsunami inundation risk maps provide quantitative probabilistic information. As the concept of the target area is not included in the scenario classification method (Fig. 3i, ii), the same map can be generated for any region using the same MCMC samples.

Setting optimization in stepwise partition algorithm Optimization of σ ij in the likelihood function
In this study, we assumed σ ij (see Eq. (3)). In contrast, Ohno et al. (2021) automatically adjusted σ ij according to the event using the maximum likelihood estimation scheme. Here, this adjustment was not applied to the slip distribution model, because the large number of parameters easily explained the observed data, resulting in a very small estimate of σ ij . This dynamic adjustment is more effective for a single rectangular fault model; however, modeling errors exist in the slip distribution model owing to the uncertainty of Green's function. Therefore, in this study, we assumed σ ij , which depends on the displacements and observation errors. Ohno et al. (2021) used earthquake early warning values (EEW: latitude, longitude, depth, M) as a priori information. While our algorithm does not use such a priori information, it is important to appropriately utilize such information to achieve accurate and rapid estimation for real-time estimation. For example, the algorithm can use the EEW and independent estimated models, such as the single rectangular fault model and the slip distribution by REGARD, as a priori information.

Optimization of perturbation groups for each stage and background subfaults
As discussed in "Uncertainty of slip distribution" section, there was a tradeoff of slip in the dip direction. We applied a stepwise partitioning algorithm to various assumed slip distributions and found a similar tendency (Additional file 1: Figures S5-S10). The position and wavelength of the slip in the dip direction affected the tsunami calculation. Optimization of the perturbation group settings is especially important, because the empirical geophysical law (slip continuity) is not considered when a smoothing constraint is not used. For example, it is desirable to evaluate the spatial resolution depending on the location of the observation stations and to determine the division of the perturbation group accordingly. Kimura et al. (2019) developed an algorithm for automatically setting up the division of subfaults according to the arrangement of observation stations. It may be effective to use this algorithm for setting subfaults (smallest division) and arranging the perturbation group in each stage. The required size varies depending on the magnitude of the earthquake, and so we should determine the optimum stage objectively (i.e., using the AIC) and judge the convergence (e.g., using Gelman 1996) automatically for real-time utilization.

Optimum number of clusters
The optimum number of clusters K for k-means ("Uncertainty of tsunami inundation" section) should be determined considering the cost of real-time computation, the availability of denominators for probability display, and the availability of a sufficient number of clusters that do not distort the diversity of all samples. Here, we consider the validity of K = 100 in comparison with K = 300. Figure 9 shows the results of clustering using k-means. According to the elbow method graph (Fig. 9a), a cluster number of K = 50-100 seems to be appropriate. On the other hand, according to the frequency distribution of the clusters (Fig. 9b), even if we increased the number of clusters from K = 100 to K = 300, it looks like the classification just was subdivided overall. Figure 9c shows the correlation between the number of models included in each cluster and the variability of the slip distribution, which is a scalar sum of the ranges of slip amounts on each subfault. According to Fig. 9c, both the number of models included (horizontal axis) and variability (vertical axis) are smaller for K = 300 than for K = 100. This may indicate that by increasing the number of clusters, clusters were formed among the slip distribution models that have the closest features to each other. Figure 9d shows the variability of the VRs for the K scenarios (the median model in each cluster). Although the used samples have the same VR (99.52%), those of the classified scenarios vary in the order of the second decimal place. Note that there are some scenarios, where VR is greater than the original VR (99.52%). This is probably because the outlier-like short-wavelength slip of each sample was smoothed by taking the median at each subfault, resulting in a model that better explains the data. Figure 10 shows the correlation between the number of models included in each cluster and the inundation areas for K = 100 and K = 300. The frequency distribution of the tsunami inundation area (vertical axis) is asymmetric, with a peak near the tsunami inundation area due to the assumed slip (16.784 km 2 ) and long tail on the side of the large tsunami inundation. The maximum and minimum inundation areas for K = 100 and K = 300 are similar. On the other hand, 300 cases of K = 300 are distributed so that they filled in the gaps between the 100 cases of K = 100. The maximum inundation area for K = 300 is slightly larger than that for K = 100 (and the maps look slightly different; Additional file 1: Figure S11). These Tsunami inundation calculation results for multiple tsunami scenarios. Correlation between the number of models included in each cluster and the tsunami inundation area. The dotted black line indicates the inundation area of assumed model (Fig. 1). The vertical and horizontal axes on the inserted diagrams indicate the frequency distributions. Inserted vectors indicate the scenarios of Fig. 9 (a, b) characteristics suggest that the larger the number of K, the more the tsunami scenario is subdivided, and the degree of smoothing of extreme slip decreases. However, for the present trial, K = 100 was sufficient to obtain the range of the inundation area.
In the case of K = 100, there was a negative correlation between the inundation area and the number of models included in each cluster (Fig. 10). However, as shown in Fig. 9b where colored the clusters with an inundation area of 40 km 2 or more, there was not necessarily a relationship between the frequency of each cluster and the flooded area in Kochi City. This may be because the number of models included in each cluster shows a tendency for the "entire" slip distribution, whereas inundation was evaluated by focusing on a target area, which seems to be a natural result. On the other hand, samples with extreme slip patterns tend to be classified into clusters with small frequencies, and the possibility that these samples cause maximum inundation in some other areas cannot be excluded.

Utilization of real-time tsunami inundation risk maps
The risk of tsunami inundation is expressed by multiplying hazard (e.g., tsunami inundation) by exposure (e.g., population and buildings) and fragility (e.g., buildings), then dividing it by resiliency (e.g., Wood 2011). The real-time tsunami inundation risk maps evaluate the probability of tsunami inundation for each area, which corresponds to the probabilistic representation of hazards in the above equation.
For example, if we reflect the difference in day/night population (i.e., exposure), we can evaluate the difference in disaster risk depending on the time of day, and if we reflect the difference in resiliency, we can quantify the disaster risk according to the time scale we want to evaluate. In addition, to maximize the benefit of evaluating disaster triggers as probabilities in this study, it is easier to evaluate the risk level if other factors can be treated as probabilities in the same dimension. For easy-to-understand expression, qualitative evaluation may also be useful (e.g., three risk levels of large, medium, and small). Furthermore, if we take the envelope of the maximum inundation area, we can extract the worst-case scenario for the event (Fig. 7a).
As mentioned in "Introduction" section, the realtime tsunami inundation damage estimation system is expected to be utilized as a function of the disaster information system (DIS) of the Cabinet Office for the initial response of the government when a disaster occurs. Our developed framework for the quantitative evaluation of uncertainty in tsunami inundation prediction has the potential to improve this system. The risk level can be expressed as a probability instead of a single inundation prediction. This system has the potential to improve understanding of high disasterrisk regions and to support the decision-making process for the initial response in such areas.

Conclusions
In this study, we developed a new coseismic fault model estimation algorithm using Bayesian statistics to quantitatively evaluate the estimation uncertainty in the real-time estimation of slip distribution using real-time GNSS data. In addition, as an application of the quantified uncertainty, we investigated how the uncertainty of the fault model estimation may affect tsunami inundation prediction and proposed a new method to extract the uncertainty of tsunami inundation prediction.
One of the major problems in the estimation of slip distribution models with MCMC is that they take a long time to converge. To overcome this problem for real-time purposes, we developed a stepwise partitioning algorithm. We divided the entire Markov chain into four stages with different perturbing groups, and used the 95% CI in the previous stage as the amount of perturbation in the next stage. This enabled us to extract multiple models with different spatial patterns that explained the observed data well. We applied the algorithm to the numerical simulation data of the Nankai Trough region (assumed to be the Mw 8.75 1707 Hoei earthquake). The 95% CI of the obtained posterior PDFs increased with distance from the coast to the offshore, indicating the uncertainty in estimating plate boundary slip from onshore GNSS.
We developed a real-time tsunami inundation risk map process to fully utilize the advantages of the stepwise partitioning algorithm, wherein the uncertainty of slip distribution is quantified as a large number of samples. Specifically, we classified the slip distribution models into 100 clusters determined by the k-means method after extracting samples that have the same VR. We calculated the individual tsunami inundation using 100 scenarios. Furthermore, we counted the number of inundations in each grid on the map and displayed them probabilistically. We showed the envelope of the maximum inundation area and the probability of reaching each computational grid in Kochi City, Kochi Prefecture. However, to truly utilize the obtained tsunami inundation risk, it may be necessary to examine the kind of information needed not only from the viewpoint of disaster triggers but also from the viewpoint of society as a recipient.