Bayesian earthquake forecasting approach based on the epidemic type aftershock sequence model

The epidemic type aftershock sequence (ETAS) model is used as a baseline model both for earthquake clustering and for earthquake prediction. In most forecast experiments, the ETAS parameters are estimated from a short, local catalog; therefore the model parameter optimization carried out by maximum likelihood estimation may not be as robust as expected. We address this problem with Bayesian forecast techniques, adopting non-informative flat prior distributions for the parameters to perform forecast experiments on 3 mainshocks that occurred in Southern California. A Metropolis–Hastings algorithm is employed to sample the model parameters and earthquake events. We also show, through forecast experiments, how Bayesian inference yields a probabilistic forecast, in contrast to the one obtained via MLE.


Introduction
The duration of an earthquake is much smaller than the average temporal distance between two earthquakes. Because of this, it is reasonable to treat an earthquake as a point event in time. This assumption may break down for events with very large magnitudes, but since they are rare, the mathematical approximation of a point process can be adopted for the entire seismic catalog. The epidemic type aftershock sequence (ETAS) model, a kind of point process initially devised to describe mainshock-aftershock sequences, wherein mainshocks, aftershocks, and foreshocks exhibit analogous behaviors in terms of inducing subsequent events, has gained widespread adoption among researchers as the standard benchmark for assessing hypotheses linked to earthquake clustering (Console et al. 2010; Helmstetter and Sornette 2003; Lombardi and Marzocchi 2010; Ogata 1988, 1998; Petrillo and Lippiello 2023; Petrillo et al. 2023; Spassiani et al. 2024; Zhuang 2011, 2012). In particular, the occurrence rate of an event with magnitude m > m_0, at the position (x, y) and at a time t, can be written as

λ(x, y, t|H_t) = μ(x, y) + Σ_{j: t_j < t} K 10^{α(m_j − m_0)} K_t ν(t − t_j|p, c) K_r s(x − x_j, y − y_j|δ, q),    (1)

where the sum runs over all previous earthquakes that occurred in a certain region, μ(x, y) describes the background seismicity, and K_t and K_r are constants fixed by imposing the normalization conditions

K_t ∫_0^∞ ν(t|p, c) dt = 1    (2)

and

K_r ∫_{R²} s(x, y|δ, q) dx dy = 1.    (3)

The functional form ν(t − t_j|p, c) = (t − t_j + c)^{−p} is suggested by Ogata (1988), whereas many proposals have been made in the literature for the description of the spatial component. Among these, one of the most popular is certainly the one proposed by Ogata and Zhuang (2006), where s(x − x_j, y − y_j|δ, q) = [(x − x_j)² + (y − y_j)² + δ]^{−q}. This form of the spatial part can be extended by introducing the information about the ruptured aftershock area, δ(m) = d 10^{γm} (Kagan 2002). Hence, implementing the proposed functional forms and the conditions expressed in Eqs. (2, 3) into Eq. (1), we obtain

λ(x, y, t|H_t) = μ(x, y) + Σ_{j: t_j < t} K 10^{α(m_j − m_0)} K_t (t − t_j + c)^{−p} K_r [(x − x_j)² + (y − y_j)² + d 10^{γ m_j}]^{−q}.    (4)

Therefore, the set of parameters is θ = (p, α, c, K, μ, d, q, γ). It is important to note that we are assuming independence between the magnitudes of triggered events (Petrillo and Zhuang 2022, 2023).
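As a concrete illustration, the conditional intensity of Eq. (4) can be evaluated numerically as sketched below. This is a minimal sketch, not the authors' implementation: the closed-form normalization constants K_t = (p − 1)c^{p−1} and K_r = (q − 1)δ^{q−1}/π follow from Eqs. (2, 3) under the assumptions p > 1 and q > 1, and all function and variable names are illustrative.

```python
import numpy as np

def etas_intensity(x, y, t, events, mu, K, alpha, c, p, d, q, gamma, m0=3.0):
    """Conditional intensity lambda(x, y, t | H_t) in the spirit of Eq. (4).

    `events` is an array of past earthquakes with rows (t_j, x_j, y_j, m_j).
    K_t and K_r are the analytic normalizations of Eqs. (2, 3), valid
    for p > 1 and q > 1.
    """
    past = events[events[:, 0] < t]
    if past.size == 0:
        return mu
    tj, xj, yj, mj = past.T
    # temporal kernel, normalized: K_t * (t - t_j + c)^(-p), K_t = (p - 1) c^(p - 1)
    Kt = (p - 1.0) * c ** (p - 1.0)
    temporal = Kt * (t - tj + c) ** (-p)
    # spatial kernel with magnitude-dependent delta(m) = d * 10^(gamma * m)
    delta = d * 10.0 ** (gamma * mj)
    r2 = (x - xj) ** 2 + (y - yj) ** 2
    # K_r = (q - 1) delta^(q - 1) / pi normalizes the kernel over the plane
    Kr = (q - 1.0) * delta ** (q - 1.0) / np.pi
    spatial = Kr * (r2 + delta) ** (-q)
    productivity = K * 10.0 ** (alpha * (mj - m0))
    return mu + np.sum(productivity * temporal * spatial)
```

For times before any catalog event the rate reduces to the background μ, and the triggered contribution decays with both elapsed time and epicentral distance, as the kernels of Eq. (4) prescribe.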
The fundamental problem is to estimate the parameters of the model that best fit the observed experimental data. In the frequentist statistical framework, the unknown parameter set θ is estimated using likelihood maximization methods (Lippiello et al. 2014; Ogata 1983, 1998; Ogata and Zhuang 2006). Once the optimal set of parameters is obtained by this procedure, it can be implemented in the model and used for a possible forecast task. However, this procedure can lead to severe errors in the estimation of the parameters. One of the causes may be the insufficiency of observed data due to the shortness of the seismic catalog considered: some parameters may show substantial fluctuations under minor changes in log-likelihood, and the standard errors of parameter estimates in the ETAS model are sizable and cannot be disregarded. Conventional standard error estimates derived from the Hessian matrix turn out to be inaccurate when the observed space-time window is small (Wang et al. 2010). Lastly, the numerical methods for the optimization of the parameters carry major biases of their own, such as the assumed isotropy of the spatial distribution, the infinite spatial kernel, and short-term aftershock incompleteness. Spatial isotropy often holds for small-magnitude earthquakes, but for large-magnitude events it can lead to an underestimation of the productivity parameter α (Grimm et al. 2022; Hainzl et al. 2008, 2013). The infinite spatial kernel can also cause an underestimation of productivity, due to an unrealistic far-field triggering contribution of small magnitudes (Grimm et al. 2022). Finally, the obscuring of short-term aftershocks is caused by the overlapping of coda waves immediately after the occurrence of a seismic event and can lead to an overestimation of the Omori parameter c and an underestimation of the productivity parameter α (Hainzl 2022; Petrillo and Lippiello 2020; Seif et al. 2017).
To overcome these problems, alternative methods to the maximum likelihood estimation (MLE) can be used, as in Seif et al. (2018) for a systematic study of foreshock activity, or in other epidemic-like models where it is difficult to calculate the likelihood function (Petrillo and Lippiello 2020). A further approach along the same lines as MLE is certainly the Bayesian optimization of the model parameters. The Bayesian approach to parameter sampling allows one to bypass many of the MLE problems by taking into account the uncertainty of the parameters to be estimated (Omi et al. 2015; Shcherbakov 2014) and avoiding simulation biases. In this case, the parameters are considered as random variables. In practice, the Bayesian estimation process does not return a vector of parameters θ, but a distribution, given the available training set Y. Before the observations, one assigns an a priori probability distribution π(θ) to each parameter θ_i on the basis of previous knowledge. After the observations, this judgment can be corrected and one assigns an a posteriori probability distribution to each parameter. If there is no previous knowledge about the prior parameter distribution, it can be chosen as non-informative. In practice, parameters are drawn from a flat distribution within a given range determined by mathematical constraints.
Indicating by P(·) the general operator for some random variable, the a posteriori distribution P(θ|Y), i.e., the probability of observing a certain set of parameters θ given certain observation data Y, can be written, thanks to Bayes' theorem, as

P(θ|Y) = P(Y|θ) π(θ) / P(Y),    (5)

where P(Y|θ) is the probability of observing a set of data Y given a certain set of parameters θ, namely, the likelihood function, π(θ) is the prior distribution, and P(Y) is the probability of observing Y without any condition. Therefore, by obtaining the P(θ|Y) distributions of the model parameters, which carry all the uncertainty due to their estimation, a forecast protocol can be implemented.
Obviously, from a practical point of view, carrying out this procedure analytically can be complicated. For this reason, it is possible to adopt numerical protocols, such as Markov chain Monte Carlo (MCMC), to solve the proposed problem (Ross 2021).
Recent approaches to Bayesian parameter estimation have been successfully implemented to perform seismic forecast tests. As an example, in Holschneider et al. (2012) the authors show the validity of the modified Omori law both in the short and in the long term by means of a Bayesian setting. Moreover, in Ross (2021) and Ross and Kolev (2022), the parameters of the ETAS model are optimized with a Bayesian approach through the latent-variable method. Another interesting Bayesian approach has been applied to the Italian catalog, demonstrating the methodology with a retrospective early forecast of the seismicity associated with the 2016 Amatrice seismic sequence in central Italy (Ebrahimian and Jalayer 2017). Furthermore, Molkenthin et al. (2022) introduce a non-parametric representation of the spatially varying ETAS background intensity through a Gaussian process prior, with which they are able to obtain the a posteriori distributions of the parameters, presenting results on the L'Aquila (Italy) event as a case study. Finally, a very interesting approach has been used by Shcherbakov et al. (2019) for forecasting the magnitude of the largest expected earthquake by combining the Bayesian method with extreme value theory.
Although much effort has been made, Bayesian estimation in statistical seismology remains an approach yet to be fully explored. In fact, the Bayesian estimation in Ross (2021), and consequently the forecast experiment, is obtained through the use of latent variables; the justification given for this method raises doubts as to whether a direct MCMC sampling of the parameters works correctly.
In this paper, we carry out an illustration of Bayesian forecasting on the aftershocks of 3 mainshocks with magnitude greater than 7 contained in the Southern California catalog (Hauksson et al. 2012). Bayesian parameter optimization is performed directly and not through the use of hidden variables. The likelihood calculation is carried out with the method proposed by Lippiello et al. (2014), which allows an accurate, stable, and relatively fast parameter inversion. This allows us to bypass the use of latent variables and to avoid biases in the estimation of the parameters due to an incorrect numerical estimation of the likelihood. To strengthen the validity of the Bayesian forecast, we plot Molchan diagrams and calculate forecast scores by means of the S-test and L-test. We would like to underline that, by means of this conceptually simple forecast experiment, it is possible to confirm and strengthen the results in the literature on Bayesian forecasting.

Likelihood and Bayesian estimation
Defining Σ and [t_in, t_fin] as the spatial and the temporal region, respectively, the log-likelihood function for the ETAS model (LL) is given by

LL = Σ_{i: t_i ∈ [t_in, t_fin]} log λ(x_i, y_i, t_i|H_{t_i}) − ∫_{t_in}^{t_fin} ∫_Σ λ(x, y, t|H_t) dx dy dt.    (6)

There is no difficulty in computing the first term; however, approximations are needed to solve the integral. For example, Schoenberg (2013) assumes that both t_fin and the spatial area tend to infinity. As shown in Lippiello et al. (2014), however, these approximations can be too crude, leading to bias in the estimation of the parameters. In fact, the limit t_fin → ∞ does not represent a significant advantage from the computational point of view, whereas Σ → ∞ introduces biases for parameter estimation. In particular, it is easy to solve the integral in polar coordinates, but the shape of the region usually breaks the circular symmetry. For this reason, we use the method proposed by Lippiello et al. (2014) for the computation of the likelihood in Eq. (6). LL clearly depends on the parameters of the model and, in the MLE frequentist approach, if a sufficiently large parametric region is explored, maximization can be achieved. In the Bayesian framework, we are interested in the a posteriori distribution P(θ|Y). Defining π(θ) as a flat prior distribution of a certain parameter, in this study we employ for the MCMC method a component-wise Metropolis-Hastings (MH) algorithm (Hastings 1970; Metropolis et al. 1953). The MH algorithm produces a Markov chain on each parameter of our model through the following rules. The decision to accept or reject the value of the new parameter is made by computing the so-called Metropolis-Hastings acceptance ratio (MHAR)

ρ(θ'_i, θ_i) = P(Y|θ_1, ..., θ'_i, ..., θ_{N_p}) / P(Y|θ_1, ..., θ_i, ..., θ_{N_p}),    (7)

where P(Y|θ_1, ..., θ'_i, ..., θ_{N_p}) is the likelihood of the observed sequence and N_p is the total number of parameters, in this study N_p = 8. The new candidate θ'_i is accepted with probability min(1, ρ(θ'_i, θ_i)), i.e., the algorithm always accepts the new value θ'_i when the new likelihood is greater than the previous one, but an important feature is that the new proposed value may be accepted even when the ratio is smaller than one, similarly to stochastic optimization methods. It can be shown (Beck and Au 2002; Ebrahimian and Jalayer 2017) that, if θ_n is distributed as P(·|Y), the following sample is also distributed as P(·|Y); the derivation uses the equality min(1, a/b) b = min(1, b/a) a. We empirically tune the standard deviation of the ε proposal distribution in order to obtain an acceptance rate around the optimal 45%; this corresponds to σ ≃ 0.1. The choice of the ε parameter can be crucial: if it is too large, only a small fraction of the proposed values will be accepted, and if it is too small, the Markov chain will move very slowly and not all of the parameter space may be explored. The sampled values exhibit a lack of independence, i.e., each sampled value is correlated with its predecessor. This correlation arises because the proposal distribution is centered at the preceding value whenever a new value is proposed. As long as the prior value influences the likelihood of the proposed values, consecutive values remain dependent. To mitigate this autocorrelation, one can draw an autocorrelation plot and determine the number of iterations required to minimize this effect. Therefore, in our sampling process, we never retain subsequent samples, but rather keep a sample only after a lag of 10 samples. Furthermore, we discard the initial 10% of the random walk as burn-in. This procedure is iterated until the desired sample of the posterior distribution P(θ|Y) is obtained. In this study, we perform n_mc = 10^6 Monte Carlo steps. We would like to emphasize that a series of numerical tests has been conducted to evaluate the stability of the parameter optimization, with a particular focus on the influence of the initial values of θ. The optimization process appears stable; however, a critical point identified is the distance of the initial parameter values from their optimal values, which slows down the convergence of the algorithm. This necessitates an increased computational effort and time to achieve the desired optimization state, underscoring the importance of a careful selection of the initial parameters in the optimization process.
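The component-wise sampling scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: with flat priors on bounded intervals, the acceptance ratio of Eq. (7) reduces to a likelihood ratio, and the burn-in and lag-10 thinning are applied at the end. The function names and the toy log-likelihood used in the usage example are assumptions for illustration.

```python
import numpy as np

def component_mh(log_like, theta0, bounds, n_steps=20000, sigma=0.1, rng=None):
    """Component-wise Metropolis-Hastings with flat priors on `bounds`.

    `log_like(theta)` returns the log-likelihood of the data; with flat
    priors the MHAR of Eq. (7) reduces to a ratio of likelihoods.
    """
    rng = rng or np.random.default_rng(0)
    theta = np.array(theta0, float)
    ll = log_like(theta)
    chain = []
    for _ in range(n_steps):
        for i in range(len(theta)):
            prop = theta.copy()
            prop[i] += rng.normal(0.0, sigma)   # Gaussian proposal centered on theta_i
            lo, hi = bounds[i]
            if not (lo <= prop[i] <= hi):       # outside the flat prior: reject
                continue
            ll_prop = log_like(prop)
            # accept with probability min(1, rho), rho = exp(ll_prop - ll)
            if np.log(rng.uniform()) < ll_prop - ll:
                theta, ll = prop, ll_prop
        chain.append(theta.copy())
    chain = np.asarray(chain)
    n_burn = len(chain) // 10                   # discard 10% as burn-in
    return chain[n_burn::10]                    # thin with a lag of 10
```

For instance, with the toy log-likelihood `lambda th: -0.5 * np.sum((th - 1.0) ** 2)` (a standard Gaussian posterior centered at 1 in each component), the thinned chain concentrates around 1, as expected.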

Forecast
The Bayesian inversion of the parameters has an advantage in statistical seismology, especially with regard to forecast techniques. The selection of a specific spatio-temporal region for evaluating the log-likelihood through MLE introduces non-negligible biases to the parameter estimation. Even if one confines the assessment of λ to the temporal interval [t_in, t_fin] with t_in > t_0, events occurring in [t_0, t_in] have contributed to triggering subsequent events and thus must be incorporated in the evaluation of the LL. Similarly, events occurring outside the finite spatial region act as potential triggering events and must be included in the assessment of the LL. The spatial bias, in particular, is discussed by Harte (2012). However, this is not the only issue. In general, in the MLE, the correct calculation of the standard errors of the ETAS parameters through the Hessian matrix of the log-likelihood function is guaranteed to be valid only asymptotically, for an infinite space-time window (Ogata 1978).
This asymptotic limit cannot be reached with current seismic catalogs, and the errors calculated through the Hessian matrix can be strongly biased. To avoid this problem, numerical approaches for error estimation can be used, for example the root-mean-square of the errors in the parameter estimates over repeated simulations. However, due to the limited number of simulations performed, the sample from which to extrapolate the standard error can be small, and therefore bootstrap procedures are usually applied. All these operations are computationally very demanding (Fox et al. 2016; Wang et al. 2010). Conversely, using Bayesian inference, it is easy to take into account the intrinsic uncertainty of the model parameters, avoiding overconfidence in a seismic forecast.
To perform a forecast with an ETAS model, suppose we have observed earthquakes in a space-time interval Σ_0 × [t_in^0, t_fin^0] and we want to predict the seismic occurrence in a future space-time interval Σ_F × [t_in^F, t_fin^F]. The Bayesian forecast distribution is obtained by marginalizing the distribution of Y_F given θ over the posterior distribution P(θ|Y), that is

P(Y_F|Y) = ∫ P(Y_F|θ) P(θ|Y) dθ,

but using the Monte Carlo procedure one can write

P(Y_F|Y) ≈ (1/N) Σ_{n=1}^{N} P(Y_F|θ_n).

In practice, with each set of parameters {θ_n | n ∈ N} generated from the posterior distribution by means of the MCMC sampler, it is possible to simulate an ETAS synthetic catalog from Eq. (4). The number of numerical catalogs obtained will be exactly equal to the number of steps performed by the Monte Carlo procedure, where C_k represents the kth synthetic catalog containing the information about the occurrence time, the location, and the magnitude of each event. Since we are interested in predicting the number of earthquakes, we count the number of events produced by each single synthetic catalog in the space-time window of interest. The details of the generation tree algorithm for simulating each individual catalog are described in the Methods section. Each catalog i produced by each Monte Carlo step will predict a certain number of earthquakes N_i^fore in the forecast space-time region of interest.
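The counting step above can be sketched as follows. This is a minimal sketch under stated assumptions: `simulate_catalog` is a hypothetical stand-in for the generation-tree ETAS simulator (called once per posterior sample θ_n), returning an array of (t, x, y, m) rows; only the window-filtering and counting logic is shown.

```python
import numpy as np

def forecast_counts(posterior_samples, simulate_catalog, window):
    """Bayesian forecast of the event count: one synthetic catalog per
    posterior sample, then count the events falling in `window`.

    `simulate_catalog(theta)` is a stand-in for the ETAS generation-tree
    simulator based on Eq. (4); it must return an array of (t, x, y, m)
    rows. `window = (t0, t1, x0, x1, y0, y1)` is the forecast region.
    """
    t0, t1, x0, x1, y0, y1 = window
    counts = []
    for theta in posterior_samples:
        cat = simulate_catalog(theta)
        inside = ((cat[:, 0] >= t0) & (cat[:, 0] < t1) &
                  (cat[:, 1] >= x0) & (cat[:, 1] < x1) &
                  (cat[:, 2] >= y0) & (cat[:, 2] < y1))
        counts.append(int(inside.sum()))
    return np.array(counts)   # empirical distribution of N_i^fore
```

The returned array is the empirical forecast distribution of the number of events, one count per posterior parameter set.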
The statistical test for the earthquake number is conceptually the simplest and is intended to measure how well the forecast number of events matches the observed one (N_obs). In order to understand whether N_obs falls in one of the tails of the distribution, it is possible to compute the quantile δ_under, defined as the fraction of the N_i^fore smaller than the observed number of events N_obs, and δ_over, defined as the fraction of the N_i^fore greater than N_obs (Schorlemmer et al. 2007). The model is consistent with the observations if N_obs lies in the center of the distribution of the N_i^fore.
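The two quantiles can be computed directly from the forecast counts; a minimal sketch, with illustrative names:

```python
import numpy as np

def number_test(n_fore, n_obs):
    """Quantile scores of the number test: the fraction of forecast
    counts strictly below and strictly above the observed count."""
    n_fore = np.asarray(n_fore)
    delta_under = np.mean(n_fore < n_obs)
    delta_over = np.mean(n_fore > n_obs)
    return delta_under, delta_over
```

For example, with forecast counts [1, 2, 3, 4, 5] and N_obs = 3, both quantiles equal 0.4, i.e., the observation sits in the center of the forecast distribution.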

Optimization and forecast results
An earthquake catalog is a list of earthquakes recorded in a certain geographical region during a particular period. We consider the relocated Southern California catalog (from 1981/01/01 to 2011/12/31) (Hauksson et al. 2012). For the optimization of the parameters, it is necessary to choose a period of time. In our algorithm, to reduce the parameter-estimation biases as in Harte (2012), we define three different times: the initial time t_0, which represents the beginning of the time span of the catalog, the learning start time t_in, and the learning end time t_fin. We clearly have t_0 ≤ t_in < t_fin. For the spatial part, we consider a certain region Σ ⊆ Σ_0. In general, for the optimization of the parameters, one evaluates λ at times t > t_in inside the region Σ.
In Fig. 1, we present the correlation matrix among all the parameters of the model. It can be easily seen that the burn-in of the first steps of the Markov chain avoids anomalous drifting effects between the parameters. Figure 1 is a reference: depending on the parameterization used in Eq. (1), correlations between the parameters of the model may or may not be observed. In fact, for the temporal ETAS, another parameterization of the aftershock occurrence rate is K_0 = K_t K, leading to ν(t) = K_0 (t + c)^{−p} 10^{αm}, as proposed by Ogata (1998). However, the latter parameterization inherently includes correlations between parameters, which can complicate their estimation when employing MCMC techniques. As an example, in Fig. 2 we show the correlation between the parameters K and p of the model. In choosing to parameterize the ETAS model with separate variables K_t and K, we observe an absence of correlation between these two parameters. Figure 2 also shows more clearly the absence of the starting transient of the random walk, which has been eliminated from the chain. In Fig. 3, the posterior distributions of the parameters, including their averages and standard deviations, are presented: it can be seen that all 8 distributions are Gaussian shaped, consistent with what is shown in Ross (2021) by means of the latent-variables approach. In addition, the MHAR of the algorithm is shown in the same figure. The point where the curve becomes stationary gives an indication of the burn-in point of the data. The value of the acceptance ratio is particularly important, and many proposals for the target acceptance ratio exist in the literature (Besag and Green 1993; Besag et al. 1995). We find empirically that, to obtain MHAR ≃ 0.45, σ must be set around 0.1.
In order to evaluate the effectiveness of the model with the parameters estimated by means of Bayesian inference, we carry out a retrospective forecast after the 1992 Landers earthquake, the 1999 Hector Mine earthquake and the El Mayor-Cucapah earthquake (sometimes called the "Baja California earthquake"). In particular, the last mainshock occurred on April 4th, 2010; it was centered in Baja California and felt throughout Southern California, Arizona and Nevada. The spatial localization of the events, in the time window of one month after each earthquake, is shown in Fig. 4. The observation history comprises one month prior to each event, i.e., we consider a distinct learning period (one month) for each sequence under consideration. Consequently, the posterior distribution of the parameters utilized for the Bayesian forecast may vary slightly. From a purely qualitative point of view, in Fig. 5 the instrumental catalogs one month before and after each of the three earthquakes considered in the study are plotted. In order to quantitatively assess the quality of the model, for each set of parameters obtained by Bayesian inference, we count the total number of earthquakes in the catalog produced by the ETAS model, N_i^{fore,ETAS}, and by the uniform model, N_i^{fore,UNI}. The experimental catalogs present missing events immediately after the occurrence of high-magnitude earthquakes. It is now well established that the observed incompleteness can be attributed to the overlapping of coda waves, which causes a blind time during which small events are not detected (de Arcangelis et al. 2018; Hainzl 2016; Helmstetter et al. 2006; Lippiello et al. 2016, 2019). In order to minimize the effect of short-term aftershock incompleteness, we test a forecast restricted to earthquakes with magnitude greater than a minimum magnitude m_L. In Fig. 6a, b, the number of earthquakes predicted by the Bayesian ETAS model, together with that of the Poissonian model and of the standard, non-Bayesian ETAS model, is plotted for m_L = 4 and for m_L = 5, respectively, for the 1992 Landers earthquake. Similar results can be found in Figs. 7, 8 for the 1999 Hector Mine and Baja California events, respectively. We did not consider magnitudes smaller than 4 for the forecast because of the very stringent retrospective assumptions on the completeness magnitude m_c (the completeness magnitude is assumed to be m_c = 3 for the whole sequence). For example, for the Baja California event, this is justified by a lack of seismic stations south of the border (Hauksson et al. 2010). It can be noted that the forecast distribution of the Bayesian ETAS model is more spread out (less confident) than its standard ETAS counterpart, since it accounts for the uncertainty of the model parameters. The forecast of the Poissonian model strongly underestimates the total number of earthquakes; conversely, the Bayesian ETAS model is consistent with the true number of earthquakes that occurred. Regarding the comparison with the standard ETAS model, we note that both models equally underestimate the number of aftershocks for the Landers earthquake, but the standard model underperforms compared to the Bayesian ETAS model for the other two mainshocks. This discrepancy may be attributed to the fact that the forecast uncertainty of the Bayesian model is directly related to the parameter estimation, leading to a more realistic standard deviation of the distribution. Conversely, the standard ETAS model derives its intrinsic randomness only from the model's stochasticity, the values of the parameters being fixed.
A sanity check of the successful optimization of the spatial parameters of the ETAS model is also performed. In Figs. 9, 10, 11a, we show a comparison between the observed events and those predicted by a single ETAS realization, considering only earthquakes with magnitude greater than 4, again one month after the occurrence of each mainshock and in the spatial window considered for each event. Small inconsistencies observed in the spatial distribution of events are due to the non-isotropy of the spatial kernel of the ETAS model. Also, in Figs. 9, 10, 11b, we show a cloud forecast (10000 ETAS catalogs) for the same spatio-temporal period. Regarding the spatial predictability test, we plot the Molchan error diagrams of the location forecasts for the Bayesian ETAS and the non-homogeneous Poisson model in Fig. 12. The Molchan diagram, also defined in Zechar and Jordan (2008), is designed for evaluating earthquake forecasting ability and shows the fraction of missed earthquakes versus the fraction of space occupied by the alarm. In particular, discretizing the target space in J cells of dimension [0.1, 0.1] and calling N(C_j) the number of observed events in the cell j, the alarm rate (AR) is defined as

AR = (1/J) Σ_{j=1}^{J} I(λ_j ≥ λ_th),

whereas the missing rate (MR) can be written as

MR = Σ_{j=1}^{J} N(C_j) I(λ_j < λ_th) / Σ_{j=1}^{J} N(C_j),

where λ_j represents the occurrence rate in the cell j, λ_th the alarm threshold, and I is the logic function, namely, I(A) = 1 if statement A is true and I(A) = 0 otherwise.

Fig. 4 The location of the earthquakes of the 2010 Baja California (a), 1999 Hector Mine (b) and 1992 Landers sequences, one month after each event took place. The size of the circles is related to the magnitude of the plotted event, while the color indicates the occurrence time since the mainshock, as shown in the legend. The plot is obtained using Python with the basemap toolkit (Hunter 2007)
The line MR = 1 − AR in Fig. 12 represents the unskilled alarm, and it can be noted that the Bayesian ETAS model outperforms the non-homogeneous Poisson model.
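The two rates defined above can be computed per threshold as follows; this is a minimal sketch with illustrative names, where `rate[j]` is the forecast rate λ_j in cell j and `n_obs[j]` the observed count N(C_j).

```python
import numpy as np

def molchan_point(rate, n_obs, lam_th):
    """One point of the Molchan diagram: alarm rate (fraction of alarmed
    cells) and missing rate (fraction of events in non-alarmed cells)."""
    alarm = rate >= lam_th
    ar = alarm.mean()                        # AR = (1/J) sum_j I(lambda_j >= lambda_th)
    mr = n_obs[~alarm].sum() / n_obs.sum()   # MR = missed events / total events
    return ar, mr

def molchan_curve(rate, n_obs):
    """Sweep the threshold over all distinct rates to trace the full curve."""
    return np.array([molchan_point(rate, n_obs, th)
                     for th in np.sort(np.unique(rate))[::-1]])
```

Lowering the threshold moves a point toward larger AR and smaller MR; a forecast is skilled when its curve lies below the unskilled diagonal MR = 1 − AR.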

S-test and L-test
To evaluate the likelihood of the observed events given the forecast, we carry out the L-test (Zechar 2010). To complete the test, we produce a set of synthetic catalogs Ω_k = {ω_k(i, j)}, where ω_k(i, j) represents the number of simulated earthquakes in the bin (i, j) of the synthetic catalog k. For each k, we compute the joint log-likelihood L_k, obtaining a distribution. Comparing the observed joint log-likelihood with the simulated joint log-likelihoods, one can understand whether the observation is consistent with the proposed forecast. Numerically, it is possible to compute the quantile score γ defined as

γ = |{L_k | L_k ≤ L_obs}| / |{L_k}|,    (13)

where |{·}| represents the number of elements in the ensemble {·}. If the quantile score falls in the critical region (defined by α), the observation is inconsistent with the forecast. Differently, the S-test isolates the spatial component of the forecast and tests the consistency of the spatial rates with the observed events. For this purpose, the simulated rates are summed over all magnitude bins and rescaled by the ratio N_obs/N_syn, where N_obs and N_syn are the numbers of events in the instrumental catalog and in the synthetic catalog, respectively. As in the L-test, one can compute the observed joint log-likelihood S and compare this value with the distribution of the simulated joint log-likelihoods: if the quantile score, computed as in Eq. (13), is greater than the significance level α = 0.025, then the observed spatial distribution of seismicity is consistent with the forecast.
In our simulations, both the likelihood of the observed events given the forecast and the spatial distribution of the seismicity are consistent with the forecast. The results are presented in Table 1.
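The quantile score of Eq. (13) and the resulting one-sided consistency check can be sketched as follows (illustrative names; `l_sim` is the ensemble of simulated joint log-likelihoods L_k and `l_obs` the observed one):

```python
import numpy as np

def quantile_score(l_sim, l_obs):
    """Quantile score gamma of Eq. (13): fraction of simulated joint
    log-likelihoods that do not exceed the observed one."""
    l_sim = np.asarray(l_sim)
    return np.count_nonzero(l_sim <= l_obs) / l_sim.size

def is_consistent(l_sim, l_obs, alpha=0.025):
    """One-sided check: the forecast is rejected when the observed
    likelihood falls in the lower tail of the simulated distribution."""
    return quantile_score(l_sim, l_obs) > alpha
```

The same machinery serves the S-test, with the joint log-likelihoods computed on the magnitude-summed, rescaled spatial rates.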

Background seismicity
In Fig. 13, the background activity of the instrumental catalog is compared with the one obtained by means of the ETAS model. The probability that an event belongs to the background is calculated with the method developed by Zhuang et al. (2002), with which it is possible to compute the probability that a certain event i of the catalog belongs to the background seismicity. In particular,

φ_i = μ(x_i, y_i) / λ(x_i, y_i, t_i|H_{t_i})    (14)

represents the probability that the ith event is a background event. Employing a similar procedure, it is possible to obtain the probability that an event is triggered by a previous one. In our case study, the background seismicity shows prominent geographical trends in the northeast and northwest directions, as confirmed by Hauksson et al. (2010). Quantitatively, from the ETAS simulations we obtain an average value of μ = 1 × 10^{−6}, while the stochastic declustering of the instrumental catalog gives a value of 1.3 × 10^{−6}, which confirms the correctness of the Bayesian inversion of the parameter μ.
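The declustering weight of Eq. (14) can be computed per event as sketched below. This is a minimal illustration, not the Zhuang et al. (2002) code: `mu` and `intensity` are hypothetical callables standing in for the fitted background rate μ(x_i, y_i) and the total conditional intensity λ(x_i, y_i, t_i|H_{t_i}) evaluated at event i.

```python
import numpy as np

def background_probability(n_events, mu, intensity):
    """Stochastic-declustering weights in the spirit of Eq. (14):
    phi_i = mu(i) / intensity(i), the probability that event i
    belongs to the background; 1 - phi_i is the probability that
    it was triggered by a previous event."""
    phi = np.array([mu(i) / intensity(i) for i in range(n_events)])
    return phi
```

Summing the weights φ_i over the catalog gives the expected number of background events, which is how the declustered rate is compared with the inverted μ.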

Discussion and conclusions
In the last few decades, the now extensive literature on epidemic models applied to statistical seismology has confirmed the strong ability of ETAS models to incorporate and capture the occurrence of earthquakes, and thus to become a baseline for seismic forecasting (Jordan et al. 2011). The forecast procedure through these models, however, always involves the evaluation of the log-likelihood. Unfortunately, the computation of the quantities that form the log-likelihood is computationally very demanding and also requires numerical approximations, due to the lack of an exact analytical solution of the integral in Eq. (6). In fact, it has been shown (Lippiello et al. 2014) how the limits t → ∞ and Σ → ∞ can lead to strong biases in the evaluation of the integral.
From this point of view, with the Bayesian inference of the parameters, not only is it possible to bypass most of the problems related to the intrinsic uncertainty of the parameters, but it is also possible to avoid any bias due to the "art of simulation selection".In fact, many efforts are being made on this line of research by estimating the parameters of the ETAS model with the Bayesian method (Ebrahimian and Jalayer 2017;Ross and Kolev 2022;Ross 2021;Shcherbakov 2014).
In this article, we carried out a retrospective forecast experiment performed after 3 major shocks of magnitude greater than 7 that occurred in Southern California, namely the 1992 Landers, the 1999 Hector Mine and the 2010 El Mayor-Cucapah (Baja California) earthquakes. Bayesian inference for a spatio-temporal ETAS model with 8 free parameters is carried out by means of a Metropolis-Hastings algorithm. To assess the model optimization, we provide the Molchan diagram as a spatial prediction test, as well as a cloud forecast overlaying 10000 ETAS simulations. Other quantitative forecast scores, such as the S-test and L-test, are computed to strengthen the validity of the method and of the prediction. The results show a good agreement between the predicted values and the actually observed ones. The variability of the forecast distribution for the number of earthquakes allows a much less confident and more cautious forecast compared, for example, with a direct estimate of the parameters using the MLE approach.
There are many technical points that could be improved in our procedure. For example, we could introduce a latent variable for each event to identify its origin, as done in Ross (2021) and Ross and Kolev (2022), or use Hamiltonian dynamics to speed up the MCMC sampler [e.g., Neal (2011)]. Furthermore, the background rate could be sampled as a Gaussian field (Molkenthin et al. 2022) or a Dirichlet process (Ross and Kolev 2022). Nevertheless, this direct Bayesian approach with flat priors works well to produce stable forecasts of Southern California aftershocks.

Appendix A: Log-likelihood estimation
The log-likelihood function in a spatio-temporal region $\Omega \times [t_{in}, t_{fin}]$ is given by

$$LL = LL_1 - LL_2, \tag{15}$$

with

$$LL_1 = \sum_{i:\, t_i \in [t_{in}, t_{fin}]} \log \lambda(x_i, y_i, t_i), \qquad LL_2 = \int_\Omega d\Omega \int_{t_{in}}^{t_{fin}} \lambda(x, y, t)\, dt.$$

In general, the computation of $LL_1$ is straightforward. Conversely, in order to evaluate $LL_2$, it is necessary to perform some approximations. Schoenberg (2013) assumes that both $t_{fin}$ and the size of the testing area tend to infinity. Under these approximations, $LL_2$ reduces to

$$LL_2 \simeq (t_{fin} - t_{in}) \int_\Omega \mu(x, y)\, d\Omega + \sum_i K\, 10^{\alpha(m_i - m_0)}.$$

However, these approximations appear to be too crude and may bias the likelihood estimate (Lippiello et al. 2014). The temporal integral can be solved exactly:

$$\int_{t_j}^{t_{fin}} K_t\, (t - t_j + c)^{-p}\, dt = 1 - \left(\frac{c}{t_{fin} - t_j + c}\right)^{p-1}, \quad \text{for } t_j \ge t_{in}, \tag{16}$$

with $K_t = (p-1)c^{p-1}$ fixed by the normalization condition. The most challenging part is the resolution of the spatial integral. The difficulty is that the integral is easy to solve in polar coordinates, but the shape of the region usually breaks the circular symmetry. Each epicenter is radially connected to a finite number of sampled points on the edge of the region; the area between two successive radii is then approximated by a circular sector.
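The closed form of the temporal integral can be cross-checked numerically. In this minimal sketch, $K_t = (p-1)c^{p-1}$ follows from the normalization condition of the main text, and the parameter values are purely illustrative:

```python
import numpy as np

def omori_mass(t_fin, t_j, c=0.01, p=1.2):
    """Closed-form integral of the normalized Omori kernel
    K_t (t - t_j + c)^(-p), with K_t = (p - 1) c^(p - 1),
    from t_j up to t_fin (valid for t_j >= t_in)."""
    return 1.0 - (c / (t_fin - t_j + c)) ** (p - 1)

# Numerical check on a geometric grid (the kernel is steep near t = t_j)
c, p, t_j, t_fin = 0.01, 1.2, 0.0, 30.0       # illustrative values, t in days
K_t = (p - 1) * c ** (p - 1)
s = np.geomspace(c, t_fin - t_j + c, 200001)  # substitution s = t - t_j + c
y = K_t * s ** (-p)
numeric = 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(s))  # trapezoid rule
closed = omori_mass(t_fin, t_j, c=c, p=p)
```

A geometric grid is used because the kernel varies over several orders of magnitude near the origin, where a uniform grid would waste points.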
A good approximation can be obtained by sampling many points on the edges, but this would lead to a high computational cost.
To bypass this problem, we use the method proposed by Lippiello et al. (2014), which takes into account that the spatial kernel is a rapidly decreasing function of the distance. In particular, for each earthquake it is possible to approximate the integral over the target region using concentric annuli centered at the epicenter of the event $(x_i, y_i)$ and with a certain width $\Delta r$. Therefore:

$$\int_\Omega K_r\, s(x - x_i, y - y_i)\, d\Omega \simeq \sum_l f_l \int_{r_l}^{r_l + \Delta r} K_r\, s(r)\, 2\pi r\, dr,$$

where $f_l$ is the fraction of the $l$-th annulus lying inside $\Omega$.
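A minimal numerical sketch of this annuli approximation follows. It assumes a rectangular target region and illustrative kernel parameters (δ, q); the exact radial mass of each annulus is weighted by the fraction of the annulus inside the region, estimated from points sampled along its mid-radius:

```python
import numpy as np

def annuli_integral(x0, y0, region, delta=0.01, q=1.8,
                    dr=0.02, r_max=5.0, n_theta=720):
    """Approximate the integral of the normalized spatial kernel
    K_r (r^2 + delta)^(-q), with K_r = (q - 1) delta^(q - 1) / pi,
    over a rectangular region, using concentric annuli of width dr
    centered on the epicenter (x0, y0)."""
    xmin, xmax, ymin, ymax = region
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    K_r = (q - 1.0) * delta ** (q - 1.0) / np.pi
    total = 0.0
    for r_k in np.arange(0.0, r_max, dr):
        r_mid = r_k + 0.5 * dr
        xs, ys = x0 + r_mid * cos_t, y0 + r_mid * sin_t
        # Fraction of the annulus inside the region, from mid-radius samples
        frac = np.mean((xs >= xmin) & (xs <= xmax) &
                       (ys >= ymin) & (ys <= ymax))
        if frac == 0.0:
            continue
        # Exact mass of the full annulus [r_k, r_k + dr]:
        # integral of K_r (r^2 + delta)^(-q) 2 pi r dr
        mass = np.pi * K_r / (q - 1.0) * (
            (r_k ** 2 + delta) ** (1.0 - q)
            - ((r_k + dr) ** 2 + delta) ** (1.0 - q)
        )
        total += frac * mass
    return total

# Epicenter well inside the region: almost all of the kernel mass is captured
inside = annuli_integral(0.0, 0.0, region=(-5, 5, -5, 5))
# Epicenter on a corner: roughly a quarter of the mass falls inside
corner = annuli_integral(5.0, 5.0, region=(-5, 5, -5, 5))
```

Because the kernel decays rapidly, a modest number of annuli suffices, which is what makes this cheaper than densely sampling the region boundary.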

Fig. 1 The correlation matrix (lower triangular part), together with the matrix contour plots (upper triangular part), among the parameters of the model considered in this study. c is measured in days, d in deg², whereas µ is expressed in sec⁻¹ deg⁻²

Fig. 2 The correlation between parameters K and p

Fig. 3 From a to h, posterior distributions for the eight parameters of the ETAS model defined in Eq. (1), obtained by simulating 1,000,000 Monte Carlo steps. The parameter c is measured in days, d in deg², whereas µ is expressed in sec⁻¹ deg⁻². The last plot at the bottom right represents the acceptance rate calculated with the MH acceptance ratio

Fig. 5 Instrumental seismic catalogs one month before (left panels) and one month after (right panels) the a 1992 Landers, b 1999 Hector Mine and c 2010 Baja California earthquakes

Fig. 6 (Upper panel) Forecast distribution for the number of earthquakes with magnitude m ≥ 4 and (lower panel) for the number of earthquakes with magnitude m ≥ 5 that occurred one month after the 1992 Landers earthquake. The spatial window is [(−119.26, −114.14), (31, 37.95)]. The black histogram represents the Bayesian ETAS forecast, the green histogram the stationary space-time Poissonian forecast, the magenta histogram the standard ETAS model, whereas the red line represents the observed number of earthquakes (138 for m ≥ 4 and 14 for m ≥ 5)

Fig. 7 (Upper panel) Forecast distribution for the number of earthquakes with magnitude m ≥ 4 and (lower panel) for the number of earthquakes with magnitude m ≥ 5 that occurred one month after the 1999 Hector Mine earthquake. The spatial window is [(−115, −117), (32, 36)]. The black histogram represents the Bayesian ETAS forecast, the green histogram the stationary space-time Poissonian forecast, the magenta histogram the standard ETAS model, whereas the red line represents the observed number of earthquakes (60 for m ≥ 4 and 2 for m ≥ 5)

Fig. 10 a Earthquakes observed (680 events, black circles) and predicted (607 events, red squares) by a single realization of the ETAS model. b Earthquakes observed (648 events, magenta x) and 10,000 realizations of the ETAS model (green dots). The total observation period is one month after the occurrence of the 1999 Hector Mine earthquake, in the spatial window [(−115, −117), (32, 36)]

Fig. 11 a Earthquakes observed (648 events, black circles) and predicted (609 events, red squares) by a single realization of the ETAS model. b Earthquakes observed (648 events, magenta x) and 10,000 realizations of the ETAS model (green dots). The total observation period is one month after the occurrence of the 2010 Baja California earthquake, in the spatial window [(−122, −113.5), (31, 38)]

Fig. 13 Background seismicity for a 1992 Landers, b 1999 Hector Mine and c 2010 Baja California in the space-time windows of the case studies. The black triangles represent the background activity observed from the experimental catalog, while the red crosses are the background events obtained from the synthetic ETAS one

Table 1 L-test and S-test quantiles for the three earthquakes analyzed in the paper, for m_L > 4