 Full paper
 Open Access
 Published:
Combining earthquake forecasts using differential probability gains
Earth, Planets and Space volume 66, Article number: 37 (2014)
Abstract
We describe an iterative method to combine seismicity forecasts. With this method, we produce the next generation of a starting forecast by incorporating predictive skill from one or more input forecasts. For a single iteration, we use the differential probability gain of an input forecast relative to the starting forecast. At each point in space and time, the rate in the nextgeneration forecast is the product of the starting rate and the local differential probability gain. The main advantage of this method is that it can produce high forecast rates using all types of numerical forecast models, even those that are not ratebased. Naturally, a limitation of this method is that the input forecast must have some information not already contained in the starting forecast. We illustrate this method using the Every Earthquake a Precursor According to Scale (EEPAS) and Early Aftershocks Statistics (EAST) models, which are currently being evaluated at the US testing center of the Collaboratory for the Study of Earthquake Predictability. During a testing period from July 2009 to December 2011 (with 19 target earthquakes), the combined model we produce has better predictive performance  in terms of Molchan diagrams and likelihood  than the starting model (EEPAS) and the input model (EAST). Many of the target earthquakes occur in regions where the combined model has high forecast rates. Most importantly, the rates in these regions are substantially higher than if we had simply averaged the models.
Background
Despite a growing number of reasonably reliable and skillful numerical seismicity forecast models, operational earthquake forecasting remains a daunting challenge. One of the fundamental difficulties is that operational forecasts require high expected earthquake rates to make substantial decisions (e.g., evacuation or other emergency actions), but the probabilities derived from statistical seismicity models are still quite small (Jordan and Jones 2010). One potential approach to this problem is to combine models in a way that maximizes overall predictive skill.
It is well known that combining many models or classifiers that describe the same data may yield higher performances than any individual member. These ensemble learning techniques include methods such as Bayesian model averaging (Marzocchi et al. 2012) and, in the context of classifiers, boosting (Hastie et al. 2008). In the first case, the combined model is a weighted sum of individual posterior probabilities, the weights being new parameters that can be learned from the data. For boosting, weighted data are used to train a collection of classifiers; during iteration, previously misclassified data get higher weight. These approaches are similar in that the combined models or classifiers are of the same abstract nature, meaning that each gives a statistical description of the same data set. In our approach, we combine models of different natures. For example, alarmbased and ratebased forecast models both give a description of the seismicity in a region. However, alarmbased models do this through a spacetime alarm function, generally not normalized, whereas ratebased models give a description in terms of spacetime Poisson event rates. The combination of two such models cannot be cast into the classical form used in ensemble techniques. Furthermore, the vast majority of current seismicity models are based only on catalogs of past earthquakes, and there is some hope that additional geological, geodetic, and physical information could improve forecast performance. Combining such information should be done in a complementary fashion so as not to increase uncertainty and thereby degrade forecast performance. This challenge cannot be met by traditional methods, and a new method to combine forecasts should identify what additional information a model can contribute to an existing forecast.
It is often difficult to verify the presence of additional predictive information in an earthquake forecast model, but researchers are attempting to address this problem with the Collaboratory for the Study of Earthquake Predictability (CSEP). In CSEP testing regions around the world (e.g., California, Italy, New Zealand, the western Pacific, Japan, and the globe), various forecast models are evaluated in a standardized way (Jordan 2006; Gerstenberger et al. 2007; Zechar et al. 2010a; Zechar and Jordan 2010; Zechar et al. 2010b; Rhoades and Gerstenberger 2009; Nanjo et al. 2011; Tsuruoka et al. 2012; Eberhard et al. 2012; Taroni et al. 2014). One subtle benefit of these centers is that all forecasts are systematically archived. Therefore, one can test methods of combination using archived prospective forecasts. Most forecasts within CSEP testing centers are ratebased forecast models with a time step of 1 day, 3 months, or 5 years, and testing regions are gridded with square cells of 0.1° and a class interval of earthquake magnitude of 0.1 from M≥3.95 earthquakes. For a much longer period, several alarmbased models have been developed and tested by various research groups. Known examples are the global and regional tests of M8, CN, and RTP (KeilisBorok and Kossobokov 1990; KeilisBorok and Rotwain 1990; Peresan et al. 1999; Shebalin et al. 2006; Romashkova and Kossobokov 2004; Zechar 2010).
Researchers have suggested a few methods for combining models and/or earthquake precursors. For a set of ratebased models, a weighted average is a natural solution (Rhoades and Gerstenberger 2009; Marzocchi et al. 2012; Rhoades 2013). In the current implementation of such approaches weights do not depend on space; rather, they are chosen according to a relative performance of the model observed during some testing period. A direct product of functions describing precursory behavior was used in the RTL prediction algorithm (Sobolev et al. 1996). In this case, even if the initial functions are probabilistic, the output is a nonprobabilistic alarmbased model. Another way to combine models is by using Bayes’ formula for conditional probabilities (Sobolev et al. 1991). However, when using this approach, it is difficult to take into account the interdependence of the combined elements, and resulting estimates are hardly probabilistic.
Shebalin et al. (2011) suggested a method based on differential probability gains to convert alarmbased to ratebased earthquake forecast models. Thus, nonprobabilistic forecast models or seismic precursor maps can be converted to probabilistic ratebased models. In this article, we generalize this differential probability gains approach to combine all types of timevarying forecasts.
Methods
Evaluation of forecast models
Following the CSEP standards, we discretized forecasts in space and time according to a predefined grid and a given time step. In prospective tests, all forecasts are given for the next time step and a finite magnitude range.
Molchan tests
Molchan tests are used to compare an alarmbased model with a reference model of seismicity defined on the same spatial grid (Molchan 1990). For any spacetime region (x,t), the reference model provides the rate λ(x,t) of target earthquakes. The alarmbased model is entirely defined by its alarm function A(x,t). Where this alarm function exceeds a given threshold value A_{0}, an alarm is issued and a target earthquake is expected to occur. Although it is not necessary, it is usually assumed that A values are ordered from smallest to largest according to the probability of occurrence of a target event. In almost all cases, numerical forecast models can be easily converted to an alarmbased forecast because the information provided by a numerical value assignment on a given spacetime grid can be used as an alarm function.
A Molchan test takes the form of a diagram comparing rates of types I and II errors for varying threshold values A_{0} (i.e., a level of alarm). For different A_{0} values, type I errors are measured by
where the sum symbols refer to the spacetime regions in which the subscript condition is satisfied. The τ value is often interpreted as a fraction of the spacetime region occupied by alarms (Kossobokov and Shebalin 2003; Molchan 2010). Here, it is important to emphasize that this fraction is given by a reference model that may depend on time. Type II error rates are the miss rates ν(A_{0}), the fraction of target earthquakes that occurred in spacetime bins in which A<A_{0}.
In a Molchan diagram, the (τ, ν) curve constructed for all A_{0} values is called the Molchan trajectory (Molchan 1990; Zechar and Jordan 2008). This trajectory runs from the point (0, 1) to the point (1, 0) for a decreasing A_{0} value. The diagonal connecting these two points corresponds to an unskilled forecast. Below this diagonal, the alarm function may bring additional predictive power (Shebalin et al. 2006) with respect to a given level of significance α (Zechar and Jordan 2008, Equations 2 and 3). The closer the Molchan trajectory is to the y axis, the more skill the forecast has. It is often desirable to characterize forecast skills by a scalar value, the socalled loss function. Examples of loss functions are the minimal summary error (Molchan 1990, max(1−τ−ν)), the minimax loss function (Molchan 1990, inf(max(ν, τ))), the area above the Molchan curve (Zechar and Jordan 2008; 2010), and the maximal probability gain (Aki 1996, max((1−ν)/τ)). If there are a large number of target events, we can also suggest here the targetweighted probability gain (i.e., max((1−ν)^{2}/τ)). Note the singularity at τ=0 for many of these expressions. Furthermore, there is always a tradeoff between the rates of false alarms and failures to predict so that the best scalar value may depend on the goal of the forecast (Molchan 1990). In all cases, one should plot the twodimensional Molchan diagram to visualize the vector data that are processed.
Likelihood tests
Likelihood tests are commonly used to evaluate ratebased models of seismicity. Their forecasts are tested against the number ω(x,t) of observed target earthquakes in each bin (Schorlemmer et al. 2007). For simplicity, individual rates are assumed to follow independent Poisson processes. For all magnitude classes, the complete likelihood counts the Poisson joint loglikelihood of the observed number ω(x,t) given the forecast λ(x,t):
The closer the joint loglikelihood is to zero, the better the forecast is.
The spatial likelihood is a reduction of the complete likelihood applied to a forecast with rate value normalized to match the total observed number of targets (Zechar et al. 2010a). In each spatial bin, the single rate values are obtained by summing the expected event rates over the whole range of magnitudes (Kossobokov 2006). For the total duration of the experiment, the total sum of log likelihoods over all time steps can be calculated and divided by the total number of observed events to estimate a log likelihood per earthquake.
Likelihood tests applied to forecasts defined on a highdensity spatial grid are often criticized because of potential earthquake interactions (Molchan 2012). However, the problem of bin independence cannot be solved easily and it is generally thought that the dependence is conditional on earthquake occurrence. For example, we expect many aftershocks after large earthquakes, but a prospective forecast experiment requires any interbin dependence to be provided in advance, before one knows about the large earthquake (Zechar 2010).
Testing forecast models at CSEP
Likelihood and Molchan tests are complementary and both can be used to estimate the performance of the forecast models. The likelihood tests, however, do not apply to nonprobabilistic alarmbased models.
At CSEP testing centers, all the ratebased models are evaluated by using likelihood tests. In contrast, alarmbased models are only tested at the California testing center using Molchan tests and the related ROC and ASS tests (Zechar and Jordan 2008; Zechar 2010). Unfortunately, alarm and ratebased models are still tested independently. The main reason for this is that the likelihood tests cannot be applied to an alarmbased model with a nonprobabilistic alarm function. To address this problem, Shebalin et al. (2012) proposed a method to convert alarmbased models to ratebased forecasts. In addition, two ratebased models can be compared by using Molchan diagrams. In practice, the complete ratebased model should be reduced to a single rate value by summing over a given range of magnitude. The reduced model can be treated as a ratebased model and/or as an alarmbased model. Its alarm function is simply composed of the single rate value in each spatial cell.
In summary, all requirements are satisfied for systematic implementation of both likelihood and Molchan tests. The next challenge is to identify the regions of particular skill for each model and combine models in a way that increases the expected event rates.
A differential probability gain approach for combining two earthquake forecast models
Here, we generalize the concept of differential probability gain to combine different types of earthquake forecast models. The main idea is to successively create new generations of a ratebased model by injecting into the current generation the additional information provided by other input models. In what follows, we describe one iteration of the combination procedure using a current model (the initial ratebased forecast model at the first iteration) and an input model (any type of numerical forecast model) to compute a new ratebased forecast model.
The entire procedure is based on the Molchan diagram that evaluates retrospectively the performance of the input model with respect to the current model. For this reason, the input model must be expressed through an alarm function A in each of the considered magnitude ranges. For example, the alarm function of a ratebased model may be simply calculated as a sum of the rates over a magnitude range (Kossobokov 2006).
In a Molchan diagram, we can define the probability gain as
where A_{0} is the threshold value of the alarm function, and the sum symbols refer to the spacetime regions in which the subscript condition is satisfied. This G value is a factor that integrates the increase of the rate of the current model within the spacetime region in which A>A_{0} (Aki 1981; Molchan 1991; Zechar and Jordan 2008). To isolate smaller areas and specific behaviors associated with different ranges of the alarm function, we work with the differential probability gain function that can be defined as the derivative of a continuous Molchan trajectory,
With the small samples we have in practice, the Molchan trajectory is always a steplike function. We smooth this function using a limited number of segments to avoid overfitting the differential probability gain function (see Appendix 1). Finally, for each segment and the corresponding range [ A_{0}; A_{0}+δ A_{0}] of alarm function values, we have a differential probability gain of
where the sum is taken over to the spacetime regions in which the subscript condition is satisfied. Considering all segments, we can assign a specific {g}_{\text{input}}^{\text{current}} value to any A value of the input model. Having done that, we can produce spacetime maps of the differential probability gain of the input model and combine them with the current ratebased model. Then, the next generation of the ratebased model is defined as
that is, the initial rates of the current model increase or decrease according to the local {g}_{\text{input}}^{\text{current}} value. The {g}_{\text{input}}^{\text{current}} values are always estimated retrospectively over long times to be used in future applications.
This method of model combination is similar to convolving the current model with the input model. For this reason, in the following, we denote one iteration of this procedure by
Results and discussion
Combining Every Earthquake a Precursor According to Scale and Early Aftershocks Statistics forecast models in California
The Every Earthquake a Precursor According to Scale (EEPAS) and Early Aftershocks Statistics (EAST) models are forecast models installed at the CSEP Testing Center in California. They are of special interest for this study because, based upon the joint evaluation using Molchan tests outside the official CSEP testing process, it was found that they both yield statistically significant better forecasts than a Relative Intensity (RI) reference model, a timeindependent model that is commonly used as a reference model in Molchan tests (Kossobokov and Shebalin 2003; Helmstetter et al. 2006; Molchan and Keilis–Borok 2008; Zechar and Jordan 2008).
The EEPAS model (Rhoades and Evison 2004; 2007) is a mediumterm forecast model based on the precursory scale increase phenomenon and associated predictive scaling relations (Evison and Rhoades 2004). In this model, every earthquake is a precursor according to scale, with the scale referring to larger earthquakes to follow in the medium to long term. Then, smaller earthquakes are ‘witnesses’ of the seismogenic process and not ‘actors’ as in the wellknown branching model ETAS (Ogata 1989). Several versions of the EEPAS model have been used to generate forecasts of seismicity for the next 3 months. They have been tested at the California CSEP testing center since January 2008. Among the five versions of this forecast model, we chose the EEPAS0F model (henceforth referred to as simply the EEPAS model) because it performs the best against the RI reference model.
The EAST model is an alarmbased earthquake forecast model that uses early aftershock statistics (Shebalin et al. 2011). This model is based on the hypothesis that the time delay before the onset of the powerlaw aftershock decay rate decreases as the level of stress, and the seismogenic potential increase (Narteau et al. 2002; 2005; 2008; Narteau et al. 2009). In contrast to the EEPAS model, the EAST model is not a model of seismicity rates. Instead, it possesses a nonprobabilistic alarm function that is dedicated to detecting places with a higher level of stress where an earthquake is more likely to occur. These zones are identified by relatively low values of the geometric mean of elapsed times between mainshocks and early aftershocks. The threemonth class EAST model has been archived at the California CSEP center since July 2009.
To perform all the likelihood tests for CSEP testing centers, a ratebased version of the EAST model has been developed by Shebalin et al. (2012). Here, this new ratebased model called EAST _{R} can be described as a combination of EAST (i.e., the input model) and the RI timeindependent reference model (i.e., the current model). If the new ratebased model performs similarly to the initial alarmbased model, its forecast skill is likely to depend on the conversion method. This is also true for the combining method developed here, especially since it can be applied to two timedependent models.
In all our tests, we consider two testing regions: the official CSEP California testing region and a subset, with the idea that the subset will reduce the problem of reduced earthquake detectability in the ocean and outside the USA (Shebalin et al. 2011). We also consider a retrospective period from January 1984 to June 2009 and a quasiprospective time period from July 2009 to December 2011. The official CSEP test started on 1 July 1 2009, for the EAST model and earlier for the EEPAS model, and all model parameters for EEPAS, EAST, and {g}_{\text{EAST}}^{\text{EEPAS}} were fixed beforehand. Note that to set up the EAST model and to calculate the functions {g}_{\text{EAST}}^{\text{EEPAS}}, we only use the testing region subset.
Crossevaluation of earthquake forecast models
To underline how different forecast models may provide independent information about seismicity, we perform a crossevaluation of the EAST _{R} and EEPAS models using Molchan diagrams (Figure 1). In both retrospective (Figure 1a) and quasiprospective tests (Figure 1b,c), Molchan trajectories are below the diagonal, indicating that each model provides a gain in prediction with respect to the other (see Subsection ‘Molchan tests’). Although at first glance these results might appear contradictory, we interpret this as an indication that the EAST _{R} and EEPAS models are complementary. Because they focus on different relevant aspects of seismicity, each of them gives an additional amount of predictive information.
This complementary nature of two independent models of seismicity is difficult to detect using likelihood tests (Table 1). However, we stress the point that it may be an important property for earthquake forecasting and certainly the best case to combine two independent models. Then, the method to combine must preserve the knowledge gain that each model offers.Here, we use the differential probability gain method to combine two forecast models. We infer that the increase in expected rates may be locally high, particularly if several models are successively combined. For one iteration, the combination is driven by the slope of the Molchan trajectories. Therefore, it is ideal to have two models that are substantially complementary in their forecasts. As shown by Figure 1, this condition is satisfied by the EAST and EEPAS models, and their combination may yield new predictive information on California seismicity.
A combination of EAST and EEPAS forecast models using differential probability gains
As shown by (Shebalin et al. 2012), there is no significant difference in the predictive power of the EAST and EAST _{R} models. Therefore, to avoid potential noise introduced by the RI reference model or the conversion method, we combine directly the EAST and EEPAS models. To construct the differential probability gain functions {g}_{\text{EAST}}^{\text{EEPAS}}, we consider the retrospective period and three magnitude ranges of [ 4.95;5.45), [ 5.45;5.95), and [ 5.95;∞). Within each of those intervals, the combined model inherits the magnitude distribution of the EEPAS model, which is not constrained to follow the GutenbergRichter relation in each cell. Figure 2 shows for each interval the Molchan diagrams, their approximation by segments, and the differential probability gain functions of the EAST model with respect to the EEPAS model. We observe that the {g}_{\text{EAST}}^{\text{EEPAS}} values are greater than one for almost two orders of magnitude of the alarm function of the EAST forecast model (Figure 2d,e,f).
For three magnitude intervals we use the corresponding functions g EASTEEPAS and the rates λ_{EEPAS} of the EEPAS model in Equation 6 to obtain the rates of the new ratebased model EAST ∗EEPAS. Figure 3 shows Molchan diagrams used to evaluate the forecast of the EAST, EEPAS, and EAST ∗EEPAS models with respect to the RI reference model during the quasiprospective period. The comparison of these Molchan trajectories shows that the combined model works better than the two initial models from which it has been derived. This is particularly the case for the smallest τ_{RI} value, for which both initial models perform better than the RI reference model at a significance level below α=1%. Results of total likelihood and spatial likelihood (Table 1) indicate also a gain of the EAST ∗EEPAS model with respect to the EEPAS model. Quantitatively, the loglikelihood gain is 0.30 and 0.26 per earthquake for total and spatial likelihoods, respectively. With respect to the EAST _{R} model, these gains are 0.34 and 0.43, respectively.
During the quasiprospective period, many of the target earthquakes occurred in the ocean or outside the USA. In those zones, the catalog of M≥1.8 early aftershocks is incomplete and the alarm function of the EAST model is less precisely defined (Shebalin et al. 2011). In the subset of California that we used in the retrospective analysis, the likelihood gain of the EAST ∗EEPAS model with respect to the EEPAS model is more than 0.5 per earthquake. The gain with respect to the EAST _{R} model is smaller, 0.05 for total likelihood and 0.25 for spatial likelihood.
Figure 4 illustrates how our combining method works using the outputs of the EAST, EAST _{R}, EEPAS, and EAST ∗EEPAS models for the forecast period from April to June 2010 along the USAMexico border. This spacetime region includes the M 7.2 El MayorCucapah earthquake of 4 April 2010. Figure 4a shows the map of the EAST alarm function values. For ratebase models (Figure 4b,c,d), the rates are calculated for M≥4.95 events by summation of all model rates in corresponding magnitude bins. All models exhibit a clear maximum near the epicenter of the El MayorCucapah earthquake. However, in an area extending northward from this epicenter, the EAST ∗EEPAS model gives rates that are almost one order of magnitude larger than the rates of the two individual models. This example demonstrates that our combining method sharpens these individual forecasts, providing higher expected earthquake rates in more confined areas. These local increases of the forecast event rates are compensated by decreases in other places so that the total event rate over the whole territory does not significantly change (see Appendix 2).
Comparing combining methods
Using the weighted average of two ratebased models is straightforward and therefore the most common combining method (Rhoades and Gerstenberger 2009; Marzocchi et al. 2012). In addition, the use of weighted averages may increase the total predictive skill of the resulting model by locally giving more importance to high and low extreme rate values of each model. However, it remains an averaging method. Then, if the combined model keeps the total expected earthquake rate (convex combination) unchanged, local rates cannot be higher (or lower) than the maximum (minimum) rates of the two models. One exception is the combination of models that concentrate on forecasting different seismic patterns (for example, socalled mainshocks and aftershocks (Rhoades and Gerstenberger 2009)). In that case, it is not necessary to keep the total rate unchanged, and the combined rates may exceed the maximum of the two models being combined.
Here, we compare the EAST ∗EEPPAS model to the EAST _{R}+EEPAS model, the simple average of the EAST and EEPAS forecast models. Then, using Molchan diagrams, we compare the two combined models to the RI reference model. Figure 5 shows that the forecast of the EAST ∗EEPAS model outperforms the forecast of the EAST _{R}+EEPAS model, especially for the smallest τ_{RI} value, for which both the EAST and EEPAS models perform the best.
The results of the likelihood tests are quite different than those for Molchan diagrams. In fact, if the Molchan trajectory of a linearly combined model is likely to run between the trajectories of the initial models, Table 1 shows that the log likelihood for the linearly combined model is closer to zero than for the models from which it is derived. However, the EAST ∗EEPAS model again exhibits better performance than the EAST _{R}+ EEPAS model. The gain in log likelihood per earthquake is 0.11 for total likelihood and 0.15 for spatial likelihood. In the reduced region, the EAST ∗EEPPAS performs better than the linearly combined model, which now has a score between that of EEPAS and that of EAST _{R}.
Figure 6a shows the cumulative distribution functions of the rates predicted by the EAST ∗EEPAS and EAST _{R}+EEPAS models. This plot indicates that the EAST ∗EEPAS model explores a much wider range of values than the EAST _{R}+EEPAS models. In addition, we observe that more than 50% of target earthquakes occur for only 2% of the highest rates and that, for these events, the rates of the EAST ∗EEPAS model are about twice those of the EAST _{R}+EEPAS model. This difference indicates that the combination based on differential probability gain is currently better than a linear combination at increasing the predicted event rates in the limit of high rate values. Obviously, the opposite is true in the limit of low range value. Nevertheless, in this case, both models fail to predict the target earthquake and a possible gain in trying to combine them is worth discussing. Simultaneously, these results confirm that the only restriction for performing a combination is the need to use two complementary forecast models, i.e., two models that ensure a nontrivial Molchan diagram with respect to one another ({g}_{\text{input}}^{\text{current}}>1 in Equation 6).
Figure 6b,c compares the single rate values of the EAST ∗EEPAS and EAST _{R}+EEPAS models in spacetime regions where a target earthquake occurred. In the limit of high rates, the rate values of the EAST ∗EEPAS model are double those of the EAST _{R}+EEPAS model (Figure 6b). This demonstrates that a combination of forecast models based on differential probability gains can increase earthquake probabilities in ratebased forecast models.
Conclusions
There are different types of earthquake forecast models, all of which are related to specific sets of observations (e.g., catalogs of seismicity, fault maps, strain rates, and seismic precursors). As illustrated by the outputs of alarmbased and ratebased models, individual forecasts may fall into different classes of statistical models. In this specific context, it is still impossible to combine all types of forecast using ensemble classification methods. Hence, we proposed here a practical method to combine forecasts based on differential probability gains.
This method can be described as an operational device to identify and join independent sources of information related to earthquake phenomena. The quality of the combined model does not have to depend on causal relationships between the different models being combined. Actually, the procedure applies to any forecast models that have additional forecast skill. Nevertheless, since we cannot formulate it in terms of traditional classification problems without a loss of generality, the overall performance of a combined forecast model can only be established on purely empirical grounds.In contrast to linear methods, our procedure does not average the local expected rates of different models. Instead, it redistributes in space the rates of the current model according to the additional knowledge carried by the input model. Then, as shown by Figure 6, the combined model may cover a larger range of rate values, especially in the limit of high rates that are critical for operational forecast. An essential property of this redistribution process is to keep constant the total expected rate of the current ratebased model. This property can be easily demonstrated from simple geometric considerations on the smoothed Molchan trajectories (see Appendix 2 and Figure 2). Nevertheless, if this conservation property is verified for the retrospective period during which the differential probability functions have been determined, it may be only approximate during the quasiprospective and realtime tests.
The differential probability gain approach can be used to combine successively different forecast models. However, each model brings not only additional information but also some noise. Therefore, changes in the level of noise must be estimated when many models are combined together. In realtime testing, it is practically impossible to quantify the level of noise in individual models because the available case histories are always quite short. Accordingly, making theoretical estimates of the overall noise is not yet possible. Instead, we perform numerical experiments (see Appendix 3). The test shows that even after 10 iterations with highly noisy simulated models, the result remains quite similar to the original one.
Our combination method is not commutative. To demonstrate this, we analyze the two combined models that can be derived from EAST _{R} and EEPAS models. We note that the two possible combined models are quite similar. Nevertheless, using Molchan and likelihood tests, we observe that the EAST _{R}∗EEPAS model performs better (and is therefore different) than the EEPAS ∗EAST_{R} model. The difference in log likelihood per earthquake is about 0.1.
As shown in Figure 2d,e,f, we obtain nonmonotonic differential probability functions {g}_{\text{EAST}}^{\text{EEPAS}}, that is, the highest alarm function values do not always correspond to the highest {g}_{\text{EAST}}^{\text{EEPAS}} values and high peaks may be observed (Figure 2d). Such cases require special attention. For example, it might be that the peaks are caused by aftershocks of a large earthquake. In our particular case, we checked the time and location of the earthquakes corresponding to the peaks and found only one case of spatial clustering, a swarm of four 4.97≤M≤5.1 events in February 2008 preceded by a M 5.36 event in May 2006 and followed by a M 4.96 event in November 2008. This sequence took place near the epicenter of the future M 7.2 El MayorCucapah earthquake of April 4, 2010. All these M≈5 events are associated with a multiple peak in Figure 2d for A_{EAST}≈2. However, this specific range of alarm function is also associated with 13 other target earthquakes. Hence, we infer that a monotonic character of the differential probability gain function g is not a necessary condition to combine two models. The gain functions depend on the choice of the learning interval. Some stronger smoothing or approximation might be preferable, particularly for operational forecasts.The major difference between Molchan and likelihood tests resides in the way the rate variable is weighted. Molchan tests are based on the sum of the expected event rates, whereas likelihood tests are based on the sum of the logarithm of these rates. The comparison of Figure 6b,c illustrates the difference between these two tests:

1.
In Molchan diagrams, the relative performance of two models may be measured by the probability gain (see Equation 3, Aki (1981), and Molchan (1990)). For two ratebased models, this probability gain may also be estimated as the slope of the bestfit line in a crossdistribution of the rates (i.e., the diagram in which the rates of one model are plotted with respect to the other where target earthquakes have occurred). In Figure 6b, this slope is close to 2. In addition, we can graphically verify that events at high rates have more weight than events at small rates in determining this slope.

2.
In likelihood tests, the relative performance of two models is expressed as the difference of their log likelihoods per earthquake (Equation 2). If the total expected rates are the same in both models, this is equivalent to the average vertical distance to the diagonal in Figure 6c. In contrast to Aki’s probability gain (Figure 6b), this averaging gives the same weight to all rates. As a consequence, positive distances at high rates and negative distances at low rates cancel each other out.
This comparison highlights the advantage of our method based on Molchan tests. Indeed, good forecasts for low earthquake rates are not as important as for high rates. In fact, earthquakes occurring in spacetime regions with low expected rates have to be considered as a ‘failure to forecast’.
The combining method based on differential probability gains shows promising results compared to individual earthquake forecast models and other linear combination techniques. Overall, this procedure opens new opportunities for operational forecasting by substantially increasing the forecast event rates. One can also imagine applying an iterative application of our method to combine several forecast models of different types using the differential probability gain method. At each step, any model that does not provide additional knowledge with respect to the current model would be ignored in this combination. Therefore, the combination will be best if the input models are constructed from different concepts, data, or seismic precursors. A potential important research path would be to extend the scope of the currentgeneration earthquake forecast experiments to move beyond testing various models and begin evaluating different methods of model combination.
Appendix 1
Automatic procedure to estimate differential probability gain functions
Given a finite number N of target earthquakes and the discretization of space, the Molchan trajectory is a steplike function. To estimate the differential probability gain function, we use a procedure that automatically smooths a Molchan trajectory into N_{seg} segments. If N≤N_{seg}, we consider a segment for each step of the Molchan trajectory, and the vertical coordinates of the segments are i/N with i∈{0, 1, …, N}. If N>N_{seg}, we only consider N_{seg} segments, and the vertical coordinates of the segments are ⌊N(N_{seg}−i)/N_{seg}⌋/N with i∈ [ 0,N] and where ⌊x⌋ is the largest integer less than or equal to x. At each step where there is a vertical limit of segments, the horizontal coordinate of the segments is the τ value that corresponds to the median of the distribution of the alarm function value for this step. Everywhere in this study, we use N_{seg}=20 (Figure 2), but we check the stability of the results for 10<N_{seg}<30. From our experience, we also infer that there is not a strong impact of the smoothing method on the result as soon as N_{seg}>10.
Appendix 2
Conservation of the total expected rate in a combining method using differential probability gains
g(A) is the differential probability gain function of the Molchan diagram that evaluates the performance on an input model of alarm function A with respect to a ratebased forecast model. λ_{current}(x,t) are the expected event rates of this current version of the ratebased model in the spacetime region (x,t). Using our combining method, we calculate the expected event rates of the new ratebased model
To estimate the differential probability gain function, we smooth the Molchan trajectory using N_{seg} segments (see Appendix 1). Let us denote by (τ_{ i }, ν_{ i }) and A_{ i }, i∈{0, 1, …, N_{seg}}, the segment extremity coordinates and the corresponding alarm function values of the input model, respectively. The slope of these segments is the differential probability gain g_{ i }, i∈{1, 2, …,N_{seg}}. By definition, we have
For the current and the new model, we may group the expected event rates according to the ranges of the input model alarm function that correspond to the different segments of the Molchan diagram (Figure 2 and Appendix 1):
where the sum symbols refer to the spacetime regions in which the subscript condition is satisfied. We define the total rates of the current and the new models as
For the period in which the differential probability gain function is estimated, we have
In that case, using successively Equations 7, 11, and 8, we obtain
For a realtime test or a quasiprospective test, this conservation of the total expected rate is approximate.
Appendix 3
The level of noise in combined forecast models
Taking the EAST ∗EEPAS model as the initial forecast model, we study the impact of highly noisy alarmbased models on the nextgeneration forecasts (Figure 3a). As before, we used the period from January 1984 to June 2009 for learning and the period from July 2009 to December 2011 for testing. For both periods, a single magnitude interval for target events (M≥4.95), and each spacetime region, we simulate 10 alarmbased models by drawing random numbers from a uniform distribution between 0 and 1. Then, we iteratively create the nextgeneration forecasts by incorporating the predictive skills of individual models into the currentgeneration forecast. As described in Section ‘A differential probability gain approach for combining two earthquake forecast models’ and Appendix 1, the g(A(x,t)) values are estimated during the learning period (Figure 7a) and then injected into the currentgeneration forecasts for both periods. For the learning period, Figure 7b shows the Molchan diagram that compares the forecasts before and after the 7th iteration. This iteration exhibits the largest deviation from the diagonal. For the testing period, Figure 7c shows the Molchan diagram that evaluates the predictive skills of the initial and final forecast models with respect to the RI reference model. We find that even after 10 iterations with highly noisy simulated models, the result remains quite similar to the original one. This result is consistent with all the successive Molchan diagrams, which systematically show that two consecutive forecasts have no specific skills with respect to one another. This analysis reveals that, despite an unavoidable increase of noise, there is no systematic erosion of the forecast skills of a model during the combining procedure. Therefore, an obvious recommendation is to avoid combining weak models.
References
Aki K: A probabilistic synthesis of precursory phenomena. In Earthquake prediction. An international Review Manrice Ewing series. Edited by: Simpson DV, Richards PG. American Geophysical Union, Washington, DC; 1981:566–574.
Aki K: Scale dependence in earthquake phenomena and its relevance to earthquake prediction. Proc Natl Acad Sci USA 1996, 93: 3740–3747. 10.1073/pnas.93.9.3740
Eberhard DA, Zechar JD, Wiemer S: A prospective earthquake forecast experiment in the western Pacific. Geophys J Int 2012, 190(3):1579–1592. 10.1111/j.1365246X.2012.05548.x
Evison FF, Rhoades DA: Demarcation and scaling of longterm seismogenesis. Pure Appl Geophys 2004, 161: 21–45. doi:10.1007/s00024–003–2435–8
Gerstenberger MC, Jones LM, Wiemer S: Short–term aftershock probabilities: case studies in California. Seism Res Lett 2007, 78: 66–77. 10.1785/gssrl.78.1.66
Hastie T, Tibshirani R, Friedman J: The elements of statistical learning, 2nd edn. Springer (Springer Series in Statistics), New York, USA; 2008.
Helmstetter A, Kagan YY, Jackson DD: Comparison of shortterm and timeindependent earthquake forecast models for southern California. Bull Seismol Soc Am 2006, 96: 90–106. 10.1785/0120050067
Jordan TH: Earthquake predictability, brick by brick. Seismol Res Lett 2006, 77: 3–6. 10.1785/gssrl.77.1.3
Jordan TH, Jones LM: Operational earthquake forecasting: some thoughts on why and how. Seism Res Lett 2010, 81: 571–574. 10.1785/gssrl.81.4.571
KeilisBorok V, Kossobokov V: Premonitory activation of earthquake flow  algorithm M8. Phys Earth Planet Inter 1990, 61: 73–83. 10.1016/00319201(90)90096G
KeilisBorok V, Rotwain I: Diagnosis of time of increased probability of strong earthquakes in different regions of the world  algorithm CN. Phys Earth Planet Inter 1990, 61: 57–72. 10.1016/00319201(90)90095F
Kossobokov V: Testing earthquake prediction methods: the west Pacific shortterm forecast of earthquakes with magnitude M_{ w } ≥ 5.8. Tectonophysics 2006, 413: 25–31. 10.1016/j.tecto.2005.10.006
Kossobokov V, Shebalin P: Earthquake prediction. In Nonlinear dynamics of the lithosphere and earthquake prediction. Edited by: KeilisBorok VI, Soloviev AA. Springer, Berlin–Heidelberg; 2003:141–205.
Marzocchi W, Zechar JD, Jordan TH: Bayesian forecast evaluation and ensemble earthquake forecasting. Bull Seismol Soc Am 2012, 102: 2574–2584. doi:10.1785/0120110327
Molchan G: Strategies in strong earthquake prediction. Phys Earth Planet Inter 1990, 61: 84–98. 10.1016/00319201(90)90097H
Molchan G: Structure of optimal strategies in earthquake prediction. Tectonophysics 1991, 193: 267–276. 10.1016/00401951(91)90336Q
Molchan G: Spacetime earthquake prediction: the error diagrams. Pure Appl Geophys 2010, 167(8–9):907–917. doi:10.1007/s00024–010–0087z
Molchan G: On the testing of seismicity models. Acta Geophysica 2012, 60(3):624–637. doi:10.2478/s11600–011–0042–0
Molchan G, Keilis–Borok V: Earthquake prediction: probabilistic aspect. Geophys J Int 2008, 173: 1012–1017. 10.1111/j.1365246X.2008.03785.x
Nanjo KZ, Tsuruoka H, Hirata N, Jordan TH: Overview of the first earthquake forecast testing experiment in Japan. Earth Planets Space 2011, 63(3):159–169. 10.5047/eps.2010.10.003
Narteau C, Shebalin P, Holschneider M: Temporal limits of the power law aftershock decay rate. J Geophys Res 2002., 107: doi:10.1029/2002JB001868
Narteau C, Shebalin P, Holschneider M: Onset of power law aftershock decay rates in Southern California. Geophys Res Lett 2005., 32: doi:10.1029/2005GL023951
Narteau C, Shebalin P, Holschneider M: Loading rates in California inferred from aftershocks. Nonlin Proc Geophys 2008, 15: 245–263. 10.5194/npg152452008
Narteau C, Byrdina S, Shebalin P, Schorlemmer D: Common dependence on stress for the two fundamental laws of statistical seismology. Nature 2009, 462: 642–645. doi:10.1038/nature08553
Ogata Y: Statistical models for standard seismicity and detection of anomalies by residual analysis. Tectonophysics 1989, 169: 159–174. 10.1016/00401951(89)901911
Peresan A, Costa G, Panza G: Seismotectonic model and CN earthquake prediction in Italy. Pure Appl Geophys 1999, 154: 281–306. 10.1007/s000240050230
Rhoades DA: Mixture models for improved earthquake forecasting with shorttomedium time horizons. Bull Seimol Soc Am 2013, 103: 2203–2215. doi:10.1785/0120120233
Rhoades DA, Evison FF: Longrange earthquake forecasting with every earthquake a precursor according to scale. Pure Appl Geophys 2004, 161: 47–72. doi:10.1007/s00024–003–2434–9
Rhoades DA, Evison FF: Application of the EEPAS model to forecasting earthquakes of moderate magnitude in southern California. Seismol Res Lett 2007, 78: 110–115. doi:10.1785/gssrl.78.1.110
Rhoades DA, Gerstenberger MC: Mixture models for improved shortterm earthquake forecasting. Bull Seism Soc Am 2009, 99: 636–646. doi:10.1785/0120080063
Romashkova LL, Kossobokov VG: Intermediate–term earthquake prediction based on spatially stable clusters of alarms. Dokl Earth Sci 2004, 398: 947–949.
Schorlemmer D, Gerstenberger M, Wiemer S, Jackson DD, Rhoades DA: Earthquake likelihood model testing. Seismol Res Lett 2007, 78: 17–29. 10.1785/gssrl.78.1.17
Shebalin P, KellisBorok V, Gabrielov A, Zaliapin I, Turcotte D: Shortterm earthquake prediction by reverse analysis of lithosphere dynamics. Tectonophysics 2006, 413: 63–75. 10.1016/j.tecto.2005.10.033
Shebalin P, Narteau C, Holschneider M, Schorlemmer D: Shortterm earthquake forecasting using early aftershock statistics. Bull Seimol Soc Am 2011, 101: 297–312. doi:10.1785/0120100119
Shebalin P, Narteau C, Holschneider M: From alarmbased to ratebased earthquake forecast models. Bull Seimol Soc Am 2012., 102: doi:10.1785/0120110126
Sobolev GA, Chelidze TL, Zavyalov AD, Slavina LB, Nikoladze VE: Maps of expected earthquakes based on a combination of parameters. Tectonophysics 1991, 193: 255–265. doi:10.1016/0040–1951(91)90335P
Sobolev GA, Tyupkin YS, Smirnov VB: Method of intermediate term earthquake prediction. Doklady Akademii Nauk 1996, 347: 405–407.
Taroni M, Zechar J, Marzocchi W: Assessing annual global M6+ seismicity forecasts. Geophys J Int 2014, 196(1):422–431. 10.1093/gji/ggt369
Tsuruoka H, Hirata N, Schorlemmer D, Euchner F, Nanjo KZ, Jordan TH: CSEP testing center and the first results of the earthquake forecast testing experiment in Japan. Earth Planets Space 2012, 64(8):661–671. 10.5047/eps.2012.06.007
Zechar JD: Evaluating earthquake predictions and earthquake forecasts: a guide for students and new researchers. Community Online Resource for Statistical Seismicity Analysis 2010, pp 1–26. doi:10.5078/corssa77337879
Zechar JD, Jordan TH: Testing alarmbased earthquake predictions. Geophys J Int 2008, 172: 715–724. doi:10.1111/j.1365–246X.2007.03676.x
Zechar JD, Jordan TH: The area skill score statistic for evaluating earthquake predictability experiments. Pure Appl Geophys 2010, 167: 893–906. 10.1007/s0002401000860
Zechar JD, Gerstenberger MC, Rhoades DA: Likelihoodbased tests for evaluating spacerate–magnitude earthquake forecasts. Bull Seism Soc Am 2010a, 100(3):1184–1195. doi:10.1785/0120090192
Zechar JD, Schorlemmer D, Liukis M, Yu J, Euchner F, Maechling PJ, Jordan TH: The collaboratory for the study of earthquake predictability perspective on computational earthquake science. Concurrency Comput–Pract Exp 2010b, 22: 1836–1847.
Acknowledgements
We are grateful to the editor and the reviewers for their valuable comments. This work was partially supported by the Russian Foundation for Basic Researches, Project No. 140500541. The research was partially performed within the framework of the REAKT Project (Strategies and tools for RealTime EArthquake RisK ReducTion) founded by the European Community via the Seventh Framework Program for Research (FP7), Contract No. 282862.
Author information
Affiliations
Corresponding author
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
PS conceived, designed, and coordinated the study, conducted the numerical tests, and wrote the manuscript. CN designed the study and wrote the manuscript. JDZ designed the statistical tests and wrote the manuscript. MH designed the statistical tests. All authors read and approved the final manuscript.
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Shebalin, P.N., Narteau, C., Zechar, J.D. et al. Combining earthquake forecasts using differential probability gains. Earth Planet Sp 66, 37 (2014). https://doi.org/10.1186/188059816637
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/188059816637
Keywords
 Probabilistic forecasting
 Earthquake interaction
 Forecasting and prediction
 Statistical seismology