- Full paper
- Open Access

# Some reasoning on the RELM-CSEP likelihood-based tests

- Anna Maria Lombardi

**66**:4

https://doi.org/10.1186/1880-5981-66-4

© Lombardi; licensee Springer. 2014

**Received:** 15 October 2013 · **Accepted:** 14 April 2014 · **Published:** 1 May 2014

## Abstract

The null hypothesis is the essence of any statistical test: a test is basically a comparison of what we observe with what we would expect to see if the null hypothesis were true. In this work, I explore the suitability of the null hypothesis of likelihood-based tests (LBTs), which are often adopted by the laboratories of the Collaboratory for the Study of Earthquake Predictability (CSEP), to check earthquake forecast models. First, I discuss the LBT in the wider context of classical statistical hypothesis testing. Then, I present some cases in which the null hypothesis of the LBT is not appropriate for determining the merits of earthquake forecast models. I justify these results from a theoretical point of view, within the framework of point process theory. Finally, I propose a possible upgrade of the LBT to enable the correct assessment of the forecasting capability of earthquake models. This study may provide new insights into the CSEP LBT.

## Keywords

- Statistical tests
- Earthquake forecast
- Point processes

## Background

The increasing interest of the seismological community in earthquake forecasting has highlighted the need for a proper evaluation of forecast models. This has motivated the birth of the working group on Regional Earthquake Likelihood Models (RELM, Schorlemmer and Gerstenberger 2007) and of the Collaboratory for the Study of Earthquake Predictability (CSEP, Jordan 2006), both designed to evaluate the quality of forecast models. The protocol adopted by RELM/CSEP is based on classical statistical hypothesis testing (Schorlemmer et al. 2007), whose aim is to reject or accept a null hypothesis (hereinafter *H*_{0}) on the basis of a numerical summary of the data. The RELM/CSEP working groups adopt two main types of testing methods: likelihood-based tests (LBTs) (Schorlemmer et al. 2007; Zechar et al. 2010) and alarm-based tests (ABTs) (Zechar and Jordan 2008). In this study, I focus on LBTs and specifically on the *N* and *L* tests (Schorlemmer et al. 2007).

The RELM/CSEP working groups formalized the LBT to test hypotheses that ‘should follow directly the model, so that if the model is valid, the hypothesis should be consistent with data used in a test. Otherwise, the hypothesis, and the model on which it was constructed, can be rejected’ (Schorlemmer et al. 2007). Actually, as I discuss below, this intent was not attained (Lombardi and Marzocchi 2010a; Schorlemmer et al. 2007, 2010a; Werner et al. 2010).

The RELM/CSEP working groups formalized the *N* and *L* tests to check the consistency of the expected ($\Lambda=\{\lambda_{(i,j)}\}$) and observed ($\Omega=\{\omega_{(i,j)}\}$) values of variables $X_{(i,j)}$, representing the number of earthquakes with magnitude above a threshold $M_F$, in nonoverlapping bins $\{(T_i,R_j);\ T_i\in\mathcal{T},\ R_j\in\mathcal{R}\}$ of a predetermined spatio-temporal space $\mathcal{S}=\mathcal{R}\times\mathcal{T}$ (Jordan 2006; Zechar et al. 2010). A model is represented by its forecasts $\Lambda$, which are the only values provided by the modelers. The correct calculation of the *p* values of the LBT requires the probability distribution of $X_{(i,j)}$ given by the model, and specifically the probabilities

$$p_n^{ij} = \text{Prob}\left[X_{(i,j)} = n\right]. \qquad (1)$$

Since this distribution is not provided by the modelers, the tests rely on the null hypothesis $H_0$ that the variables $X_{(i,j)}$ are independent and follow a Poisson distribution with mean $\lambda_{(i,j)}$. Therefore, the probabilities

$$q_n^{ij} = \frac{\left[\lambda_{(i,j)}\right]^n e^{-\lambda_{(i,j)}}}{n!} \qquad (2)$$

are substituted for the probabilities $p_n^{ij}$, and the *p* values of the LBT are computed accordingly (Schorlemmer et al. 2007).

The *N* test measures the probability of observing ${N}_{i}^{O}=\sum_j \omega_{(i,j)}$ events, for each forecast time period $T_i$. The *p* values of the *N* test are given by the probabilities (Zechar et al. 2010):

$$\delta_1 = \text{Prob}\left[X_i \le N_i^O\right], \qquad \delta_2 = \text{Prob}\left[X_i \ge N_i^O\right], \qquad (3)$$

where $X_i=\sum_j X_{(i,j)}$. The RELM/CSEP protocol rejects a model if $\delta_1$ or $\delta_2$ is too small, meaning that the model overpredicts or underpredicts the observed seismicity, respectively. Under $H_0$, $X_i$ is a Poisson variable with expectation ${N}_{i}^{F}=\sum_j \lambda_{(i,j)}$ (and PDF ${q}_{n}^{i}={\left[{N}_{i}^{F}\right]}^{n}{e}^{-{N}_{i}^{F}}/n!$), and the percentiles $\delta_1$/$\delta_2$ are computed from this distribution (see Schorlemmer et al. 2007).
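The *N*-test probabilities under $H_0$ can be sketched in a few lines of Python. This is a minimal illustration, not CSEP code: `n_test` and `poisson_cdf` are hypothetical names, and the orientation of $\delta_1$/$\delta_2$ follows the description above.

```python
import math

def poisson_cdf(n, lam):
    # P[X <= n] for X ~ Poisson(lam), summed directly from the PDF
    return sum(lam**k * math.exp(-lam) / math.factorial(k) for k in range(n + 1))

def n_test(forecast_rates, observed_counts):
    """CSEP-style N test under H0: X_i ~ Poisson(sum_j lambda_(i,j))."""
    n_f = sum(forecast_rates)    # N_i^F, total forecast rate for period T_i
    n_o = sum(observed_counts)   # N_i^O, total observed count
    delta1 = poisson_cdf(n_o, n_f)            # P[X_i <= N_i^O]: small => overprediction
    delta2 = 1.0 - poisson_cdf(n_o - 1, n_f)  # P[X_i >= N_i^O]: small => underprediction
    return delta1, delta2
```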

The *L* test measures the probability of the joint log-likelihood $L(\Omega_i|\Lambda)$ of observing $\Omega_i$, given the forecast $\Lambda$. Under $H_0$, $L(\Omega_i|\Lambda)$ is given by:

$$L\left(\Omega_i|\Lambda\right) = \sum_j \left[-\lambda_{(i,j)} + \omega_{(i,j)}\log \lambda_{(i,j)} - \log \omega_{(i,j)}!\right]. \qquad (4)$$

The *p* value of the *L* test is estimated by comparing $L(\Omega_i|\Lambda)$ with a predetermined number *N* of synthetic likelihood values $L\left({\Omega}_{i}^{S}|\Lambda\right)=\{L\left({\Omega}_{i}^{{S}_{l}}|\Lambda\right),\ l=1,\dots,N\}$, computed by Equation 4 from simulated catalogs ‘consistent with the forecast’ (Schorlemmer et al. 2007). This means that the forecast grids ${\Omega}_{i}^{{S}_{l}}$ are simulated according to the Poisson hypothesis supposed by $H_0$, and the *p* value of the *L* test is given by the proportion of simulated log-likelihoods below the value $L(\Omega_i|\Lambda)$:

$$\gamma_i = \frac{\#\left\{l:\ L\left({\Omega}_{i}^{{S}_{l}}|\Lambda\right) \le L\left(\Omega_i|\Lambda\right)\right\}}{N}. \qquad (5)$$
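The simulation-based *p* value of the *L* test can be sketched as follows. This is a minimal sketch, not CSEP code: `l_test`, `log_likelihood`, and the inline Poisson sampler are hypothetical names, and the catalogs are simulated per bin under the Poisson null.

```python
import math
import random

def log_likelihood(omega, lam):
    # Equation-4-style joint Poisson log-likelihood of counts omega given forecasts lam
    return sum(-l + w * math.log(l) - math.lgamma(w + 1) for w, l in zip(omega, lam))

def l_test(lam, omega, n_sim=1000, seed=0):
    """p value: fraction of simulated (Poisson, per H0) log-likelihoods
    at or below the observed one."""
    rng = random.Random(seed)

    def sample_poisson(l):
        # Knuth's method; fine for the small per-bin rates of a forecast grid
        L, k, p = math.exp(-l), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1

    obs_ll = log_likelihood(omega, lam)
    sims = [log_likelihood([sample_poisson(l) for l in lam], lam) for _ in range(n_sim)]
    return sum(s <= obs_ll for s in sims) / n_sim
```

A model is rejected when this *p* value falls below the chosen significance level.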

This shows that the LBT does not check the hypothesis that a forecast model has merit with the given data (marked hereinafter by Hyp_{1}). Actually, the LBT tests whether {*ω*_{(i,j)}} are independent random variables from a Poisson population with mean {*λ*_{(i,j)}} (marked hereinafter by Hyp_{2}). When a model is not consistent with Hyp_{2}, i.e., when the set of probabilities $\left\{{p}_{n}^{\mathit{\text{ij}}}\right\}$ is significantly different from $\left\{{q}_{n}^{\mathit{\text{ij}}}\right\}$, the computation of the *p* values of the LBT is misleading, causing a potentially unjustified rejection of the model itself (Lombardi and Marzocchi 2010a).

The CSEP laboratories still systematically use the LBT, but a process of revision has begun. This study is intended to provide a contribution to this process.

## Methods

A suitable revision of the LBT requires the full recognition and quantification of the causes and effects of the present inefficiencies. For this purpose, I apply the *N* and *L* tests to two classes of 1,000 simulated forecast grids, generated by different spatio-temporal magnitude models. In this way, the data-generating model is perfectly known, and a rejection of *H*_{0} cannot be ascribed to a failure of the model being tested.

First, I generate two sets of synthetic catalogs. Each catalog covers a time period of 1 month (January 1 to 31, 2012), the Italian collection region, and a magnitude range of [2.5, 9.0], as chosen by CSEP (Schorlemmer et al. 2010b).

The first set of catalogs is simulated by an epidemic-type aftershock sequence (ETAS) model (Ogata 1998; Lombardi and Marzocchi 2010b), for which the rate of occurrence of an event at time *t*, with location (*x*,*y*) and magnitude *m*, is given by:

$$\lambda_1\left(t,x,y,m\,|\,\mathcal{H}_t\right) = \left[\mu\, u(x,y) + K \sum_{T_i<t} \frac{e^{\alpha\left(M_i-M_0\right)}}{\left(t-T_i+c\right)^{p}\left(r_i^2+d\,e^{\gamma\left(M_i-M_0\right)}\right)^{q}}\right] \frac{\beta\, e^{-\beta\left(m-M_0\right)}}{1-e^{-\beta\left(M_{\text{max}}-M_0\right)}}, \qquad (6)$$

where {*μ*,*K*,*c*,*p*,*α*,*d*,*q*,*γ*,*b*} are the model parameters (with *β* = *b* ln10), *u*(*x*,*y*) is the spatial density of the background seismicity, *M*_{0} and *M*_{max} are the minimum and maximum magnitudes, ${\mathcal{H}}_{t}=\left\{\left({T}_{i},{X}_{i},{Y}_{i},{M}_{i}\right);\ {T}_{i}<t\right\}$ is the history (i.e., the information relative to past events) up to time *t*, and *r*_{ i } is the distance between location (*x*,*y*) and the epicenter of the *i*th event (*X*_{ i },*Y*_{ i }) (see Lombardi and Marzocchi 2010b, for details). To compute the rate ${\lambda}_{1}(t,x,y,m\,|\,{\mathcal{H}}_{t})$, I include in the history the seismic bulletin of the Istituto Nazionale di Geofisica e Vulcanologia (INGV) from April 16, 2005 to December 31, 2011. Moreover, I add a synthetic event (*T*_{ms},*X*_{ms},*Y*_{ms},*M*_{ms}) at time 00:00:00 on January 1, 2012 (*T*_{ms}), with magnitude *M*_{ms} = 6.0 and coordinates (*X*_{ms},*Y*_{ms}) = (13.384°*E*, 42.346°*N*). The parameter values used in this study are *μ* = 0.7, *K* = 0.026, *p* = 1.15, *c* = 0.01, *α* = 1.4, *d* = 0.7, *q* = 1.5, *γ* = 0.3, *b* = 1.0, *M*_{0} = 2.5, and *M*_{max} = 9.0.

To generate the ETAS forecasts for day *T*_{
i
} and catalog *C*_{
k
}, I mimic the CSEP real-time experiment: specifically, I include the triggering rate for events with history ${\mathcal{\mathscr{H}}}_{{T}_{i}}$ of *C*_{
k
} and average the triggering rates of 1,000 simulated realizations of the process inside *T*_{
i
} (see Lombardi and Marzocchi 2010b, for details).

The second set of catalogs is simulated by a nonhomogeneous Poisson (NP) model, for which the rate *λ*_{2}(*t*,*x*,*y*,*m*) is given by a stationary background and the triggering effect of the single event (*T*_{ms},*X*_{ms},*Y*_{ms},*M*_{ms}). The rate of the NP model is as follows:

$$\lambda_2\left(t,x,y,m\right) = \left[\mu\, u(x,y) + K\, \frac{e^{\alpha\left(M_{\text{ms}}-M_0\right)}}{\left(t-T_{\text{ms}}+c\right)^{p}\left(r^2+d\,e^{\gamma\left(M_{\text{ms}}-M_0\right)}\right)^{q}}\right] \frac{\beta\, e^{-\beta\left(m-M_0\right)}}{1-e^{-\beta\left(M_{\text{max}}-M_0\right)}}, \qquad (7)$$

where *r* is the distance between (*x*,*y*) and (*X*_{ms},*Y*_{ms}). The parameters used here are *μ* = 0.7, *K* = 0.1, *p* = 0.9, *c* = 0.02, *α* = 1.4, *d* = 0.7, *q* = 1.5, *γ* = 0.3, *b* = 1.0, *M*_{0} = 2.5, and *M*_{max} = 9.0.
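By contrast, the NP temporal rate is a deterministic function of time alone once the mainshock is fixed. A minimal sketch, with the same simplifications as before (spatial and magnitude terms omitted; `np_rate` is a hypothetical name) and the NP parameter values from the text:

```python
import math

# NP parameter values from the text; single fixed triggering event at T_MS with M_MS
MU, K, C, P, ALPHA, M0 = 0.7, 0.1, 0.02, 0.9, 1.4, 2.5
T_MS, M_MS = 0.0, 6.0

def np_rate(t):
    """Deterministic Omori-type decay from the single mainshock,
    independent of any other past or future event."""
    return MU + K * math.exp(ALPHA * (M_MS - M0)) / (t - T_MS + C) ** P
```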

The simulations represent the average seismicity of the first month of a sequence (following a shock with magnitude 6.0), as predicted by the ETAS and NP models. The basic difference between the models is that the rate of the ETAS model depends on the whole history ${\mathcal{\mathscr{H}}}_{t}$ (i.e., information relative to past events), whereas the rate of the NP model depends on the coordinates of only one event (*T*_{ms},*X*_{ms},*Y*_{ms},*M*_{ms}). Thus, the rate of the NP model is deterministic and decreasing in time from *T*_{ms}, whereas the rate of the ETAS model has a random nonmonotonic time evolution, depending on history ${\mathcal{\mathscr{H}}}_{t}$.

For each synthetic catalog, I compute the 1-day binned forecast grids *Λ* (*M*_{F} = 2.5) by integrating (in time, space, and magnitude) the rate of the model used to generate the catalog. The forecast grids *Λ* cover a period of 1 month (starting from January 1, 2012) and the test spatial grid adopted for the CSEP Italian laboratory (Schorlemmer et al. 2010b). Finally, I apply the CSEP/RELM *N* and *L* tests (with significance level *α* = 0.05 and *M*_{F} = 2.5) to all simulated catalogs, using the forecast grids previously computed.

Here, I propose a revised version of the *N* and *L* tests, consisting of two main changes. First, the discrete log-likelihood $L(\Omega_i|\Lambda)$ of the variables $X_i$ (Equation 4) is replaced by the continuous-time log-likelihood function (hereinafter, CLF). This is a proper measure of the agreement between model and data, taking into account the features of a model. For a spatio-temporal magnitude earthquake model, it is given by

$$\text{CLF} = \sum_{k=1}^{N_{\mathcal{R}\times\mathcal{T}\times\left[M_0,M_{\text{max}}\right]}} \log \lambda\left(T_k,X_k,Y_k,M_k\right) - \int_{\mathcal{T}}\int_{\mathcal{R}}\int_{M_0}^{M_{\text{max}}} \lambda\left(t,x,y,m\right)\, dm\, dx\, dy\, dt, \qquad (8)$$

where *λ*(*t*,*x*,*y*,*m*) is the rate of the model (Daley and Vere-Jones 2003) and $N_{\mathcal{R}\times\mathcal{T}\times[M_0,M_{\text{max}}]}$ is the number of events inside the spatio-temporal magnitude space $\mathcal{R}\times\mathcal{T}\times[M_0,M_{\text{max}}]$.
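For intuition, the CLF of the simplest possible model, a homogeneous Poisson process of rate *μ* on [0, *T*] with space and magnitude integrated out, reduces to *n* log *μ* − *μT*. A one-line sketch (the helper name `clf_homogeneous` is hypothetical):

```python
import math

def clf_homogeneous(event_times, mu, T):
    """CLF (Equation-8 form) for a constant rate mu on [0, T]:
    sum of log-rates at the events minus the integrated rate."""
    return len(event_times) * math.log(mu) - mu * T
```

As expected for a point-process log-likelihood, this is maximized at the maximum-likelihood rate *μ* = *n*/*T*.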

Second, the percentiles of the distributions of both the variables $X_i$ and the CLF are derived directly from the model. This information allows the computation of more reliable *p* values for the tests (Werner and Sornette 2008; Schorlemmer et al. 2010a). In detail, the revised procedure for a model *M* is as follows:

1. For each forecast period $T_i$, the number of events ($\Omega_i$) and the CLF ($\text{CLF}_{M,i}$) of the model *M* being tested are computed.
2. For each $T_i$, *N* catalogs given by model *M* are simulated; the occurrences ${\Omega}_{M,i}^{S}=\left\{{\Omega}_{M,i}^{{S}_{l}},\ l=1,\dots,N\right\}$ and the likelihoods ${\text{CLF}}_{M,i}^{S}=\left\{{\text{CLF}}_{M,i}^{{S}_{l}},\ l=1,\dots,N\right\}$ are computed for all catalogs.
3. The percentiles of the empirical distributions generated in the previous step, used to perform a test at the 95% confidence level, are estimated. Specifically, the 2.5th and 97.5th percentiles (${P}_{M,i}^{\Omega}[2.5\%]$ and ${P}_{M,i}^{\Omega}[97.5\%]$) of the values ${\Omega}_{M,i}^{S}$ and the 5th percentile (${P}_{M,i}^{\text{CLF}}[5\%]$) of the quantities ${\text{CLF}}_{M,i}^{S}$ are identified.
4. The observed values $\Omega_i$ and $\text{CLF}_{M,i}$ are compared with the percentiles computed in the previous step. In this way, model *M* is rejected or retained for $T_i$. Specifically, model *M* is rejected if ${\Omega}_{i}<{P}_{M,i}^{\Omega}[2.5\%]$, ${\Omega}_{i}>{P}_{M,i}^{\Omega}[97.5\%]$, or ${\text{CLF}}_{M,i}\le {P}_{M,i}^{\text{CLF}}[5\%]$.

In this procedure, the percentiles of model *M* are estimated by simulations because it is often not possible to derive them analytically. The use of simulations is not mandatory, however: modelers may derive the percentiles analytically whenever possible.

## Results

Figure 1a shows the rejection rate $F_R$ (i.e., the proportion of catalogs for which $H_0$ is rejected) of the *N* and *L* tests as a function of time. As shown in Lombardi and Marzocchi (2010a), $F_R$ for the ETAS simulations is well above 5%, which is the threshold justifiable by chance. On the other hand, $F_R$ for the NP simulations is close to or below 5%, suggesting that Hyp_{2} is consistent with the NP model.

To investigate whether the previous results depend on *M*_{F} or on the average seismic rate of the region, I apply the procedure described above to 1,000 new catalogs, reproducing the average seismicity of Japan (which has a seismic rate two orders of magnitude higher than that of Italy). These datasets are simulated by using an *ad hoc* ETAS model of this region. In this experiment, I consider a forecast time span *T*_{i} of 3 months, an overall time period of 10 years, and *M*_{F} = 4.0. This last value is the threshold magnitude adopted by the Japanese CSEP laboratory for short-term forecasting experiments (Nanjo et al. 2011; Tsuruoka et al. 2012). I find that *F*_{R} is equal to 40% to 50% and 60% to 75% for the *N* and *L* tests, respectively (see Figure 1b).

The revised *N* and *L* tests, applied to the same Japanese ETAS catalogs, give the values of $F_R$ also shown in Figure 1b. The improvement, with respect to the CSEP version of the *N* and *L* tests, is clear: $F_R$ is close to or below 0.05 for both tests. To clearly compare the CSEP methodology and the new testing procedure, Figure 2 shows the PDFs of occurrences and log-likelihoods computed by the CSEP LBT and by the proposed procedure for the first ETAS simulated Japanese catalog. The observed occurrences (solid black line, Figure 2a,b) are well above or below the confidence bounds (dashed black lines, Figure 2a) of the Poisson PDF (Equation 2) supposed by Hyp_{2}. This is because the distribution expected by the ETAS model (contour plot, Figure 2b), estimated by the empirical PDF of ${\Omega}_{\text{ETAS},i}^{S}$, has a long, heavy tail, which is clearly not consistent with Hyp_{2}. Similar results are found for the log-likelihood. The log-likelihoods $L(\Omega_i|\Lambda)$ computed by Equation 4 are well below the values of $L\left({\Omega}_{i}^{S}|\Lambda\right)$ expected by Hyp_{2} (contour plot, Figure 2c). However, the log-likelihoods $\text{CLF}_{\text{ETAS},i}$ (Equation 8) are fully consistent with the log-likelihoods ${\text{CLF}}_{\text{ETAS},i}^{S}$ expected by the ETAS model (contour plot, Figure 2d).

## Discussion

The rejection of the null hypothesis of a statistical test can be due to chance, to the hypothesis being really false, or to the hypothesis being probabilistically inadequate (Stark 1997; Luen and Stark 2008). The null hypothesis *H*_{0} of the RELM/CSEP LBT supposes that the *X*_{(i,j)} are independent (in time and space) Poisson random variables, with mean *λ*_{(i,j)} given by the model. The CSEP protocol interprets the rejection of *H*_{0} as the failure of the model being tested. However, this procedure is misleading because *H*_{0} is not consistent with every model (Lombardi and Marzocchi 2010a).

The above findings may be explained with the help of stochastic point process theory (Daley and Vere-Jones 2003); this is the natural context in which stochastic earthquake models may be discussed. A point process is fully represented by its ‘conditional intensity function’ (CIF) $\lambda(t,\overrightarrow{x}\,|\,{\mathcal{H}}_{t})$, i.e., the probability of observing an event at the instant $t\in \mathcal{T}$ and with additional variables (called marks) $\overrightarrow{x}\in \overrightarrow{\mathcal{X}}$, given the realization ${\mathcal{H}}_{t}$ of the process before *t* (Daley and Vere-Jones 2003). The CIFs of the models described in the previous section are given by Equations 6 and 7; the marks are locations and magnitudes. In the case of an NP process, the CIF is a deterministic function of time and marks, but it is independent of the past history (i.e., $\lambda(t,\overrightarrow{x}\,|\,{\mathcal{H}}_{t})=\lambda(t,\overrightarrow{x})$). Therefore, the events in nonoverlapping subsets of $\mathcal{T}\times \overrightarrow{\mathcal{X}}$ are independent Poisson random variables (Daley and Vere-Jones 2003), as supposed by the RELM/CSEP LBT. In the most general case, the CIF is also a function of the history ${\mathcal{H}}_{t}$, and the variables *X*_{(i,j)} are not Poisson unless the history is fully known (Meyer 1971; Papangelou 1972a, 1972b; Daley and Vere-Jones 2003). In a real-time forecast experiment, the history inside the forecast time window *T*_{i} is unknown; therefore, for history-dependent models such as ETAS, Hyp_{2} is inadequate.
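A quick numerical check of this point: counts from a clustered (history-dependent) process are overdispersed relative to a Poisson variable with the same mean, so a Poisson null underestimates their variability. The following is a sketch with a toy Poisson-cluster model; all parameter values are illustrative, not taken from the paper.

```python
import math
import random

def simulate_cluster_counts(rng, mu=5.0, offspring_mean=0.8):
    """One window's event count from a toy Poisson-cluster model:
    immigrants ~ Poisson(mu), each immigrant adds Poisson(offspring_mean)
    triggered events inside the same window."""
    def pois(lam):
        # Knuth's Poisson sampler
        L, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1
    immigrants = pois(mu)
    return immigrants + sum(pois(offspring_mean) for _ in range(immigrants))

rng = random.Random(42)
counts = [simulate_cluster_counts(rng) for _ in range(5000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
# for a Poisson variable var == mean; clustering pushes var well above mean
```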

The hypothesis Hyp_{2} has been questioned in several studies (Schorlemmer et al. 2010a; Werner et al. 2010) and, in the specific context of ETAS modeling, by Lombardi and Marzocchi (2010a). Here, I examine the causes and effects of the failure of the LBT. Specifically, I show that the failure of the LBT may be significant for high values of *M*_{F} and that it has heavy consequences for long forecast time windows. This is because the longer the forecast time window *T*_{i}, the greater the randomness of the forecasts (due to the effect of the unknown history inside *T*_{i}) and the lower the reliability of Hyp_{2}. This result contradicts the statement that the Poisson distribution is a good approximation of the forecast variability when *M*_{F} is large (Werner et al. 2010).

The process of revising the LBT has begun inside the scientific community. Some researchers have proposed replacing the Poisson distribution with a negative binomial distribution (Werner et al. 2010) to compute the *p* values of the tests. However, this solution does not significantly improve the LBT, because the negative binomial distribution (like the Poisson or any other fixed distribution) is not consistent with all models. Inside the CSEP community, some suggest updating the forecasts more regularly, leaving the LBT unchanged (personal communication). I do not think this is the best way to resolve the inefficiencies of the LBT, as these do not derive from the regularity of the forecast calculations.

The procedure described above is an obvious upgrade of the *N* and *L* tests. It accounts for the actual variability of the *X*_{(i,j)} given by the model being tested. Moreover, it uses the CLF, which is a better tool for checking the agreement between models and data than the discrete log-likelihood (Equation 4) used by the CSEP *L* test and based on Hyp_{2} (Schorlemmer et al. 2007).

This study has focused on short-term forecasts, without analyzing the dependence of the results on the size of the forecast window. From a theoretical point of view, the LBT might also fail for long-term forecasts because of dissimilarities between the sets of probabilities $\left\{{p}_{n}^{\mathit{\text{ij}}}\right\}$ and $\left\{{q}_{n}^{\mathit{\text{ij}}}\right\}$ (see Equations 1 and 2) or, in other words, because of the unsuitability of Hyp_{2}. This study is not relevant to models that are explicitly supposed to be time-invariant, such as the models tested in the 5-year mainshock RELM experiment (Schorlemmer et al. 2010a; Zechar et al. 2013). However, the failure of the LBT might be significant for medium- to long-term forecast models with strong time-dependent components, especially in testing regions with a high seismic rate. In other words, the present study does not invalidate most of the results of the first RELM/CSEP forecast experiments, which focus on long-term time-invariant models. However, the inclusion of different forecast time spans and time-dependent models in new CSEP experiments requires both an urgent revision of the testing procedure and an effort by modelers to provide full distributions of the variables being tested.

## Conclusions

In this study, I have explored the suitability of the null hypothesis adopted by the RELM/CSEP *N* and *L* tests. The main findings can be summarized as follows:

1. All LBTs are based on classical statistical hypothesis testing; therefore, they are intended to reject or not reject a null hypothesis *H*_{0}. The null hypothesis of the LBT is that the variables *X*_{(i,j)} are independent and Poisson-distributed, with the rate given by the forecasts. Therefore, the LBT is inadequate for checking the merits of a forecast model that is inconsistent with Hyp_{2}.
2. Specifically, Hyp_{2} is not adequate for history-dependent models, such as ETAS, because the unknown history inside the forecast period means that the *X*_{(i,j)} do not follow a Poisson distribution.
3. In these cases, the LBT may fail for large values of *M*_{F}, and especially for large forecast time windows, as the effect of the unknown history is greater.
4. I propose a revised version of the LBT that (1) adopts the CLF and (2) requires the percentiles of the distributions of *X*_{i} and CLF_{M,i}.
5. The points discussed in this study highlight the need to revise the testing procedure for present and future experiments, which include many time-dependent models. However, they have only a limited effect on the first RELM/CSEP experiments, which mainly focused on long-term time-independent models.

## Declarations

### Acknowledgements

The author is grateful to W. Marzocchi (INGV) for stimulating discussions on the topics presented in this paper. The suggestions made by D.D. Jackson (UCLA) and two anonymous referees have significantly improved the quality of the paper. The Italy earthquake data were obtained from the seismic bulletin of the Istituto Nazionale di Geofisica e Vulcanologia (INGV, http://iside.rm.ingv.it). The Japan earthquake data were extracted from the Earthquake Catalog of the Japan Meteorological Agency (JMA, http://www.jma.go.jp/en/quake). Information on CSEP is available at http://www.cseptesting.org.


## References

- Daley DJ, Vere-Jones D: *An introduction to the theory of point processes*. Springer, New York; 2003.
- Jordan TH: Earthquake predictability: brick by brick. *Seism Res Lett* 2006, 77(1):3–6. 10.1785/gssrl.77.1.3
- Lombardi AM, Marzocchi W: Exploring the performances and usability of the CSEP suite of tests. *Bull Seismol Soc Am* 2010a, 100:2293–2300. 10.1785/0120100012
- Marzocchi W, Lombardi AM: The ETAS model for daily forecasting of Italian seismicity in the CSEP experiment. *Ann Geophys* 2010b, 53:155–164.
- Luen B, Stark PB: Testing earthquake predictions. In *IMS Lecture Notes Monograph Series. Probability and Statistics: Essays in Honor of David A. Freedman*. Institute for Mathematical Statistics Press, Beachwood; 2008:302–315.
- Meyer P: Démonstration simplifiée d’un théorème de Knight. In *Séminaire de Probabilités V*. Lecture Notes in Math, vol 191. Univ. Strasbourg; 1971:191–195.
- Nanjo KZ, Tsuruoka H, Hirata N, Jordan TH: Overview of the first earthquake forecast testing experiment in Japan. *Earth Planets Space* 2011, 63(3):159–169. 10.5047/eps.2010.10.003
- Ogata Y: Space-time point-process models for earthquake occurrences. *Ann Inst Statist Math* 1998, 50(2):379–402.
- Papangelou F: Summary of some results on point and line processes. In Lewis PAW (ed) *Stochastic Point Processes*. Wiley, New York; 1972a:522–532.
- Papangelou F: Integrability of expected increments of point processes and a related random change of scale. *Trans Amer Math Soc* 1972b, 165:483–506.
- Schorlemmer D, Gerstenberger MC: RELM testing center. *Seism Res Lett* 2007, 78(1):30–36. 10.1785/gssrl.78.1.30
- Schorlemmer D, Gerstenberger MC, Wiemer S, Jackson DD, Rhoades DA: Earthquake likelihood model testing. *Seism Res Lett* 2007, 78(1):17–29. 10.1785/gssrl.78.1.17
- Schorlemmer D, Zechar JD, Werner MJ, Field EH, Jackson DD, Jordan TH: First results of the Regional Earthquake Likelihood Models experiment. *Pure Appl Geophys* 2010a, 167:859–876. 10.1007/s00024-010-0081-5
- Schorlemmer D, Christophersen A, Rovida A, Mele F, Stucchi M, Marzocchi W: Setting up an earthquake forecast experiment in Italy. *Ann Geophys* 2010b, 53:1–9.
- Stark PB: Earthquake prediction: the null hypothesis. *Geophys J Int* 1997, 131:495–499. 10.1111/j.1365-246X.1997.tb06593.x
- Tsuruoka H, Hirata N, Schorlemmer D, Euchner F, Nanjo KZ, Jordan TH: CSEP Testing Center and the first results of the earthquake forecast testing experiment in Japan. *Earth Planets Space* 2012, 64(8):661–671. 10.5047/eps.2012.06.007
- Werner MJ, Sornette D: Magnitude uncertainties impact seismic rate estimates, forecasts, and predictability experiments. *J Geophys Res* 2008, 113:B08302. 10.1029/2007JB005427
- Werner MJ, Zechar JD, Marzocchi W, Wiemer S: Retrospective evaluation of the five-year and ten-year CSEP-Italy earthquake forecasts. *Ann Geophys* 2010, 53(3):11–30. 10.4401/ag-4840
- Zechar JD, Jordan TH: Testing alarm-based earthquake predictions. *Geophys J Int* 2008, 172:715–724. 10.1111/j.1365-246X.2007.03676.x
- Zechar JD, Gerstenberger MC, Rhoades DA: Likelihood-based tests for evaluating space-rate-magnitude earthquake forecasts. *Bull Seismol Soc Am* 2010, 100(3):1184–1195. 10.1785/0120090192
- Zechar JD, Schorlemmer D, Werner MJ, Gerstenberger MC, Rhoades DA, Jordan TH: Regional Earthquake Likelihood Models I: first-order results. *Bull Seismol Soc Am* 2013, 103(2A):787–798. 10.1785/0120120186

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.