Distribution of maximum earthquake magnitudes in future time intervals: application to the seismicity of Japan (1923–2007)


We have modified the new method for the statistical estimation of the tail distribution of earthquake seismic moments introduced by Pisarenko et al. (2009) and applied it to the earthquake catalog of Japan (1923–2007). The newly modified method is based on the two main limit theorems of the theory of extreme values and on the derived duality between the generalized Pareto distribution (GPD) and the generalized extreme value distribution (GEV). Using this method, we obtain the distribution of maximum earthquake magnitudes in future time intervals of arbitrary duration τ. This distribution can be characterized by its quantile Q_q(τ) at any desirable statistical level q. The quantile Q_q(τ) provides a much more stable and robust characteristic than the traditional absolute maximum magnitude Mmax (Mmax can be obtained as the limit of Q_q(τ) as q → 1, τ → ∞). The best estimates of the parameters governing the distribution of Q_q(τ) for Japan (1923–2007) are the following: ξGEV = −0.19 ± 0.07; μGEV(200) = 6.339 ± 0.038; σGEV(200) = 0.600 ± 0.022; Q0.90,GEV(10) = 8.34 ± 0.32. We have also estimated Q_q(τ) for a set of q-values and future time periods in the range 1 ≤ τ ≤ 50 years from 2007 onwards. For comparison, the absolute maximum estimate Mmax-GEV = 9.57 ± 0.86 has a scatter more than twice that of the 90% quantile Q0.90,GEV(10) of the maximum magnitude over the next 10 years beginning from 2007.

1. Introduction

The work presented in this article has two goals: (1) to adapt the method suggested by Pisarenko et al. (2009) for the statistical estimation of the tail of the distribution of earthquake magnitudes to catalogs in which earthquake magnitudes are reported in discrete values, and (2) to apply the newly developed method to the Japan Meteorological Agency (JMA) magnitude catalog of Japan (1923–2007) in order to estimate the maximum possible magnitude and other measures characterizing the tail of the distribution of magnitudes.

The method of Pisarenko et al. (2009) is a continuation and improvement of the technique suggested in Pisarenko et al. (2008). Both rely on the assumption that the distribution of earthquake magnitudes is limited to some maximum value Mmax, which is itself probably significantly less than the absolute limit imposed by the finiteness of the Earth. This maximum value Mmax may reflect the largest possible set of seismo-tectonic structures in a given tectonic region that can support an earthquake, combined with an extremal occurrence of dynamical energy release per unit area. The simplest model embodying the idea of a maximum magnitude is the truncated Gutenberg-Richter (GR) magnitude distribution truncated at Mmax:

$$F(m) = C\left[10^{-b m_0} - 10^{-b m}\right], \qquad m_0 \le m \le M_{\max}, \tag{1}$$


where F(m) is the cumulative probability distribution of earthquake magnitudes, b is the slope parameter, m0 is the lower known threshold above which magnitudes can be considered to be reliably recorded, Mmax is the maximum possible magnitude, and C is the normalizing constant (which depends on the unknown parameters b and Mmax) (Cosentino et al., 1977; Kijko and Sellevoll, 1989, 1992; Pisarenko et al., 1996; Kijko, 2004). The parameter Mmax is a priori a very convenient tool for building engineers and the insurance business. However, multiple attempts to use Mmax have clearly shown that this parameter is unstable with respect to minor variations in the catalogs, in particular when it is estimated from incomplete regional catalogs, which are a rather common situation in seismology. Consequently, the parameter Mmax is an unreliable measure of the largest seismological risks. The truncated GR model can be contrasted with the various modifications of the GR law stretching to infinity. These modifications impose a finite-size constraint only on the statistical average of the energy released by earthquakes (see, for example, Sornette et al., 1996; Kagan, 1999; Kagan and Schoenberg, 2001), but they contradict the finiteness of seismogenic structures in the Earth and therefore have not been universally accepted.

The chief innovation, introduced first by Pisarenko et al. (2009) and extended here, is to combine the two main limit theorems of extreme value theory (EVT), which allows us to derive the distribution of T-maxima (maximum magnitude occurring in sequential time intervals of duration T) for arbitrary T. This distribution enables derivation of any desired statistical characteristic of the future T-maximum. The two limit theorems of EVT correspond to the generalized extreme value distribution (GEV) and to the generalized Pareto distribution (GPD), respectively. Pisarenko et al. (2009) established the direct relations between the parameters of these two distributions. The duality between the GEV and GPD provides a new approach to check the consistency of the estimation of the tail characteristics of the distribution of earthquake magnitudes for earthquakes occurring over arbitrary time intervals.

Instead of focusing on the unstable parameter Mmax, we suggest a new, stable, and convenient characteristic, Mmax(τ), defined as the maximum earthquake that can be recorded over a future time interval of duration τ. The random value Mmax(τ) can be described by its distribution function or by its quantiles Q_q(τ), which are, in contrast to Mmax, stable and robust characteristics. In addition, if τ → ∞, then Mmax(τ) → Mmax with a probability of one. The methods for calculating Q_q(τ) are given in the following section. In particular, we can estimate Q_q(τ) for, say, q = 10%, 5%, and 1%, as well as for the median (q = 50%) for any desirable time interval τ. These methods are illustrated below on the magnitude catalog of the JMA, over the time period 1923–2007, for magnitudes m ≥ 4.1.

We should stress that our method relies on the assumption that the distribution of earthquake magnitudes exhibits a regular limit behavior on its right (for large magnitudes)— even though there is no way to be absolutely certain that this is the case due to the limited data set for large and extreme earthquake sizes. Thus, in specific cases, seismologists are forced to accept the most appropriate assumption about the behavior of the magnitude distribution on its right end. The assumption used in our paper (which coincides with the assumption of the EVT: the existence of a non-trivial asymptotic distribution for centered and normalized maximum of sample) seems to be the least harmful and the most fruitful. It provides the three well-known types of possible limit distributions for the maximum (in our paper we use only one of these). Without such an assumption, it would scarcely be possible to obtain any useful result on the distribution of sample maxima.

2. The Method

The method developed here is based on the following assumptions:

(1) the Poisson property of independence in time of the main shocks;

(2) independence between the observed magnitudes M;

(3) regularity of the tail probability of the earthquake magnitudes M.

We now present the elements that justify using these assumptions and then describe the specifics of the method.

2.1 Test of the Poisson hypothesis

Our analysis is performed for main shocks, following the application of a declustering method. We used the Knopoff-Kagan space-time window declustering method (Knopoff and Kagan, 1977) to remove the aftershocks. This method has a number of shortcomings, and other versions of aftershock cleansing are available, but these have no universally accepted advantages. There is a widespread opinion among seismologists that the overwhelming majority of main shocks can be considered to be independent random variables. This property is more evident when earthquake observations are considered on a global scale, but it is still a reasonable hypothesis for large seismic regions, such as Japan. The Japanese data that we use exhibit evident irregularities in the registration process, which are visible in Fig. 7. In particular, during the time interval 1945–1965, the lack of observations is clearly evident. Fortunately, this effect is not essential for the larger earthquakes, which are the focus of our work.

We note that the model of a Poisson flow of events corresponds to a renewal model with exponentially distributed intervals between successive events. Testing for the Poisson property is reduced to the study of the distribution of time intervals between successive main shocks. In our analysis, we study this distribution for events in Japan with magnitudes larger than some chosen lower threshold. We will show that, at least for large earthquakes with m ≥ 7.0, the exponential distribution cannot be rejected at a rather high statistical significance level. For earthquakes with m ≥ 6.0, the exponential distribution can be accepted, at least since 1966. For earthquakes of smaller sizes, the deviations of the distribution of the time intervals from the exponential law become more pronounced; consequently, the renewal model with non-exponentially distributed time intervals is perhaps more appropriate. However, this finding is of little relevance for our purpose of determining the distribution of maximum earthquake magnitudes, which is controlled mainly by the large earthquakes.

We analyze the empirical distributions of time intervals between successive events in sub-catalogs derived from the main catalog by selecting main shocks posterior to some time T0 and with magnitudes larger than the lower threshold m0. In order to test the exponential hypothesis, we use the Kolmogorov distance KD-test modified by Stephens (1974) for the case where the unknown parameter is estimated from the same sample. We obtain the following results for different choices of m0 and T0. The variable 〈t_k − t_{k−1}〉 is the mean inter-event waiting time.

The exponential Poisson hypothesis is thus acceptable (accepting, say, if p-value > 0.1) for m ≥ 6.0 since 1966, and for m ≥ 7.0 for the whole catalog starting from 1923.
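The test described in this subsection can be illustrated with a minimal sketch. It is not the authors' implementation: instead of the modified critical values of Stephens (1974), it calibrates the Kolmogorov distance by Monte Carlo simulation, which addresses the same problem (the mean waiting time is estimated from the sample being tested). The function names and the two synthetic samples are hypothetical.

```python
import math
import random

def ks_distance_exponential(intervals):
    """Kolmogorov distance between the sample and an exponential law
    whose mean is estimated from the same sample."""
    n = len(intervals)
    mean = sum(intervals) / n
    xs = sorted(intervals)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = 1.0 - math.exp(-x / mean)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def exponentiality_p_value(intervals, n_sim=2000, seed=0):
    """Monte Carlo p-value of the distance above (assumption: this replaces
    Stephens' tabulated correction). The statistic is scale-invariant,
    so unit-rate exponential samples are sufficient for calibration."""
    rng = random.Random(seed)
    n = len(intervals)
    d_obs = ks_distance_exponential(intervals)
    exceed = 0
    for _ in range(n_sim):
        sim = [rng.expovariate(1.0) for _ in range(n)]
        if ks_distance_exponential(sim) >= d_obs:
            exceed += 1
    return d_obs, exceed / n_sim

# Hypothetical inter-event times (arbitrary units), for illustration only.
rng = random.Random(1)
poisson_like = [rng.expovariate(0.5) for _ in range(80)]
quasi_periodic = [rng.uniform(0.9, 1.1) for _ in range(80)]

d_poisson, p_poisson = exponentiality_p_value(poisson_like)
d_periodic, p_periodic = exponentiality_p_value(quasi_periodic)
```

With the acceptance rule quoted above (p-value > 0.1), the quasi-periodic sample is rejected decisively, whereas a genuinely exponential sample is accepted in most realizations.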

2.2 Independence of the magnitudes

Figure 7 shows the magnitudes of the main shocks as a function of time from 1923 to 2007, inside the polygonal domain shown in Fig. 1, whose depths are <70 km. Together with the test shown in the previous subsection, one can see that the magnitudes are approximately random above m = 6, which is the regime of interest for the application of the EVT. We also note that the GR distribution is rather well verified, as depicted in Figs. 2 and 4, confirming the standard one-point statistics of earthquakes.

Fig. 1

Map of the region kept for our study; the coordinates of nodes of the polygon delimiting the area of study are [(160.00; 45.00); (150.00; 50.00); (140.00; 50.00); (130.00; 45.00); (120.00; 35.00); (120.00; 30.00); (130.00; 25.00); (150.00; 25.00); (160.00; 45.00)].

Fig. 2

Magnitude-frequency distribution of the 32,324 earthquakes that occurred in the region delimited by the polygon shown in Fig. 1 over the period 1923–2007.

2.3 Regularity of the tail probability of the earthquake magnitudes M

According to the EVT, the limit distribution of maxima can be obtained in two ways. The first, sometimes called the “peak over threshold” method, consists of increasing a threshold h above which observations are kept. The distribution of event sizes that exceed h tends—under an affine transformation—to the GPD as h tends to infinity. The GPD depends on two unknown parameters (ξ, s) and on the known threshold h (see, for example, Embrechts et al. (1997)). For the case of random values that are limited from above, the GPD can be written as follows:

$$G_h(x \mid \xi, s) = 1 - \left[1 + \frac{\xi}{s}\,(x - h)\right]^{-1/\xi}, \qquad h \le x \le h - \frac{s}{\xi}, \quad \xi < 0, \; s > 0. \tag{2}$$


Here, ξ is the form parameter, s is the scale parameter, and the combination h − s/ξ represents the uppermost magnitude, which we shall denote Mmax:

$$M_{\max} = h - \frac{s}{\xi}. \tag{3}$$

We shall consider only this case of a finite Mmax, to capture the finiteness of seismo-tectonic structures in the Earth, as discussed in the Introduction.

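The second way to obtain the limit distribution of maxima consists of selecting directly the maxima occurring in sequences of n successive observations, M_n = max(m_1, …, m_n), and in studying their distribution as n goes to infinity. In accordance with the main theorem of the EVT (see, for example, Embrechts et al., 1997), this distribution, named the GEV, can be written (for the case of random values limited from above, with form parameter ζ < 0, scale parameter σ, and centering parameter μ) in the form:

$$\Phi(x \mid \zeta, \sigma, \mu) = \exp\left\{-\left[1 + \frac{\zeta}{\sigma}\,(x - \mu)\right]^{-1/\zeta}\right\}, \qquad x \le \mu - \frac{\sigma}{\zeta}, \quad \zeta < 0, \; \sigma > 0. \tag{4}$$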

The conditions guaranteeing the validity of these two limit theorems include the regularity of the original distributions of magnitudes in their tail. These conditions ensure the existence of a non-degenerate limit distribution of M_n after a proper centering and normalization. Following the standard approach, we assume that the conditions for which a non-degenerate limit distribution of M_n exists are truly valid. If this were not the case, we would not be able to perform any meaningful analysis. While this argument may appear circular, it is a standard approach in statistics in general and in statistical seismology in particular. One can never really prove the validity of mathematical conditions solely from data. The model or theory can, however, be progressively validated by comparing its predictions with the results of precise tests (Sornette et al., 2007, 2008). It is therefore the conclusions that we derive from our analysis that will support, or refute, the value of the analysis itself.

2.4 Formulation of the theory and procedure

In our analysis, we study the maximum magnitudes occurring in the time interval (0, T). We assume that the flow of main shocks is a Poissonian stationary process with some intensity λ. This property for main shocks was studied and confirmed in appendix A of Pisarenko et al. (2008) for the Harvard catalog of seismic moments over the time period 1 January 1977–20 December 2004. The term “main shock” refers here to the events that remain following the application of a suitable declustering algorithm (see Pisarenko et al., 2008, 2009, and below). In Subsection 2.1, we tested the Poisson hypothesis and confirmed that (1) for earthquakes with m ≥ 6.0, the exponential distribution can be accepted, at least since 1966; (2) for large earthquakes with m ≥ 7.0, the exponential distribution cannot be rejected at a rather high statistical significance level. We can then proceed with the description of the model.

Given the intensity λ and the duration T of the time window, the average number of observations (main shocks) within the interval (0, T) is equal to 〈n〉 = λT. For T → ∞, the number of observations in (0, T) tends to infinity with a probability of one; we can therefore use Eq. (4) as the limit distribution of the maximum magnitudes m_T of the main shocks occurring in time intervals (0, T) of growing size (Pisarenko et al., 2008).

Pisarenko et al. (2009) showed that, for a Poissonian flow of main shocks, the two limit distributions, namely, the GPD given by relation (2) and the GEV given by relation (4), are related in a simple manner. Here, we briefly summarize the main points and refer the reader to Pisarenko et al. (2009) for details. If the random variable (rv) X has the GPD-distribution (relation (2)) and the maximum of a random sequence of observations X_k is taken:

$$M_T = \max(X_1, \ldots, X_\nu), \tag{5}$$

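where ν is a random Poissonian value with parameter λT, with λT ≫ 1, then M_T has the GEV-distribution (Eq. (4)) with the following parameters:

$$\zeta(T) = \xi, \tag{6}$$

$$\sigma(T) = s\,(\lambda T)^{\xi}, \tag{7}$$

$$\mu(T) = h - \frac{s}{\xi}\left[1 - (\lambda T)^{\xi}\right]. \tag{8}$$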

These expressions are valid up to small terms of order exp(−λT), which are neglected.

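The inverse is true as well: if M_T = max(X_1, …, X_ν) has the GEV distribution (Eq. (4)) with parameters ζ, σ, μ, then the original distribution of X_k has the GPD distribution (Eq. (2)) with parameters:

$$\xi = \zeta, \tag{9}$$

$$s = \sigma\,(\lambda T)^{-\zeta}, \tag{10}$$

$$h = \mu + \frac{\sigma}{\zeta}\left[(\lambda T)^{-\zeta} - 1\right]. \tag{11}$$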

The proof can be found in Pisarenko et al. (2009) where we see that the form parameter in the GPD and the GEV is always identical, whereas the centering and normalizing parameters differ.

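Using Eqs. (6)–(11), one can recalculate the estimates ζ(T), σ(T), μ(T) obtained for some T into corresponding estimates for another time interval of different duration τ:

$$\mu(\tau) = \mu(T) + \frac{\sigma(T)}{\xi}\left[\left(\frac{\tau}{T}\right)^{\xi} - 1\right], \tag{12}$$

$$\sigma(\tau) = \sigma(T)\left(\frac{\tau}{T}\right)^{\xi}. \tag{13}$$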

Equations (6)–(13) are very convenient, and we shall use them in our estimation procedures. In the following, we use the notation T to denote the duration of a window in the known catalog (or part of the catalog) used for the estimation of the parameters, whereas we use τ to refer to a future time interval (prediction).
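The parameter conversions discussed above can be sketched in a few lines. The formulas coded here are the relations that follow from requiring the maximum of a Poissonian number (mean λT) of GPD variables to be GEV-distributed, namely σ(T) = s(λT)^ξ and μ(T) = h − (s/ξ)[1 − (λT)^ξ], together with their inverses; they should be read as an assumed reconstruction rather than a quotation. The function names and the numerical values of s, h, and λ are hypothetical.

```python
def gpd_to_gev(xi, s, h, lam, T):
    """GPD parameters (xi, s, h) -> GEV parameters of the T-maximum."""
    lt = (lam * T) ** xi
    sigma_T = s * lt
    mu_T = h - (s / xi) * (1.0 - lt)
    return xi, sigma_T, mu_T

def gev_to_gpd(xi, sigma_T, mu_T, lam, T):
    """Inverse conversion: GEV parameters of the T-maximum -> GPD (xi, s, h)."""
    lt_inv = (lam * T) ** (-xi)
    s = sigma_T * lt_inv
    h = mu_T + (sigma_T / xi) * (lt_inv - 1.0)
    return xi, s, h

def rescale_gev(xi, sigma_T, mu_T, T, tau):
    """Recalculate GEV parameters from window T to another duration tau."""
    r = (tau / T) ** xi
    return xi, sigma_T * r, mu_T + (sigma_T / xi) * (r - 1.0)

# Hypothetical inputs: intensity in events/day, durations in days.
XI, S, H, LAM = -0.19, 0.55, 6.25, 0.01
xi_T, sigma_T, mu_T = gpd_to_gev(XI, S, H, LAM, T=1000.0)
xi_back, s_back, h_back = gev_to_gpd(xi_T, sigma_T, mu_T, LAM, T=1000.0)
xi_tau, sigma_tau, mu_tau = rescale_gev(xi_T, sigma_T, mu_T, T=1000.0, tau=3650.0)
```

A useful consistency check is that the upper bound is the same in both parameterizations and for every window length: h − s/ξ = μ(T) − σ(T)/ξ = μ(τ) − σ(τ)/ξ.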

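From the GPD (relation (2)) or the GEV (Eq. (4)), we can obtain the quantiles Q_q(τ), which are proposed as stable robust characteristics of the tail distribution of magnitudes. These quantiles are the roots of equations:

$$\exp\left\{-\lambda\tau\left[1 + \frac{\xi}{s}\,(x - h)\right]^{-1/\xi}\right\} = q, \tag{14}$$

$$\exp\left\{-\left[1 + \frac{\xi}{\sigma(\tau)}\,\bigl(x - \mu(\tau)\bigr)\right]^{-1/\xi}\right\} = q. \tag{15}$$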

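Inverting Eqs. (14) and (15) for x as a function of q and using Eqs. (6)–(8), we obtain:

$$Q_q(\tau) = h + \frac{s}{\xi}\left[a\,(\lambda\tau)^{\xi} - 1\right], \tag{16}$$

$$Q_q(\tau) = \mu(T) + \frac{\sigma(T)}{\xi}\left[a\left(\frac{\tau}{T}\right)^{\xi} - 1\right], \qquad a = \left[\ln(1/q)\right]^{-\xi}. \tag{17}$$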

3. Application of the GPD and GEV to the Estimation of τ-maximum Magnitudes in Japan

3.1 Characteristics of the JMA data

The full JMA catalog covers the spatial domain delimited by 25.02 ≤ latitude ≤ 49.53° and 121.01 ≤ longitude ≤ 156.36° and by the temporal window 1 January 1923 to 30 April 2007. The depths of the earthquakes fall in the interval 0 ≤ depth ≤ 657 km. The magnitudes are expressed in 0.1-bins and vary in the interval 4.1 ≤ magnitude ≤ 8.2. There are 39,316 events in this space-time domain. The spatial domain covered by the JMA catalog covers the Kuril Islands and the east border of Asia.

Here, we focus our study on earthquakes occurring within the central Japanese islands. We thus restrict the territory of our study to earthquakes occurring within the polygon with coordinates [(160.00; 45.00); (150.00; 50.00); (140.00; 50.00); (130.00; 45.00); (120.00; 35.00); (120.00; 30.00); (130.00; 25.00); (150.00; 25.00); (160.00; 45.00)]. Figure 1 shows the map of the region delineated by the polygon within which we perform our study. There were 32,324 events within this area. The corresponding magnitude-frequency distribution is shown in Fig. 2, and the histogram of magnitudes is shown in Fig. 3.

Fig. 3

Histogram of the magnitudes of earthquakes used in Fig. 2. The discrete 0.1-bins are clearly visible.

Next, we only keep “shallow” earthquakes whose depths are <70 km and apply the declustering Knopoff-Kagan space-time window algorithm (Knopoff and Kagan, 1977). The remaining events constitute our “main shocks”, on which we are going to apply the GPD and the GEV methods described above. There are 6,497 main shocks in the polygon shown in Fig. 1 with depths <70 km. The magnitude-frequency curve of these main shocks is shown in Fig. 4. It should be noted that the b-slope of the magnitude-frequency of main shocks is significantly smaller (by approx. 0.15) than the corresponding b-slope of the magnitude-frequency for all events. From the relatively small number of remaining main shocks, one concludes that the percentage of aftershocks in Japan is very high (about 80% according to the Knopoff-Kagan algorithm). The histogram of these main events with magnitudes m ≥ 5.5 is shown in Fig. 5. This histogram of magnitudes is characterized by non-random irregularities and a non-monotonic behavior. The irregularities force us to aggregate 0.1-bins into 0.2-bins, and the resulting discreteness in the magnitudes requires a special treatment (in particular, the use of the chi-square test), which is explained in the next subsection. On a positive note, no visible pattern associated with half-integer magnitude values can be detected. Thus, we accept that the use of 0.2-bins will be sufficient to remove the irregularities.

Fig. 4

Magnitude-frequency distribution of the 6,497 “main shocks” remaining in the domain delineated by the polygon shown in Fig. 1 over the period 1923–2007 that have depths of <70 km, after applying the Knopoff-Kagan declustering algorithm.

Figure 6 plots the yearly number of earthquakes averaged over 10 years for three magnitude thresholds: m ≥ 4.1 (all available events); m ≥ 5.5; m ≥ 6.0. The latter time-series with m ≥ 6.0 appears to be approximately stationary, with an intensity of about three to four events per year. Figure 7 shows the flow of main events (same variable as in Fig. 6 but for the main shocks obtained after applying the declustering Knopoff-Kagan algorithm). For large events (m ≥ 6.0), the flow is approximately stationary.

3.2 Adaptation for binned magnitudes

As shown in Figs. 3 and 5, the earthquake magnitudes of the JMA catalog are discrete. Moreover, the oscillations decorating the decay with magnitudes shown in Fig. 5 require further coarse-graining with bins of 0.2 units of magnitudes, as explained in the previous subsection. However, all of the considerations identified in Section 2 refer to continuous random values, with continuous distribution functions. For discrete random variables, the EVT is not directly applicable. This contradiction is avoided as follows.

Fig. 5

Histogram of the magnitudes of the main shocks whose magnitude-frequency distribution is shown in Fig. 4. The discrete 0.1-bins are clearly visible. There are additional oscillations decorating the decay with magnitudes that require further coarse-graining, as explained in the text.

Fig. 6

Yearly number of earthquakes averaged over 10 years for three magnitude thresholds: m ≥ 4.1 (all available events); m ≥ 5.5; m ≥ 6.0.

Fig. 7

Flow of main shocks from 1923 to 2007. Main shocks are defined as “shallow” earthquakes inside the polygonal domain shown in Fig. 1, whose depths are <70 km and which remain after applying the declustering Knopoff-Kagan space-time window algorithm (Knopoff and Kagan, 1977).

Consider a catalog in which the magnitudes are reported with a magnitude step Δm. In most existing catalogs, including that of Japan, in most cases Δm = 0.1. In some catalogs, two decimal digits are reported, but the last digit is fictitious unless the magnitudes are recalculated from seismic moments, themselves determined with several exact digits (such as for the mW magnitude in the Harvard catalog). Here, we assume that the digitization is fulfilled exactly without random errors in intervals ((k − 1) · Δm; k · Δm), where k is an integer. As a consequence, in the GPD approach, we should use only half-integer thresholds h = (k − 1/2) · Δm, which is not a serious restriction.

Furthermore, having a sample of observations exceeding some h = (k − 1/2) · Δm, and fitting the GPD to it, we need to test the goodness of fit of the GEV model to the empirical distribution. For continuous random variables, the Kolmogorov test or the Anderson-Darling test has been successfully used in earlier studies (Pisarenko et al., 2008, 2009). For discrete variables, such statistical tools tailored for continuous random variables are incorrect. To demonstrate this, we calculated the Kolmogorov distances for N = 1,000 discrete artificial samples, each of them obeying the GEV. Our aim was to check the impact of discrete magnitudes on the Kolmogorov test. Specifically, we generated N times n synthetic random magnitudes m_i, i = 1, …, n, distributed according to the GEV distribution (relation (4)). Then, for each of the N sets, we discretized the magnitudes by rounding off the random numbers with Δm = 0.1, thus mimicking the empirical data. For each of the N sets, we constructed the Kolmogorov statistic as follows. We estimated the empirical distribution function F_n for the n iid observations as

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(m_i \le x),$$

where I(m_i ≤ x) is the indicator function, equal to 1 if m_i ≤ x and equal to 0 otherwise. The Kolmogorov statistic for the cumulative distribution function F(x) is then given by

$$K_j = \sup_x \left|F_n(x) - F(x)\right|, \qquad j = 1, \ldots, N,$$

where sup_x is the supremum of the set of distances. Having N realizations of K_j, we found that their distribution is very far from the true one (the Kolmogorov distances for discrete magnitudes are much larger than those for continuous random variables).
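The numerical experiment described above can be reproduced in outline as follows. This is a sketch under stated assumptions: the synthetic magnitudes are drawn from a GEV with the point estimates quoted in the abstract, the number of sets (40) and the sample size (5,000) are arbitrary choices made for speed rather than the N = 1,000 sets of the text, and all function names are hypothetical.

```python
import math
import random

# GEV point estimates quoted in the abstract (T = 200 days); they are used
# here only to generate synthetic magnitudes.
XI, MU, SIGMA = -0.19, 6.339, 0.600

def gev_cdf(x):
    """GEV distribution function for a negative form parameter XI."""
    z = 1.0 + XI * (x - MU) / SIGMA
    if z <= 0.0:  # x at or above the upper bound MU - SIGMA / XI
        return 1.0
    return math.exp(-(z ** (-1.0 / XI)))

def gev_sample(rng):
    """Inverse-transform sample from the same GEV."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return MU + (SIGMA / XI) * (math.log(1.0 / u) ** (-XI) - 1.0)

def kolmogorov_distance(sample):
    """sup_x |F_n(x) - F(x)|; the formula remains valid when ties occur."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = gev_cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def mean_distances(n_sets=40, n=5000, dm=0.1, seed=0):
    """Mean Kolmogorov distance before and after rounding to step dm."""
    rng = random.Random(seed)
    cont, disc = 0.0, 0.0
    for _ in range(n_sets):
        sample = [gev_sample(rng) for _ in range(n)]
        rounded = [round(x / dm) * dm for x in sample]
        cont += kolmogorov_distance(sample)
        disc += kolmogorov_distance(rounded)
    return cont / n_sets, disc / n_sets

mean_continuous, mean_rounded = mean_distances()
```

Rounding shifts probability mass by up to half a bin, so the distance of the rounded samples stays bounded away from zero however large n is, whereas the distance of the continuous samples shrinks like n^(−1/2). Critical values derived for continuous data would therefore reject a correct model.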

This result shows that in our analysis we are forced to use statistical tools adapted to discrete random variables. We have chosen the standard Pearson chi-square (χ2) method as it provides a way to both estimate unknown parameters and strictly evaluate the goodness of fit. The χ2-statistic is calculated by finding the difference between each observed and theoretical frequency for each possible magnitude bin, then squaring each difference, dividing it by the theoretical frequency, and taking the sum of the results. The χ2-statistic is then distributed according to the χ2-distribution with n − 1 − 3 degrees of freedom (df), where n here denotes the number of bins, since we estimate three parameters in fitting the theoretical GEV distribution.
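As a minimal illustration of the statistic just described, the following sketch computes the Pearson sum and the corresponding number of degrees of freedom. The function name, the bin counts, and the model probabilities are hypothetical; converting the statistic into an exceedance probability additionally requires the χ2 survival function from a statistics library, which is not shown here.

```python
def pearson_chi_square(observed, probabilities, n_fitted_params):
    """Return (chi-square sum, degrees of freedom) for binned counts.

    observed        -- counts per bin
    probabilities   -- model probability of each bin (should sum to 1)
    n_fitted_params -- number of parameters estimated from the same data
    """
    n_total = sum(observed)
    stat = 0.0
    for n_k, p_k in zip(observed, probabilities):
        expected = n_total * p_k
        stat += (n_k - expected) ** 2 / expected
    df = len(observed) - 1 - n_fitted_params
    return stat, df

# Hypothetical example: six bins, each with at least 8 observations,
# and three fitted parameters (as for the GEV).
observed = [88, 44, 28, 20, 12, 8]
probabilities = [0.40, 0.25, 0.15, 0.10, 0.06, 0.04]
chi2_sum, dof = pearson_chi_square(observed, probabilities, n_fitted_params=3)
```

In a fitting procedure, this sum would be minimized over the model parameters that determine the bin probabilities.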

The chi-square test has two specific requirements:

  1. In order to be able to apply the chi-square test, a sufficient number of observations is needed in each bin (we chose this minimum number as being equal to 8; see the discussion of this matter in Borovkov (1987)).

  2. In order to compare two different fits (corresponding to two different vectors of parameters), it is highly desirable to have the same binning in both experiments in order to avoid large variations in the significance levels, which depend on the binning.

In general, the chi-square test is less sensitive and less efficient than the Kolmogorov test or the Anderson-Darling test due to the fact that the chi-square test coarsens data by placing data into discrete bins.

When using the GEV, the digitized GEV of the magnitude maxima in successive T-intervals is fitted using the χ2-method.

3.3 The GPD approach

Consider the discrete set of magnitudes registered with step Δm over threshold h,


The corresponding discrete probabilities read


The last (r + 1)-th bin covers the interval (h + r· 0.05;∞). We use the following expression


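Let us assume that the k-th interval of Eq. (18) contains n_k observations. Summing over the r + 1 intervals, the total number of observations is n = n_1 + … + n_{r+1}. The chi-square sum S(ξ, s) is then written as:

$$S(\xi, s) = \sum_{k=1}^{r+1} \frac{\left[n_k - n\,p_k(\xi, s)\right]^2}{n\,p_k(\xi, s)}, \tag{21}$$

where p_k(ξ, s) denotes the probability of the k-th bin given by expressions (19) and (20).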

S(ξ, s) should be minimized over the parameters (ξ, s). This minimum value is distributed according to the χ2-distribution with (r − 2) df. The quality of the fit of the empirical distribution by expressions (19) and (20) is quantified by the probability Pexc = P{χ2(r − 2) ≥ min(S)}, where χ2(r − 2) is the chi-square random value with (r − 2) df, i.e. Pexc is the probability of exceeding the minimum fitted chi-square sum. The larger the Pexc, the better the goodness of fit.

For magnitude thresholds h ≤ 5.95 and h ≥ 6.65, the chi-square sums min(S) happened to be very large, leading to very small Pexc values and indicating that such thresholds are not acceptable. For thresholds in the interval 6.05 ≤ h ≤ 6.55, the results of the chi-square fitting procedure are shown in Table 1. In order to obtain confidence intervals, we also performed N_b = 100 bootstrapping procedures on our initial sample and averaged the results over the obtained estimates, as described in Pisarenko et al. (2008, 2009).

Table 1 Chi-square fitting procedure using the GPD approach.

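As pointed out above, if the distribution of magnitudes over thresholds obeys the GPD (relation (2)) with parameters (ξ, s), then, for a Poissonian flow of events, the T-maxima have the GEV distribution:

$$\exp\left\{-\left[1 + \frac{\xi}{\sigma_T}\,(x - \mu_T)\right]^{-1/\xi}\right\}, \qquad \sigma_T = s\,(\lambda T)^{\xi}, \quad \mu_T = h - \frac{s}{\xi}\left[1 - (\lambda T)^{\xi}\right]. \tag{22}$$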

Thus, we can use an alternative approach, the GEV, to fit the sample of T-maxima derived from the same underlying catalog.

Having estimated the first triple (ξ, σ_T, μ_T) or the second triple (ξ, s, h), we use these estimates to predict the quantile of τ-maxima for any arbitrary future time interval (0, τ), since these τ-maxima have the GEV distribution with parameters (ξ, σ_τ, μ_τ), as seen from Eqs. (6)–(13). Recall that, in Eqs. (6)–(13), λ denotes the intensity of the Poissonian flow of events whose magnitudes exceed the threshold h.

In Table 1, the three thresholds h = 6.15, h = 6.25, and h = 6.35 give very close estimates. In contrast, the estimates obtained for the thresholds h = 6.05 and h = 6.45 have a smaller goodness of fit (smaller Pexc), suggesting that the estimates corresponding to the highest goodness of fit (h = 6.25) should be accepted:


These estimates are very close to their mean values obtained over the three thresholds h = 6.15; 6.25; 6.35.

In order to estimate the statistical scatter of these estimates, we simulated our whole procedure of estimation N_b = 100 times on artificial GPD samples with known parameters. For a better stability, instead of sample standard deviations, we used the corresponding order statistics, namely, the difference of quantiles:

$$\tfrac{1}{2}\left[x_{0.84} - x_{0.16}\right], \tag{25}$$

where x_p denotes the sample p-quantile of the simulated estimates.

For Gaussian distributions, this quantity (Eq. (25)) coincides with the standard deviation (SD). For distributions with heavy tails, the difference (Eq. (25)) is a more robust estimate of the scatter than the usual SD. Combining the scatter estimates (Eq. (25)) derived from simulations with the mean values (Eq. (24)), the final results of the GPD approach for the JMA catalog can be summarized by


One can observe that the statistical scatter of Mmax exceeds the scatter of the quantile Q0.90(10) by a factor of more than two, thereby confirming once more our earlier conclusion on the instability of Mmax.
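The quantile-based scatter measure used above can be sketched as follows, assuming it is half the difference between the 84% and 16% sample quantiles (the form that coincides with the SD for a Gaussian law). The interpolation rule for sample quantiles and the function names are choices made for this illustration only.

```python
import random
import statistics

def sample_quantile(values, q):
    """Sample quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] * (1.0 - frac) + xs[hi] * frac

def quantile_scatter(values):
    """Half the 16%-84% interquantile range (assumed form of the measure)."""
    return 0.5 * (sample_quantile(values, 0.84) - sample_quantile(values, 0.16))

# Hypothetical check on synthetic data: Gaussian estimates with SD = 2,
# then the same sample contaminated by one gross outlier.
rng = random.Random(0)
gaussian = [rng.gauss(0.0, 2.0) for _ in range(20000)]
contaminated = gaussian + [1.0e6]

scatter_clean = quantile_scatter(gaussian)
scatter_contaminated = quantile_scatter(contaminated)
sd_contaminated = statistics.pstdev(contaminated)
```

The single outlier inflates the standard deviation by orders of magnitude but leaves the quantile-based scatter essentially unchanged, which is the robustness property invoked in the text.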

3.4 The GEV approach

In this approach, we divide the total time interval Tc from 1923 to 2007 covered by the catalog into a sequence of non-overlapping and touching intervals of length T. The maximum magnitude M_{T,j} in each T-interval is identified. We have k = [Tc/T] T-intervals, so the sample of our T-maxima has size k: M_{T,1}, …, M_{T,k}. We assume that T is large enough, so that each M_{T,j} can be considered as being sampled from the GEV distribution with some unknown parameters (ξ, σ_T, μ_T) that should be estimated through the sample M_{T,1}, …, M_{T,k}.
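The construction of the sample of T-maxima can be sketched as follows. The function name and the toy catalog are hypothetical; windows that contain no event are returned as None, and how such windows should be treated is left open here because the text does not specify it.

```python
def block_maxima(times, magnitudes, t_start, t_end, T):
    """Maximum magnitude in each of the k = [Tc / T] touching windows.

    times, magnitudes -- event times and magnitudes (same length)
    t_start, t_end    -- limits of the catalog, same units as T
    Returns a list of length k; an empty window is reported as None.
    """
    k = int((t_end - t_start) // T)
    maxima = [None] * k
    for t, m in zip(times, magnitudes):
        j = int((t - t_start) // T)
        # Events outside the k complete windows are ignored.
        if 0 <= j < k and (maxima[j] is None or m > maxima[j]):
            maxima[j] = m
    return maxima

# Hypothetical toy catalog: times in days, window length T = 200 days.
times = [5, 150, 199, 210, 390, 450, 799, 830]
magnitudes = [5.1, 6.3, 5.9, 7.0, 6.1, 5.5, 6.8, 7.4]
t_maxima = block_maxima(times, magnitudes, t_start=0, t_end=850, T=200)
```

In this toy example the last event falls in the incomplete fifth window and is therefore excluded, which mirrors the use of the integer part k = [Tc/T].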

The larger T is, the more accurate is the approximation for this observed sample, but one cannot choose too large a T because the sample size k of the set of T-maxima would be too small, resulting in an inefficient statistical estimation of the three unknown parameters (ξ, σ T , μ T ). Besides, we should keep in mind the restrictions mentioned above, imposed by the chi-square method, that the number of bins should be constant for all used T values and that the minimum number of observations per bin should not be < 8. In order to satisfy these contradictory constraints, as a compromise, we had to restrict the T-values to be sampled in the rather small interval


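It should be noted that, for all T-values >50 days, the estimates of the parameters do not vary much and that only for T ≤ 40 days do the estimates change drastically. We have chosen T = 200 days and obtained the following estimates:

$$\xi_{GEV} = -0.19 \pm 0.07; \quad \mu_{GEV}(200) = 6.339 \pm 0.038; \quad \sigma_{GEV}(200) = 0.600 \pm 0.022; \quad M_{\max,GEV} = 9.57 \pm 0.86; \quad Q_{0.90,GEV}(10) = 8.34 \pm 0.32. \tag{28}$$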

The estimates of the scatter in Eq. (28) were obtained by the simulation method with 100 realizations, similar to the method used in the GPD approach. In estimating the parameters, we have used the shuffling procedure described in Pisarenko et al. (2009), which is similar to the bootstrap method, with NS = 100 realizations. It should be noted that, in Eq. (28), the T-value for the parameters μ, σ is indicated in days (T = 200 days) whereas in the quantile Q, the τ-value is indicated in years (τ = 10 years).
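The mixed units noted above (T in days, τ in years) are easy to get wrong, so a short numerical sketch may help. It assumes the closed-form GEV quantile Q_q(τ) = μ(T) + (σ(T)/ξ)[a (τ/T)^ξ − 1] with a = [ln(1/q)]^(−ξ), uses the point estimates quoted in the abstract, and takes 365.25 days per year; the function names and the year length are assumptions of this illustration.

```python
import math

DAYS_PER_YEAR = 365.25  # assumed conversion factor

def gev_quantile(q, tau_years, xi, sigma_T, mu_T, T_days):
    """q-quantile of the maximum magnitude over the next tau_years,
    from GEV parameters estimated with windows of T_days."""
    tau_days = tau_years * DAYS_PER_YEAR
    a = math.log(1.0 / q) ** (-xi)
    return mu_T + (sigma_T / xi) * (a * (tau_days / T_days) ** xi - 1.0)

def gev_upper_bound(xi, sigma_T, mu_T):
    """Upper end point of the GEV (the absolute Mmax) for xi < 0."""
    return mu_T - sigma_T / xi

# Point estimates quoted in the abstract (T = 200 days).
XI, SIGMA_200, MU_200 = -0.19, 0.600, 6.339

q90_10yr = gev_quantile(0.90, 10.0, XI, SIGMA_200, MU_200, T_days=200.0)
median_curve = [gev_quantile(0.50, tau, XI, SIGMA_200, MU_200, 200.0)
                for tau in (1, 5, 10, 25, 50)]
m_max = gev_upper_bound(XI, SIGMA_200, MU_200)
```

Because the inputs are rounded point estimates, the values obtained this way are close to, but need not coincide with, the published Q0.90,GEV(10) = 8.34 and Mmax-GEV = 9.57. The quantiles increase with τ and remain below the upper bound, as expected.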

Comparing ξ, Mmax and the Q-estimates obtained by the GPD and the GEV approaches, the GEV method is found to be somewhat more efficient (its scatter is smaller by a factor approximately equal to 0.7). This can be explained by the fact that the GEV approach uses the full catalog more intensively: all events with magnitude m ≥ 4.1 participate (in principle) in the estimation, whereas the GPD approach throws out all events with m < h.

Finally, we show in Figs. 8 and 9 the dependence of the quantile Q_q(τ) on τ, for τ = 1–50 years, as estimated by our two approaches, respectively given by expressions (16) and (17). One can observe that the quantiles Q_q(τ) obtained by the two methods are very close, which testifies to the stability of the estimations. Figure 10 plots the median (quantile Q_q(τ) for q = 50%) of the distribution of the maximum magnitude as a function of the future τ years, together with the two accompanying quantiles 16% and 84%, which correspond to the usual ±1 SD. These quantiles Q_q(τ) can be very useful tools for pricing risks in the insurance business and for optimizing the allocation of resources and preparedness by state governments.

Fig. 8

Quantile Qq(τ) of the distribution of maxima over a future time interval τ for three confidence levels q, defined by expression (16). The three curves use the parameters of the GPD estimated from the JMA catalog, as explained in the text.

Fig. 9

Quantile Qq(τ) of the distribution of maxima over a future time interval τ for three confidence levels q, defined by expression (17). The three curves use the parameters of the GEV estimated from the JMA catalog, as explained in the text.

Fig. 10

Median (quantile Qq(τ) for q = 50%) of the distribution of the maximum magnitude over a future time interval τ, obtained by the GEV method, as a function of τ (years), together with the two accompanying quantiles 16% and 84% that correspond to the usual ±1 SD.

4. Discussion and Conclusions

We have adapted the new method of statistical estimation suggested by Pisarenko et al. (2009) to earthquake catalogs with discrete magnitudes. This method is based on the duality of the two main limit theorems of EVT. One theorem leads to the GPD (peak over threshold approach), and the other theorem leads to the GEV (T-maximum method). Both limit distributions must possess the same form parameter ξ. For the Japanese catalog of earthquake magnitudes over the period 1923–2007, both approaches provide almost the same statistical estimate for the form parameter, which is found to be negative: ξ ≈ −0.2. A negative form parameter corresponds to a distribution of magnitudes that is bounded from above (by a parameter named Mmax). This maximum magnitude corresponds to the finiteness of the geological structures supporting earthquakes. The density distribution extends to its final value Mmax with a very small probability weight in its neighborhood, characterized by a tangency of a high degree (“duck beak” shape). In fact, the limit behavior of the density distribution of Japanese earthquake magnitudes is described by the function (Mmax − m)^(−1−1/ξ) ≈ (Mmax − m)^4, i.e. by a polynomial of degree approximately equal to 4. This explains the unstable character of the statistical estimates of the parameter Mmax: a small change in the catalog of earthquake magnitudes can give rise to a significant fluctuation in the resulting estimate of Mmax. In contrast, the estimation of the integral parameter Qq(τ) is generally more stable and robust, as we demonstrate quantitatively for the Japanese catalog of earthquake magnitudes over the period 1923–2007.
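The contrast between the unstable Mmax and the more stable quantile can be illustrated with a rough sensitivity check. This is a sketch under simplifying assumptions, not the authors' error analysis. It holds μ and σ fixed at the rounded GEV estimates quoted in the abstract and varies only ξ over its quoted scatter (−0.19 ± 0.07), which ignores the correlations between the parameter estimates. The function names and the standard GEV quantile form used here are assumptions.

```python
import math

# Rounded GEV estimates from the abstract (T = 200 days); held fixed here.
MU, SIGMA, T_DAYS = 6.339, 0.600, 200.0


def mmax(xi):
    """Upper end point of the GEV distribution for xi < 0."""
    return MU - SIGMA / xi


def quantile(xi, q=0.90, tau_years=10.0):
    """Quantile of level q of the maximum over tau_years (standard GEV form)."""
    a = (tau_years * 365.25 / T_DAYS) / math.log(1.0 / q)
    return MU + (SIGMA / xi) * (a ** xi - 1.0)


def tail_degree(xi):
    """Exponent of (Mmax - m) in the density near Mmax for xi < 0."""
    return -1.0 - 1.0 / xi


for xi in (-0.26, -0.19, -0.12):
    print(
        f"xi = {xi:+.2f}: degree = {tail_degree(xi):.1f}, "
        f"Mmax = {mmax(xi):.2f}, Q0.90(10) = {quantile(xi):.2f}"
    )
```

Over this range of ξ the end point Mmax moves by several magnitude units, whereas the 90% quantile for 10 years moves by well under one unit, because the quantile depends on the bulk of the fitted tail and not on its flat extremity.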

The main problem in the statistical study of the tail of the distribution of earthquake magnitudes (as well as in distributions of other rarely observable extremes) is the estimation of quantiles that exceed the data range, i.e. quantiles of level q > 1 − 1/n, where n is the sample size. We would like to stress once more that the reliable estimation of quantiles of levels q > 1 − 1/n can be made only with some additional assumptions on the behavior of the tail. Sometimes, such assumptions can be made on the basis of the physical processes underlying the phenomena under study. For this purpose, we used general mathematical limit theorems, namely, the theorems of EVT. In our case, the assumptions for the validity of EVT amount to assuming a regular (power-like) behavior of the tail 1 − F(m) of the distribution of earthquake magnitudes in the vicinity of its rightmost point Mmax. A partial justification for such an assumption is the fact that, without it, there is no meaningful limit theorem in EVT. Of course, there is no a priori guarantee that these assumptions will hold in all real situations, and they should be discussed and possibly verified or supported by other means. In fact, because EVT suggests a statistical methodology for the extrapolation of quantiles beyond the data range, the question of whether such extrapolation is justified or not in a given problem should be investigated carefully in each concrete situation.
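The role of the level 1 − 1/n can be seen in a small numerical experiment. The sketch below is illustrative only: the GPD parameters are hypothetical values chosen for the example (not estimates from the JMA catalog), and the function names are assumptions. It shows that an empirical quantile of level q > 1 − 1/n can never exceed the largest observation, whereas a parametric tail model continues to increase towards Mmax.

```python
import math
import random

# Hypothetical GPD parameters, for illustration only.
XI, S, H = -0.2, 0.5, 6.0   # form parameter, scale, threshold


def gpd_quantile(q):
    """Quantile of level q of a GPD with shape XI, scale S and threshold H."""
    return H + (S / XI) * ((1.0 - q) ** (-XI) - 1.0)


def empirical_quantile(sample, q):
    """Smallest order statistic x_(k) such that k / n >= q."""
    ordered = sorted(sample)
    k = max(1, math.ceil(q * len(ordered)))
    return ordered[min(k, len(ordered)) - 1]


rng = random.Random(1)
n = 100
# Inverse-transform sampling: apply the quantile function to uniform numbers.
sample = [gpd_quantile(rng.random()) for _ in range(n)]

print("sample maximum:", round(max(sample), 3), " Mmax:", H - S / XI)
for q in (0.90, 0.99, 0.999, 0.9999):
    print(q, round(empirical_quantile(sample, q), 3), round(gpd_quantile(q), 3))
```

For n = 100, every level above 0.99 returns the same empirical value (the sample maximum), so any further information about the tail has to come from the assumed model.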


References

  • Borovkov, A. A., Statistique Mathematique, Mir, Moscow, 1987.

  • Cosentino, P., V. Ficara, and D. Luzio, Truncated exponential frequency-magnitude relationship in the earthquake statistics, Bull. Seismol. Soc. Am., 67, 1615–1623, 1977.

  • Embrechts, P., C. Klüppelberg, and T. Mikosch, Modelling Extremal Events, Springer, 1997.

  • Epstein, B. C. and C. Lomnitz, A model for the occurrence of large earthquakes, Nature, 211, 954–956, 1966.

  • Kagan, Y. Y., Universality of the seismic moment-frequency relation, Pure Appl. Geophys., 155, 537–573, 1999.

  • Kagan, Y. Y. and F. Schoenberg, Estimation of the upper cutoff parameter for the tapered distribution, J. Appl. Probab., 38A, 901–918, 2001.

  • Kijko, A., Estimation of the maximum earthquake magnitude, Mmax, Pure Appl. Geophys., 161, 1–27, 2004.

  • Kijko, A. and M. A. Sellevoll, Estimation of earthquake hazard parameters from incomplete data files. Part I, Utilization of extreme and complete catalogues with different threshold magnitudes, Bull. Seismol. Soc. Am., 79, 645–654, 1989.

  • Kijko, A. and M. A. Sellevoll, Estimation of earthquake hazard parameters from incomplete data files. Part II, Incorporation of magnitude heterogeneity, Bull. Seismol. Soc. Am., 82, 120–134, 1992.

  • Knopoff, L. and Y. Kagan, Analysis of the extremes as applied to earthquake problems, J. Geophys. Res., 82, 5647–5657, 1977.

  • Pisarenko, V. F., A. A. Lyubushin, V. B. Lysenko, and T. V. Golubeva, Statistical estimation of seismic hazard parameters: maximum possible magnitude and related parameters, Bull. Seismol. Soc. Am., 86, 691–700, 1996.

  • Pisarenko, V. F., A. Sornette, D. Sornette, and M. V. Rodkin, New approach to the characterization of Mmax and of the tail of the distribution of earthquake magnitudes, Pure Appl. Geophys., 165, 847–888, 2008.

  • Pisarenko, V. F., A. Sornette, D. Sornette, and M. V. Rodkin, Characterization of the tail of the distribution of earthquake magnitudes by combining the GEV and GPD descriptions of extreme value theory, Pure Appl. Geophys., 2009.

  • Sornette, D., L. Knopoff, Y. Y. Kagan, and C. Vanneste, Rank-ordering statistics of extreme events: application to the distribution of large earthquakes, J. Geophys. Res., 101, 13883–13893, 1996.

  • Sornette, D., A. B. Davis, K. Ide, K. R. Vixie, V. Pisarenko, and J. R. Kamm, Algorithm for model validation: Theory and applications, Proc. Natl. Acad. Sci. USA, 104(16), 6562–6567, 2007.

  • Sornette, D., A. B. Davis, J. R. Kamm, and K. Ide, A general strategy for physics-based model validation illustrated with earthquake phenomenology, atmospheric radiative transfer, and computational fluid dynamics, in Computational Methods in Transport: Verification and Validation, edited by F. Graziani and D. Swesty, Lecture Notes in Computational Science and Engineering, vol. 62, pp. 19–73, Springer, New York (NY), 2008.

  • Stephens, M. A., EDF Statistics for Goodness of Fit and Some Comparisons, J. Am. Stat. Assoc., 69(347), 730–737, 1974.


Acknowledgments

This work was partially supported (V. F. Pisarenko, M. V. Rodkin) by the Russian Foundation for Basic Research, grant 09-05-01039a, and by the Swiss ETH CCES project EXTREMES (DS).

Author information

Corresponding author

Correspondence to D. Sornette.

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this article

Cite this article

Pisarenko, V.F., Sornette, D. & Rodkin, M.V. Distribution of maximum earthquake magnitudes in future time intervals: application to the seismicity of Japan (1923–2007). Earth Planet Sp 62, 567–578 (2010).

Key words