Statistical models for temporal variations of seismicity parameters to forecast seismicity rates in Japan
© The Society of Geomagnetism and Earth, Planetary and Space Sciences (SGEPSS); The Seismological Society of Japan; The Volcanological Society of Japan; The Geodetic Society of Japan; The Japanese Society for Planetary Sciences; TERRAPUB. 2011
Received: 30 March 2010
Accepted: 5 October 2010
Published: 4 March 2011
This paper introduces a model to forecast the rate of earthquakes for a specified period and area. The model explicitly predicts the number of earthquakes and b-value of the Gutenberg-Richter distribution for the period of interest with an autoregressive process. The model also incorporates a time dependency adjustment for higher magnitude ranges, assuming that as time passes since the last large earthquake within the area, the probability of another larger earthquake increases. These predictions are overlaid on a spatial density map obtained with a multivariate normal mixture model of the historical earthquakes that have occurred in the area. This forecast model differs from currently proposed models by its density estimation and its assumption of temporal changes. The model has been submitted to the Earthquake Forecasting Testing Experiment for Japan.
Key words: Forecast, autoregressive models, normal mixture models.
The Collaboratory for the Study of Earthquake Predictability (CSEP) (http://www.cseptesting.org/home) is an initiative to test earthquake forecast models in a fair environment. In collaboration with CSEP, the Earthquake Forecasting Testing Experiment for Japan is focused on evaluating models that forecast the seismicity of Japan (Research group “Earthquake Forecast System based on Seismicity of Japan”, 2009). Submissions to the Japanese experiment, like those to CSEP, require a forecast of the number of earthquakes expected to occur in specified 0.1° × 0.1° spatial bins over a particular region and time period. The forecast rate for a single spatial bin must be divided into rates for specific magnitude bins over a predetermined magnitude range. The model described herein has been submitted to the Japanese initiative.
To create our forecast, we initially take a subregion of the entire forecast area, comprising many of the aforementioned spatial bins. We consider subregions so that we can pick up local variations in seismicity rate, and force the algorithm to find pockets of seismicity that might otherwise have been ignored if the entire area were studied at once. We consider various facets of the past seismicity in this subregion: the previous overall seismicity rate; the temporal evolution of the proportion of large versus small earthquakes via the b-value of the Gutenberg-Richter distribution (Gutenberg and Richter, 1944); and the locations of the previous earthquakes. This information is then converted into a forecast via an autoregressive process and a multivariate normal mixture model (Everitt, 1993; Chatfield, 2004). The autoregressive process is used to extrapolate the rate of past seismicity to the forecast period. The mixture model is used to convert the locations of past earthquakes into a spatial density map of the subregion. To generate our forecast for a single spatial bin, we multiply the normalised density of the spatial bin obtained with the mixture model by the predicted number of earthquakes obtained with the autoregressive procedure. In this manner, spatial bins within the subregion with high rates of past seismicity are assigned larger predictions than bins where past seismicity rates are low. We repeat this process for each subregion of the forecast area until we have a forecast for the entire region.
The forecast model differs from currently proposed models by its assumption of non-stationarity and its density estimation technique. We also incorporate an optional time dependency component in our model by assuming that as time passes since the last large earthquake within a subregion, the probability of another large earthquake increases. This requires the knowledge of the repeat times of earthquakes within an area. We cannot estimate the repeat times empirically owing to the long history required, so here we use a simulation approach based on the available historical data. The forecast model can be run with or without this adjustment. We describe the base algorithm (MARFS) and its optional adjustment (MARFSTA) in the Methods section.
- 1) Divide the entire forecast region into smaller subregions. For ease, we divide the entire forecast region into rectangular subregions. We endeavour to minimize the total number of subregions for computational reasons, whilst ensuring that each subregion is small enough so that the multivariate normal mixture model can be applied. We also ensure that each subregion includes enough historical earthquakes to reliably calculate the Gutenberg-Richter parameters.
- 2) Consider one smaller subregion at a time.
  - a) Calculate a spatial density map of the previous earthquakes in the area using a multivariate normal mixture model.
  - b) Predict the parameters of the Gutenberg-Richter distribution for the next period.
  - c) Multiply the predicted rate of earthquakes by the density of each spatial bin.
  - d) If desired, obtain the time dependent adjusted rates of larger earthquakes for each spatial and magnitude bin.
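The multiplication step in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the MARFS code: the function names, the toy densities and the predicted count are hypothetical, and splitting each bin's rate across magnitude bins with the cumulative Gutenberg-Richter relation (N(≥M) proportional to 10^(−bM)), with an open-ended final bin, is an assumption made for the example.

```python
def magnitude_fractions(b, m_min, m_max, dm=0.1):
    """Fraction of events falling in each magnitude bin under Gutenberg-Richter.

    Uses N(>=M) proportional to 10**(-b*M). The final bin is treated as
    open-ended so that the fractions sum to one (an assumption of this sketch).
    """
    n_bins = round((m_max - m_min) / dm)
    edges = [m_min + i * dm for i in range(n_bins + 1)]
    survival = [10 ** (-b * (m - m_min)) for m in edges]
    fractions = [survival[i] - survival[i + 1] for i in range(n_bins - 1)]
    fractions.append(survival[n_bins - 1])  # everything above the last lower edge
    return fractions


def bin_forecast(density, predicted_count, b, m_min, m_max, dm=0.1):
    """Spread a predicted earthquake count over spatial and magnitude bins."""
    total = sum(density.values())  # normalise so the spatial densities sum to one
    fractions = magnitude_fractions(b, m_min, m_max, dm)
    forecast = {}
    for cell, d in density.items():
        cell_rate = predicted_count * d / total
        forecast[cell] = [cell_rate * f for f in fractions]
    return forecast


# Toy example: three spatial bins keyed by (longitude, latitude) of the bin corner.
density = {(135.0, 35.0): 0.2, (135.1, 35.0): 0.6, (135.0, 35.1): 0.2}
forecast = bin_forecast(density, predicted_count=50.0, b=1.0,
                        m_min=4.0, m_max=5.0, dm=0.5)
total_rate = sum(sum(rates) for rates in forecast.values())
```

Because both the spatial densities and the magnitude fractions sum to one, the rates over all bins add back up to the predicted count, so the spatial and magnitude splits only redistribute the forecast and never change its total.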
2.1 Spatial density map
Table 1. Vertices of the Tamba area.
The mixture model ensures that the density for all points in the space is non-zero. It therefore differs from the average approach sometimes used, which simply obtains a density of any bin in the area by dividing the number of earthquakes in that bin over time by the total number of recorded earthquakes. The mixture model also allows for a smooth transition amongst neighbouring bins. We normalize the density map, so that the sum of the density of each spatial bin of interest is one.
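The two properties described here (strictly positive density everywhere, and normalisation to one over the bins of interest) can be seen in a small sketch. The mixture weights, means and covariances below are made up for illustration and are not fitted to any catalogue; in practice they would be estimated from past epicentres, for example with the EM algorithm.

```python
import math


def bivariate_normal_pdf(x, y, mean, cov):
    """Density of a 2-D normal with covariance ((sxx, sxy), (sxy, syy))."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x - mean[0], y - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), written out for the 2x2 case.
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def mixture_density(x, y, components):
    """Weighted sum of bivariate normal densities."""
    return sum(w * bivariate_normal_pdf(x, y, mean, cov)
               for w, mean, cov in components)


# Hypothetical two-component mixture: (weight, mean, covariance).
components = [
    (0.7, (135.45, 35.05), ((0.010, 0.004), (0.004, 0.010))),
    (0.3, (135.75, 35.25), ((0.020, 0.000), (0.000, 0.005))),
]

# Centres of a small grid of 0.1 x 0.1 degree bins (hypothetical extent).
centres = [(135.05 + 0.1 * i, 34.85 + 0.1 * j)
           for i in range(10) for j in range(6)]

raw = [mixture_density(lon, lat, components) for lon, lat in centres]
total = sum(raw)
normalised = [d / total for d in raw]  # sums to one over the bins of interest
```

A count-based density would assign exactly zero to any bin with no recorded earthquake, whereas every entry of `normalised` is positive, if very small far from the component means.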
2.2 Gutenberg-Richter parameters
After estimating a, we use Eq. (5) directly to obtain the one-ahead forecast by substituting T with T + 1. The autoregressive model is thereby used to predict the next year’s Gutenberg-Richter parameter values. We believe this approach should enable us to model changes in seismicity rates or magnitude distributions.
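A one-step-ahead autoregressive forecast of this kind can be sketched as follows. The sketch assumes a first-order model about the series mean, x_t − μ = α (x_{t−1} − μ) + ε_t, with α estimated by conditional least squares; the exact form of Eq. (5) may differ, and the yearly b-values used here are made up.

```python
def fit_ar1(series):
    """Estimate the mean and lag-1 coefficient of an AR(1) model by least squares."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    # Regress each centred value on its predecessor (no intercept).
    num = sum(centred[t] * centred[t - 1] for t in range(1, n))
    den = sum(c * c for c in centred[:-1])
    alpha = num / den if den > 0 else 0.0  # constant series: fall back to the mean
    return mean, alpha


def forecast_next(series):
    """One-step-ahead forecast: mean plus alpha times the last deviation."""
    mean, alpha = fit_ar1(series)
    return mean + alpha * (series[-1] - mean)


# Made-up yearly b-values, for illustration only.
b_values = [0.95, 1.00, 1.02, 0.98, 0.93, 0.90, 0.92]
mean_b, alpha_b = fit_ar1(b_values)
next_b = forecast_next(b_values)
```

With a positive coefficient below one, the forecast lies between the last observed value and the long-run mean, which is how such a model carries a recent change in rate or b-value forward while damping it.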
2.3 Unadjusted earthquake rates
2.4 Adjusted earthquake rates
3. Assessing Model Predictions
An important and necessary step in the introduction of any new model is to illustrate its ability. As this model will be compared to other forecast techniques within the Earthquake Forecasting Testing Experiment for Japan initiative, we do not perform benchmark comparisons here. However, we show that the model is indeed valid, by first presenting the predictions for the Tamba area, and then presenting plots of the entire Japanese forecast region. We show the location of observed earthquakes during the forecast period.
Counts of forecast and observed earthquakes over all 0.1° × 0.1° bins.
4.1 Tamba area of Japan
We illustrate the method on data obtained from the small Tamba area defined by the vertices in Table 1. We use this area for illustrative purposes only: it does not constitute one of the 25 rectangular subregions we use to create our entire Japan forecast. If we were to use subregions as small as the Tamba area, the algorithm would take prohibitively long to run. We have high quality data for this area from January 1976 to December 2007 inclusive (Hiroshi Katao, 2008, personal communication). The dataset is compiled by the Disaster Prevention Research Institute (DPRI), Kyoto University. The hypocenters have been determined by DPRI using combined data from the Japan Meteorological Agency, the High Sensitivity Seismograph Network Japan (Hi-net) (http://www.hinet.bosai.go.jp/), and DPRI stations. We have considered the Tamba area in other work, where it was shown that the Gutenberg-Richter parameters of the Tamba area vary with time (Smyth and Mori, 2009). Therefore, we use this area to illustrate the forecast method for the year January through December 1995, and stress that the predictions were obtained using only information available up to 31 December 1994. They are retrospective predictions only in the sense that we are not waiting for validation of our results: we already have the data with which to test the model. We removed all events smaller than M 2.5. We trialled various cut-off values with the all-Japan forecasts and found the best cut-off values (with retrospective testing) for the different forecast classes. The smallest successful value for mainland Japan was 2.5. As the Tamba area is in mainland Japan, we chose 2.5 as our cut-off magnitude.
When we incorporate a time dependent adjustment for earthquakes greater than M = 5, the forecast rate of M = 5 earthquakes changes slightly; however, the overall spatial pattern does not. The adjusted rate is less than half the unadjusted rate. At this point there had been four M = 5 earthquakes in the dataset. Two earthquakes had occurred in 1985, ten years from the start of the catalogue, one earthquake occurred in 1987 and one earthquake occurred in 1992. The repeat times are therefore 10, 0, 2 and 5 years. The simulated mean repeat time is larger than three years. Hence, the chance of a greater than M = 5 earthquake was reduced by the adjustment factor, as the last M = 5 earthquake had occurred only three years previously, in 1992. This highlights the potential pitfalls of this approach. We only have a handful of events greater than M = 5 within this area. It is difficult to calculate reliable repeat times, and increasing and decreasing trends, until we have enough events in the data. When we forecast rates for all Japan, our data history is longer and our subregions are larger, and thereby we can increase our history of M = 5 events.
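As a back-of-the-envelope check on the numbers quoted above, the plain arithmetic mean of the four repeat times can be compared with the time elapsed since the last M = 5 event. This is only an empirical average over four values; the model itself relies on simulated repeat times, which need not equal it. The symbols \bar{\tau} and t_{\text{elapsed}} are introduced here for illustration.

```latex
\bar{\tau} = \frac{10 + 0 + 2 + 5}{4} = 4.25\ \text{years},
\qquad
t_{\text{elapsed}} = 1995 - 1992 = 3\ \text{years} < \bar{\tau}
```

Since the elapsed time is shorter than the mean repeat time, a scaling that grows with elapsed time would reduce the rate here, in line with the adjustment described.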
The observed events in 1995 directly influence predictions for the following years. For the years immediately following, the density moves south west along the diagonal and the adjustment factor for an M = 5 earthquake is less than 1, implying that, as we have recently had an earthquake greater than M = 5, the probability of another should be scaled down. Slowly, activity along the off-diagonal forces the density into a more circular shape and the adjustment factor for the probability of an M = 5 earthquake edges above 1.
In this section we looked at the small Tamba area of Japan to illustrate the model. However, in order to submit our model we must forecast rates for all Japan. To apply this model to all Japan, we repeatedly take subregions, treat each subregion individually, run the algorithm, and append the forecasts together. We illustrate an entire Japan forecast in the following section.
4.2 Entire Japanese forecast area
In 2008, the Iwate-Miyagi Nairiku earthquake (M_JMA = 7.2; epicentre 39.0283°N, 140.88°E) struck Iwate prefecture, northern Honshu. The aftershock sequence of this earthquake induces a high forecast rate within the immediate area, visible in Fig. 4. Figures 2 and 4 show that if there is a very large main shock, its aftershock sequence will completely dominate the spatial density of the area. Although this may be a realistic scenario, where earthquakes are predicted in the same area for many years to come, it may be necessary to move the density estimation away from this position. This could be achieved by taking only the last 10 years of data to obtain the density, or by using some form of random jitter. Furthermore, a large main shock and its associated aftershock sequence will induce an elevated forecast rate in the immediate vicinity for future years. If we were trying to forecast only independent events, our model would severely over-predict in this situation. However, the forecast experiment requires a forecast of the total number of earthquakes during a time period, and does not distinguish between main shocks and aftershocks. Therefore, we do not try to alter the resulting spatial density or forecast rate following a large main shock.
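The two options mentioned above for loosening the grip of an aftershock sequence on the density (a 10-year data window, or random jitter) could be applied to the catalogue before the mixture model is fitted. The sketch below is purely illustrative: the catalogue layout, the function names, and the choice of Gaussian jitter with a 0.05° standard deviation are assumptions, not part of the submitted model.

```python
import random


def recent_events(catalogue, forecast_year, window_years=10):
    """Keep only events from the window_years before the forecast year."""
    first_year = forecast_year - window_years
    return [ev for ev in catalogue if first_year <= ev[0] < forecast_year]


def jitter(events, sigma_deg=0.05, seed=0):
    """Perturb each epicentre with independent Gaussian noise (in degrees)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [(year,
             lon + rng.gauss(0.0, sigma_deg),
             lat + rng.gauss(0.0, sigma_deg))
            for year, lon, lat in events]


# Toy catalogue of (year, longitude, latitude) tuples; values are made up.
catalogue = [
    (1990, 140.90, 39.10),
    (2001, 140.85, 39.00),
    (2007, 140.88, 39.03),
    (2008, 140.88, 39.03),
]

windowed = recent_events(catalogue, forecast_year=2009)
spread = jitter(windowed)
```

Either transformation changes only the input to the density estimation; the predicted number of earthquakes for the subregion is unaffected.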
5. Discussion and Conclusions
In conclusion, this forecast model will consider earthquakes more likely in areas where they have already occurred via the mixture model. There will be a gradual slope in density across neighbouring bins. The mixture model should produce areas of high density coincident with previous seismicity. The autoregressive model will pick up any changes in rate and magnitude distributions. As the data increase over time, it may be appropriate to include trend or seasonality analysis, or even some more complicated time series modelling. At this point in time, we use the simplest model possible, owing to lack of data. Overlaying the mixture model with the autoregressive model gives a spatially and temporally variant forecast of seismicity.
We also introduced a time dependency adjustment factor for large magnitude ranges. Time dependency models have been advocated in the literature; see, for example, Petersen et al. (2007). It is only natural to assume that large earthquakes become more likely as time passes. Here we scaled the probabilities based on the time since the last M = 5 earthquake. This is not a realistic cut-off for areas where M = 5 earthquakes are a yearly occurrence. For regions that are prone to larger earthquakes, we suggest using a higher magnitude cut-off for the scaling probability. The scaling probability could also be adjusted manually. If the researcher had reason to believe (based on more physical models) that an M = 8 earthquake was imminent, the adjustment factor could be increased to reflect this expert knowledge. Within a particularly seismically active region, the researcher could also use different adjustment factors for each of the M 5, M 6, M 7 and M 8 bins.
The overall product of our research is the pair of earthquake forecast algorithms, MARFS and MARFSTA, ready for real time testing. Our model differs from currently proposed models by its density estimation technique and by the inclusion of potential temporal changes, often ignored, within the Gutenberg-Richter distribution. We also incorporate a further time dependency component in MARFSTA, by assuming that as time passes since the last large earthquake, the probability of another large earthquake increases. The models described here have been submitted to the Earthquake Forecasting Testing Experiment for Japan and are undergoing testing against other well known models in a prospective environment to ascertain which submitted model best forecasts the seismicity of Japan. We look forward with interest to the results of the testing experiment over the coming years, and to the subsequent increased understanding of the physics and statistics of earthquake occurrence.
Christine Smyth is the recipient of a Japan Society for the Promotion of Science Postdoctoral Fellowship. We gratefully acknowledge the Japan Meteorological Agency and Hiroshi Katao for providing the data used in this publication, and the Earthquake Research Institute at the University of Tokyo for hosting the test center. We also thank the National Institute of Advanced Industrial Science and Technology for making the fault database of Japan publicly available.
- Akaike, H., New look at statistical-model identification, IEEE Trans. Automatic Control, 19, 716–723, 1974.
- Chatfield, C., The Analysis of Time Series, Chapman and Hall, Florida, 2004.
- Dempster, A. P., N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via EM algorithm, J. Roy. Stat. Soc. B. Stat. Meth., 1, 1–38, 1977.
- Everitt, B. S., Cluster Analysis, Edward Arnold, London, 1993.
- Fawcett, T., An introduction to ROC analysis, Pattern Recognit. Lett., 27, 861–874, 2006.
- Guo, Z. and Y. Ogata, Statistical relations between the parameters of aftershocks in time, space and magnitude, J. Geophys. Res., 102, 2857–2873, 1997.
- Gutenberg, B. and C. F. Richter, Frequency of earthquakes in California, Bull. Seismol. Soc. Am., 34, 185–188, 1944.
- McLachlan, G. and S. Ng, The EM algorithm, in The Top-Ten Algorithms in Data Mining, edited by X. Wu and V. Kumar, 93–115 pp, Chapman and Hall/CRC, Boca Raton, Florida, 2009.
- Murru, M., R. Console, and G. Falcone, Real time earthquake forecasting in Italy, Tectonophysics, 470, 214–223, 2009.
- Petersen, M. D., T. Q. Cao, K. W. Campbell, and A. D. Frankel, Time-independent and time-dependent seismic hazard assessment for the State of California: Uniform California earthquake rupture forecast model 1.0, Seismol. Res. Lett., 78, 99–109, 2007.
- Research group “Earthquake Forecast System based on Seismicity of Japan” (K. Z. Nanjo, N. Hirata, H. Tsuruoka are responsible for the wording of the article), Earthquake forecast testing experiment for Japan, Newslett. Seismol. Soc. Jpn., 20, 7–10, 2009 (in Japanese).
- Schorlemmer, D., S. Wiemer, and M. Wyss, Earthquake statistics at Parkfield: 1. Stationarity of b values, J. Geophys. Res., 109, 2004.
- Smyth, C. and J. Mori, Assessing temporal variations in the Gutenberg-Richter distribution for a short-term forecast model, Japan Geoscience Union Meeting, Chiba, Japan, 2009.
- Stein, S. and M. Wysession, An Introduction to Seismology, Earthquakes, and Earth Structure, Blackwell, Malden, 2003.
- Wiemer, S. and M. Wyss, Mapping spatial variability of the frequency-magnitude distribution of earthquakes, Adv. Geophys., 45, 259–301, 2002.