# Application of a weighted likelihood method to hypocenter determination

- M. Imoto

**65**:6501201569

https://doi.org/10.5047/eps.2013.07.010

© The Society of Geomagnetism and Earth, Planetary and Space Sciences (SGEPSS); The Seismological Society of Japan; The Volcanological Society of Japan; The Geodetic Society of Japan; The Japanese Society for Planetary Sciences; TERRAPUB. 2013

**Received:** 25 June 2012

**Accepted:** 28 July 2013

**Published:** 6 December 2013

## Abstract

The method of least squares is a standard approach to hypocenter determination in seismology. However, this method is not useful for data contaminated by systematic errors. To address this problem, we propose a weighted likelihood method (WLL) rather than a weighted least-squares method (WLSQ). Assuming a normally distributed random error and systematic errors, both methods give the same solution; however, variances of random errors estimated by WLSQ are much smaller than those estimated by WLL. Examining reasonable random errors, we simulate a case of systematic errors varying linearly with a given parameter, where the number of unknown parameters is reduced to one for simplification. We assume that a systematic error, two different arrays of stations, and three different weights are functions of distance. In the cases where biases affected by systematic errors are adequately reduced, the variances of random errors estimated by WLL become roughly equal to that assumed, but those estimated by WLSQ are much smaller than that assumed. This result implies that WLL is a better approach than WLSQ for data contaminated by systematic errors.

## 1. Introduction

The method of least squares is a standard approach to the approximate solution of overdetermined systems, and its use can reduce the effect of random errors. The best solution in the least-squares method minimizes the sum of squared residuals, where a residual is the difference between an observed value (the arrival time of a seismic wave) and the fitted value provided by a model (the origin time and location of an earthquake together with a seismic velocity model). Hypocenter determination employs the nonlinear least-squares method and is implemented by iterative refinement. If the variances of the random errors are assumed to vary widely among observation stations, the weighted least-squares method (WLSQ) can be used to determine the hypocenter more reliably.

The residual is caused both by errors in the measurement of the arrival time and by errors in the seismic velocity model used to calculate theoretical arrival times. The former is a random error, but the latter is a systematic one. For a nationwide network in Japan, such as those operated by the Japan Meteorological Agency (JMA) and the National Research Institute for Earth Science and Disaster Prevention (NIED), a simple velocity model for hypocenter determination often involves systematic errors in the calculated travel time, since seismic velocities vary from region to region. JMA and NIED currently adopt WLSQ, in which the weight of each observation depends on its hypocentral distance. Their procedure, which may be a better approach than a simple least-squares method, obviously violates the condition that the least-squares method must be applied to data without systematic errors. Therefore, this procedure may give unreliable solutions, an issue that has not yet been addressed.

In this study, we presume both normally distributed random errors and systematic errors, and propose the use of the weighted likelihood method (WLL) (Hu and Zidek, 2002; Wang and Zidek, 2005) to address this problem. First, we demonstrate that WLSQ and WLL provide the same solution, but that the variance of random errors estimated by WLL exceeds that estimated by WLSQ. To clarify which method estimates more reasonable errors, both methods are applied to data contaminated with systematic errors, which simulates hypocenter determination with an unsuitable velocity model. The solutions of both methods can be obtained analytically for the simulated data. We then compare the maximum likelihood estimates of the variance of random errors with the assumed variance. This comparison indicates that WLSQ underestimates the variance of the random errors and is unreliable as an approach to this problem.

## 2. Method

### 2.1 Weighted least squares

Let $\sigma$ and $\sigma_i$ denote the standard deviation of the random error in a representative observation and in the $i$th observation ($i = 1, \ldots, n$), respectively. The weight $w_i$ is inversely proportional to the variance of the error, and is represented by $w_i = \sigma^2/\sigma_i^2$. The likelihood function in this case is given as:

$$L = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[-\frac{(o_i - c_i)^2}{2\sigma_i^2}\right]$$

where $o_i$ denotes the observed value and $c_i$ denotes the theoretical value at the $i$th observation.

### 2.2 Weighted likelihood method

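In WLL, the log-likelihood of each observation is multiplied by its weight $w_i$, and a common variance $\sigma^2$ is assumed for all observations. The maximum likelihood estimate of $\sigma^2$, $\hat{\sigma}^2_{\mathrm{WLL}}$, is given by

$$\frac{\partial}{\partial \sigma^2} \sum_{i=1}^{n} w_i \left[-\frac{1}{2}\ln\left(2\pi\sigma^2\right) - \frac{(o_i - c_i)^2}{2\sigma^2}\right] = 0.$$

Then

$$\hat{\sigma}^2_{\mathrm{WLL}} = \frac{\sum_{i=1}^{n} w_i (o_i - c_i)^2}{\sum_{i=1}^{n} w_i} \tag{10}$$

where the subscript WLL refers to the weighted likelihood method.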

When the same weights are applied in WLL and in WLSQ, the same solutions of $c_i$ are obtained. However, the maximum likelihood estimates of $\sigma^2$ are different.
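The difference between the two variance estimates can be illustrated with a minimal Python sketch. It assumes that Eq. (5) has the usual WLSQ form (weighted squared residuals divided by the number of observations $n$) and that Eq. (10) divides the same sum by the sum of the weights; the function names and the sample values are hypothetical and chosen only for illustration.

```python
def sigma2_wlsq(weights, residuals):
    """Assumed form of Eq. (5): sum of w_i * e_i^2 divided by n."""
    n = len(residuals)
    return sum(w * e * e for w, e in zip(weights, residuals)) / n


def sigma2_wll(weights, residuals):
    """Assumed form of Eq. (10): sum of w_i * e_i^2 divided by sum of w_i."""
    return sum(w * e * e for w, e in zip(weights, residuals)) / sum(weights)


# Hypothetical residuals (o_i - c_i) that grow as the weights shrink.
weights = [1.0, 1.0, 0.25, 0.0625]
residuals = [0.5, -0.5, 1.0, 2.0]

var_wlsq = sigma2_wlsq(weights, residuals)
var_wll = sigma2_wll(weights, residuals)
print(var_wlsq, var_wll)
```

Because the two estimates share the same numerator, they differ only by the factor $n / \sum w_i$, which is larger than 1 whenever some weights are below 1.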

## 3. Simulation

In applying WLL to hypocenter determination, the only difference from WLSQ lies in the formula for $\sigma^2$ (compare Eq. (10) with Eq. (5)). When NIED and JMA determine a hypocenter, a weight that depends on epicentral distance may be introduced to reduce the bias due to systematic errors in the calculated travel times. The structure of seismic wave velocities in Japan cannot be modeled with a simple layered structure such as that employed by NIED and JMA. Therefore, using the arrival times at more distant stations introduces systematic errors and biases the determined hypocenter. In this section, we examine whether WLSQ or WLL is better for hypocenter determination affected by systematic errors from an inappropriate velocity structure.

We schematically simulate this problem assuming simple conditions. We assume that an earthquake occurs on the surface of a uniform medium and that stations are densely deployed in a line from the epicenter in the first case, and on a two-dimensional surface in the second case. A systematic error due to an inappropriate velocity structure is as large as a random error at a short distance, and becomes much larger at more distant points. It is assumed that the epicenter of the earthquake is rather well constrained but the origin time of the earthquake is not, since the dense stations are well distributed. Thus, the problem of hypocenter determination is reduced to the problem of determining the origin time of the event.

The observed arrival time $o_i$ at the $i$th ($i = 1, 2, \ldots, n$) station is

$$o_i = t_0 + T(r_i) + B(r_i) + \varepsilon_i \tag{11}$$

where $t_0$ is the origin time, $r_i$ is the epicentral distance, $T(r_i)$ is the travel time estimated by the model, $B(r_i)$ is a systematic error due to an inappropriate velocity model, and $\varepsilon_i$ is a random error. In the single-layer model, this correction must be a function of epicentral distance, seismic wave velocity in the model, and the actual seismic wave velocity:

$$B(r_i) = \frac{r_i}{V_c} - \frac{r_i}{V_m} \tag{12}$$

where $V_c$ is the actual value of the seismic wave velocity, and $V_m$ is the seismic wave velocity used in the model. The term on the right-hand side is a linear function of epicentral distance:

$$B(r_i) = \alpha r_i, \qquad \alpha = \frac{1}{V_c} - \frac{1}{V_m}. \tag{13}$$

Equation (11) can be represented as:

$$o_i = t_0 + T(r_i) + \alpha r_i + \varepsilon_i. \tag{14}$$

As the epicenter is well constrained under our assumption, the epicentral distance is considered to be a known parameter. Therefore, subtracting $T(r_i)$ from the arrival time, we consider $t_0$ to be an unknown parameter, and Eq. (14) can be replaced by:

$$o_i = t_0 + \alpha r_i + \varepsilon_i. \tag{15}$$

Here, the problem of hypocenter determination is reduced to the problem of estimating $t_0$ from observation $o_i$ contaminated by a systematic error and a random error.

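The weighted sum of squared residuals, $R^2$, is calculated as:

$$R^2 = \sum_{i=1}^{n} w_i (o_i - x)^2 \tag{16}$$

where $x$ refers to the estimated origin time. The maximum likelihood estimate of the origin time, $\hat{x}$, minimizes $R^2$ and is calculated as follows:

$$\hat{x} = \frac{\sum_{i=1}^{n} w_i o_i}{\sum_{i=1}^{n} w_i}. \tag{17}$$

The minimum of $R^2$ is calculated as:

$$R^2_{\min} = \sum_{i=1}^{n} w_i (o_i - \hat{x})^2. \tag{18}$$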
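As a rough illustration of this reduced problem, the sketch below generates synthetic reduced arrival times and estimates the origin time as the weighted mean that minimizes $R^2$. The station layout (uniform along a line out to 10 units), the weight function (1 within one unit, inverse fourth power beyond), and all names are assumptions made for the example, not values taken from the paper.

```python
import random


def estimate_origin_time(obs, weights):
    """Return the weighted mean that minimizes R^2, and the minimum R^2."""
    x_hat = sum(w * o for w, o in zip(weights, obs)) / sum(weights)
    r2_min = sum(w * (o - x_hat) ** 2 for w, o in zip(weights, obs))
    return x_hat, r2_min


def weight(r):
    # Hypothetical weight: fixed to 1 within one unit, r**-4 beyond.
    return 1.0 if r <= 1.0 else r ** -4


rng = random.Random(0)
t0, alpha, delta = 0.0, 1.0, 1.0  # true origin time, bias slope, noise std

# Reduced observations o_i = t0 + alpha * r_i + eps_i at 500 stations.
r = [rng.uniform(0.0, 10.0) for _ in range(500)]
obs = [t0 + alpha * ri + rng.gauss(0.0, delta) for ri in r]

x_weighted, _ = estimate_origin_time(obs, [weight(ri) for ri in r])
x_unweighted, _ = estimate_origin_time(obs, [1.0] * len(obs))
print(x_weighted, x_unweighted)
```

With these assumed settings, the distance-dependent weight pulls the estimate much closer to the true origin time than the unweighted mean, because distant stations carry most of the systematic error.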

In the present study, we can estimate the expected values of Eqs. (17) and (18) with a probability density function for the epicentral distance of stations, $S(r)$, and the assumption that the random error $\varepsilon_i$ is normally distributed with mean 0 and variance $\delta^2$. A weight function $W(r)$ is applied and different limits of epicentral distance, $u$, are considered.

In order to derive an analytical solution of Eq. (17), we assume a two-step random value generation.

Step 1. Random generation of $n$ stations within a distance of $u$.

Step 2. For each set of stations, generate a large number ($l$) of series of random errors ($\varepsilon_{ij}$, where $j$ refers to the trial number, $j = 1, 2, \ldots, l$).

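For the $j$th trial of $\varepsilon_{ij}$, we obtain the maximum likelihood estimate of the origin time as follows:

$$\hat{x}_j = \frac{\sum_{i=1}^{n} w_i (t_0 + \alpha r_i + \varepsilon_{ij})}{\sum_{i=1}^{n} w_i}. \tag{19}$$

The average value of $\hat{x}_j$ over $j$, $\bar{x}$, is estimated by:

$$\bar{x} = \frac{1}{l}\sum_{j=1}^{l} \hat{x}_j = t_0 + \alpha\,\frac{\sum_{i=1}^{n} w_i r_i}{\sum_{i=1}^{n} w_i} + \frac{\sum_{i=1}^{n} w_i \left(\frac{1}{l}\sum_{j=1}^{l} \varepsilon_{ij}\right)}{\sum_{i=1}^{n} w_i} \tag{20}$$

where $\varepsilon_{ij}$ refers to the random error at the $i$th station in the $j$th trial. For a large number of $l$, $\bar{x}$ approaches $E[\hat{x}]$ as:

$$\bar{x} \rightarrow E[\hat{x}] = t_0 + \alpha\,\frac{\sum_{i=1}^{n} w_i r_i}{\sum_{i=1}^{n} w_i} \tag{21}$$

where $E$ denotes the expected value of a variable, since, for large enough values of $l$,

$$\frac{1}{l}\sum_{j=1}^{l} \varepsilon_{ij} \rightarrow 0, \tag{22}$$

which is an average of independent random errors $\varepsilon_{ij}$ normally distributed with a mean 0. Thus, for each selection of stations, the expected value of Eq. (17) is given by Eq. (21).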
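The two-step argument can be checked numerically with a small Python sketch. It fixes one random set of stations (Step 1), repeats the random-error draw $l$ times (Step 2), and compares the average estimate with $t_0 + \alpha \sum w_i r_i / \sum w_i$. The station count, distance limit, weight function, and variable names are illustrative assumptions.

```python
import random

rng = random.Random(1)
t0, alpha, delta = 0.0, 1.0, 1.0
n, l = 50, 2000  # stations per set, number of random-error trials

# Step 1: one random set of n stations within u = 5 units (assumed layout).
r = [rng.uniform(0.0, 5.0) for _ in range(n)]
# Assumed weight: 1 within one unit, inverse second power beyond.
w = [1.0 if ri <= 1.0 else ri ** -2 for ri in r]
sum_w = sum(w)

# Expected estimate for this station set: the noise term averages out.
expected = t0 + alpha * sum(wi * ri for wi, ri in zip(w, r)) / sum_w

# Step 2: l independent series of random errors for the same stations.
total = 0.0
for _ in range(l):
    obs = [t0 + alpha * ri + rng.gauss(0.0, delta) for ri in r]
    total += sum(wi * oi for wi, oi in zip(w, obs)) / sum_w
mean_x_hat = total / l

print(expected, mean_x_hat)
```

The remaining gap between the two printed values shrinks roughly as $1/\sqrt{l}$, which is the sense in which the random-error term vanishes for large $l$.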

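For a large number of stations $n$, Eq. (21) can be represented as:

[Eqs. (23) to (27) missing]

where $m$ stations are distributed within an epicentral distance of 10 units.

The minimum of $R^2$ for the $j$th trial of a set of stations is:

$$R^2_{\min,j} = \sum_{i=1}^{n} w_i (t_0 + \alpha r_i + \varepsilon_{ij} - \hat{x}_j)^2. \tag{28}$$

For a set of large $n$, $\hat{x}_j$ is replaced with $E[\hat{x}]$ of Eq. (21). This replacement will be justified by a Monte Carlo simulation. We obtain the average value of Eq. (28) over $j$ for a large number of $l$:

$$\overline{R^2_{\min}} = \frac{1}{l}\sum_{j=1}^{l}\sum_{i=1}^{n} w_i \left(t_0 + \alpha r_i + \varepsilon_{ij} - E[\hat{x}]\right)^2. \tag{29}$$

Equation (29) can be represented by:

[Eq. (30) missing]

where

[Eq. (31) missing]

We analytically estimated the expected values of $\hat{x}$ and $\hat{\sigma}$ for a large number of $n$ in this way. Hereafter, $\hat{x}$ and $\hat{\sigma}$ denote the expected values of $\hat{x}$ and $\hat{\sigma}$, respectively.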

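In the first case, stations are deployed uniformly along a line (Array1). The density function $S(r)$ is given as:

[equation missing]

In the other case, we consider stations deployed uniformly in a plane (Array2). The density function $S(r)$ is given as:

[equation missing]

Three different weighting functions are considered below. For the first case of applying a constant weight, the function $W(r)$ is given as:

$$W(r) = 1.$$

For the second case, where the weight decays inversely as the second power of distance, the weight is given as:

$$W(r) = \begin{cases} 1 & (r \le 1) \\ r^{-2} & (r > 1). \end{cases}$$

[third weighting function missing]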

The assumed systematic error, $B(r)$, increases linearly with epicentral distance. WLSQ likely misapplies a weighting function that decays as the inverse second power of the epicentral distance, since the square of the systematic error increases in proportion to the second power of distance.
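The large-$n$ expected values can be approximated by numerical integration, as in the following Python sketch. Several ingredients are assumptions rather than formulas taken from the paper: the densities are normalized on $[0, u]$ ($1/u$ for the line, $2r/u^2$ for the plane), the weight is fixed to 1 for $r \le 1$ and decays as $r^{-p}$ beyond with $p \in \{0, 2, 4\}$, and the WLSQ and WLL variances are taken as the large-$n$ limits of weighted squared residuals divided by $n$ and by $\sum w_i$, respectively. The helper names are hypothetical.

```python
import math


def integrate(f, a, b, steps=20000):
    """Midpoint-rule quadrature of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


def make_weight(power):
    """Assumed weight: 1 within one unit, r**-power beyond (power 0 = constant)."""
    return lambda r: 1.0 if power == 0 or r <= 1.0 else r ** -power


def make_density(array, u):
    """Assumed densities on [0, u]: uniform on a line, or uniform in a plane."""
    if array == "line":
        return lambda r: 1.0 / u
    return lambda r: 2.0 * r / u ** 2


def expected_estimates(array, power, u, alpha=1.0, delta=1.0):
    """Large-n bias of the origin time and the two standard-deviation estimates."""
    S, W = make_density(array, u), make_weight(power)
    i0 = integrate(lambda r: W(r) * S(r), 0.0, u)
    i1 = integrate(lambda r: W(r) * r * S(r), 0.0, u)
    i2 = integrate(lambda r: W(r) * r * r * S(r), 0.0, u)
    bias = alpha * i1 / i0  # true origin time taken as 0
    var_wll = delta ** 2 + alpha ** 2 * (i2 / i0 - (i1 / i0) ** 2)
    var_wlsq = i0 * var_wll  # same numerator, divided by n instead of sum(w)
    return bias, math.sqrt(var_wlsq), math.sqrt(var_wll)


results = {
    (array, power): expected_estimates(array, power, u=10.0)
    for array in ("line", "plane")
    for power in (0, 2, 4)
}
for key, (bias, s_wlsq, s_wll) in results.items():
    print(key, round(bias, 3), round(s_wlsq, 3), round(s_wll, 3))
```

Under these assumptions the WLSQ estimate is the WLL estimate scaled by the mean weight, so the two coincide for a constant weight and separate as the weight decays faster.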

$A$, $C$, and $D$ are calculated for each of the six cases (two station arrays with three weight functions each). Substituting these values into Eqs. (23), (30), and (5) or (10), we obtain standard deviations of random errors, $\hat{\sigma}_{\mathrm{WLSQ}}$ for WLSQ and $\hat{\sigma}_{\mathrm{WLL}}$ for WLL.

Figure 1 shows the results for $u$ varying from 1 to 10 units, where $\alpha = 1$ and $\delta = 1$. The systematic error is assumed to be equal to, or less than, the random error in a range of 0 to 1 units, where the weight is fixed to 1. The two estimated standard deviations (dotted and dashed lines) are compared with the assumed random error ($\delta = 1$). In each set, $\hat{x}$ represents the bias affected by the systematic error, since the correct origin time is set to 0.

In Figs. 1(a), 1(b), 1(d) and 1(e), biases caused by systematic errors are not adequately reduced, and the expected origin-time estimate $\hat{x}$ departs from 0. In these cases, $\hat{\sigma}_{\mathrm{WLL}}$ exceeds the standard deviation of the assumed random error. In both station arrays, it is clear that the most quickly decaying weight function (Figs. 1(c) and 1(f)) gives the best solutions, where $\hat{x}$ becomes at most 1 even in cases including stations up to 10 units away. This suggests that biases are adequately reduced. In these cases, $\hat{\sigma}_{\mathrm{WLL}}$ becomes reasonable at around 1, nearly equal to the standard deviation of the assumed random error. In contrast, $\hat{\sigma}_{\mathrm{WLSQ}}$ values are much smaller than that assumed. Therefore, WLSQ underestimates the standard deviation of random errors, suggesting an invalid application of WLSQ to the present issue.

Table 2 summarizes $\hat{x}$ and $\hat{\sigma}_{\mathrm{WLL}}$ for $u = 10$. Results obtained from the weight function decaying as the inverse of the seventh power are added as a more quickly decaying weight function, which is the case of NIED. In this case, $\hat{x}$ and $\hat{\sigma}_{\mathrm{WLL}}$ become slightly better than those of other cases: $\hat{x}$ becomes smaller and $\hat{\sigma}_{\mathrm{WLL}}$ approaches 1. These examples suggest that if $\hat{\sigma}_{\mathrm{WLL}}$ exceeds the standard deviation of the assumed random error, biases caused by systematic errors would not be adequately reduced.

For a given $u$, $\hat{x}$ and $\hat{\sigma}$ are independent of $m$, but the standard error of $\hat{x}$, $\hat{\sigma}$ divided by the square root of $C$, is inversely proportional to the square root of $m$, which can easily be verified from Eqs. (5), (10), (23), (24), (25), (26), (30) and (31). The ratio of $\hat{x}$ to its standard error becomes large in proportion to the square root of $m$. Consequently, bias cannot be compared with its standard error.

## 4. Discussion and Summary

In the Monte Carlo simulation shown in Fig. 2, stations are distributed according to $S(r)$, and their locations are randomly generated. Random errors are also generated once for each station, which implies $l = 1$ in Eqs. (20) and (29). Ten thousand sets of stations are generated for each distance limit, which is shifted from 1 to 10 units in steps of one unit. In each panel of the figure, the simple averages over ten thousand sets for $m = 200$ (dashed line) and $m = 500$ (dotted line) are compared with analytical solutions (solid line).

In general, as the number of stations increases (for a fixed distance limit), the results obtained by the simulation approach those of the analytical solution. Qualitative relations among $\hat{x}$, $\hat{\sigma}_{\mathrm{WLSQ}}$, and $\hat{\sigma}_{\mathrm{WLL}}$ obtained from the Monte Carlo simulation do not differ greatly from those of the analytical solutions, except in the case of a few tens of stations. Although not shown here, the agreement between the simulation and the analytical solution in other cases (the other station array and/or different weights) is better than that of Fig. 2. We have only discussed qualitative relations between the parameters, so an analytical solution is an appropriate approach to this issue.
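A scaled-down version of such a Monte Carlo check might look like the following Python sketch. It assumes stations uniform along a line out to 10 units, a weight fixed to 1 within one unit and decaying as the inverse fourth power beyond, one random error per station ($l = 1$), and 1000 station sets instead of ten thousand to keep the run short; the function name and parameters are hypothetical.

```python
import math
import random


def simulate(m, u, power, n_sets, alpha=1.0, delta=1.0, t0=0.0, seed=0):
    """Average x_hat, sigma_WLSQ and sigma_WLL over many random station sets."""
    rng = random.Random(seed)
    sums = [0.0, 0.0, 0.0]
    used = 0
    for _ in range(n_sets):
        # m stations uniform along a line out to 10 units; keep those within u.
        r = [ri for ri in (rng.uniform(0.0, 10.0) for _ in range(m)) if ri <= u]
        if len(r) < 2:
            continue
        w = [1.0 if ri <= 1.0 else ri ** -power for ri in r]
        # One random error per station (l = 1).
        o = [t0 + alpha * ri + rng.gauss(0.0, delta) for ri in r]
        sum_w = sum(w)
        x_hat = sum(wi * oi for wi, oi in zip(w, o)) / sum_w
        r2_min = sum(wi * (oi - x_hat) ** 2 for wi, oi in zip(w, o))
        sums[0] += x_hat
        sums[1] += math.sqrt(r2_min / len(r))  # WLSQ: divide by n
        sums[2] += math.sqrt(r2_min / sum_w)   # WLL: divide by sum of weights
        used += 1
    return tuple(s / used for s in sums)


mean_x, mean_s_wlsq, mean_s_wll = simulate(m=200, u=10.0, power=4, n_sets=1000)
print(mean_x, mean_s_wlsq, mean_s_wll)
```

Rerunning with a larger `m` or more station sets gives a rough sense of how quickly the simple averages settle toward their large-$n$ limits.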

NIED applies a weight that depends on the hypocentral distance $Y$:

[equation missing]

where $Y_{\min}$ is the hypocentral distance of the station closest to the hypocenter (km) (if $Y_{\min} \le 50$, then $Y_{\min} = 50$; if $W(Y) > 1$, then $W(Y) = 1$).

JMA introduced a similar weighting depending on the hypocentral distance, except for the order of power:

[equation for $P$-waves ($W_p$) missing]

For $S$-waves ($W_s$): $W_s = W_p/3$.

Comparing the weights in these methods with those of our simulation indicates that NIED, using a weight function decaying faster than the inverse fourth power, probably succeeds in reducing the bias caused by systematic errors (Table 2). JMA, however, may fail to reduce the bias with a weight function decaying inversely with the second power of distance. Both NIED and JMA must underestimate the standard errors of hypocenters, which are calculated based on $\hat{\sigma}_{\mathrm{WLSQ}}$. Standard errors of hypocenters are often critical in considering reliable hypocenter distributions of clustering earthquakes and their tectonic implications.

For deeper earthquakes, a qualitative consideration could be given as follows. At epicentral distances that are short compared with the focal depth, systematic errors reach a level that depends on the focal depth. At longer distances, on the other hand, systematic errors increase with distance in much the same way as for shallow earthquakes. As a result, the variances of residuals are smaller than those of shallower earthquakes, since the range of systematic errors is narrower. This implies that a significant bias may remain, even if $\hat{\sigma}^2_{\mathrm{WLL}}$ approaches the variance of the random error. Such aspects should be discussed further in separate studies, since station arrays, the number of unknown parameters, and other factors must be rearranged to address these issues.

The present paper focuses on only the preliminary formulation of applying WLL to hypocenter determinations. In practical applications of WLL, an optimal weight function should be determined taking into consideration the results in the case of various kinds of earthquakes in different areas and depths (Hu and Zidek, 2002). WLL can be applied to hypocenter determination with a minor revision of current methods which are based on misapplications of WLSQ.

We propose the use of WLL rather than WLSQ for hypocenter determination. Both methods give the same solution; however, the variance estimated by WLSQ is much smaller than that estimated by WLL. Our simulation indicates that a weight function that decays faster with distance gives a better solution, which could be realized by WLL. In contrast, such flexible weight functions are not justified by WLSQ, since weights should be inversely proportional to the variances of errors. In the cases where biases affected by systematic errors are adequately reduced, the variances of random errors estimated by WLL become roughly equal to the given one, but those estimated by WLSQ are much smaller than the given one. Therefore, WLSQ should not be used to address systematic errors in hypocenter determination. We conclude that WLL is a better approach than WLSQ for data contaminated by systematic errors.

## Declarations

### Acknowledgments

The author thanks two anonymous reviewers for their critical reading and comments on this manuscript.

## References

- Hu, F. and J. V. Zidek, The weighted likelihood, *Can. J. Stat.*, **30**, 347–371, 2002.
- Japan Meteorological Agency, Users’ Guide, http://data.sokki.jmbsc.or.jp/cdrom/seismological/catalog/notese.htm (as of April 1, 2013).
- Sakamoto, Y., M. Ishiguro, and G. Kitagawa, *Akaike Information Criterion Statistics*, 290 pp, D. Reidel, Dordrecht, 1983.
- Ueno, H., S. Hatakeyama, J. Funakaki, and N. Hamada, Improvement of hypocenter determination procedures in the Japan Meteorological Agency, *Quart. J. Seismol.*, **65**, 123–131, 2002 (in Japanese).
- Wang, X. and J. V. Zidek, Derivation of mixture distributions and weighted likelihood function as minimizers of KL-divergence subject to constraints, *Ann. Inst. Statist. Math.*, **57**, 687–701, 2005.