
Structured regularization based velocity structure estimation in local earthquake tomography for the adaptation to velocity discontinuities


We propose a local earthquake tomography method that applies a structured regularization technique to determine sharp changes in Earth’s seismic velocity structure using arrival time data of direct waves. Our approach focuses on the ability to better image two common features that are observed in Earth’s seismic velocity structure: sharp changes in velocities that correspond to material boundaries, such as the Conrad and Moho discontinuities; and gradual changes in velocity that are associated with pressure and temperature distributions in the crust and mantle. We employ different penalty terms in the vertical and horizontal directions to refine the earthquake tomography. We utilize a vertical-direction (depth) penalty that takes the form of the \({l}_{1}\)-sum of the \({l}_{2}\)-norms of the second-order differences of the horizontal units in the vertical direction. This penalty is intended to represent sharp velocity changes caused by discontinuities by creating a piecewise linear depth profile of seismic velocity. We set a horizontal-direction penalty term on the basis of the \({l}_{2}\)-norm to express gradual velocity tendencies in the horizontal direction, which has been often used in conventional tomography methods. We use a synthetic data set to demonstrate that our method provides significant improvements over velocity structures estimated using conventional methods by obtaining stable estimates of both steep and gradual changes in velocity. We also demonstrate that our proposed method is robust to variations in the amplitude of the velocity jump, the initial velocity model, and the number of observed arrival times, compared with conventional approaches, and verify the adaptability of the proposed method to dipping discontinuities. 
Furthermore, we apply our proposed method to real seismic data in central Japan and present the potential of our method for detecting velocity discontinuities using the observed arrival times from a small number of local earthquakes.



Earthquake tomography methods are used to estimate seismic velocity structure in Earth’s crust. The crust is an approximately 10–50-km-thick layer that covers Earth's surface (Bassin 2000), and hosts intense shallow seismicity. Local earthquake tomography (LET) has often been used to capture the high-resolution, three-dimensional (3-D) crustal structure of a given region (e.g., Aki and Lee 1976; Thurber 1993) and to relocate earthquake hypocenters (e.g., Thurber 1983). Therefore, LET and associated approaches provide fundamental information for understanding the mechanisms of earthquake generation in and around the crust (e.g., Alessandrini et al. 2001; Zhang and Thurber 2003; Nugraha and Mori 2006).

These tomography methods work adequately when a large number of different ray paths are produced by well-distributed source and receiver arrays. However, seismic sources are often localized around fault zones, and most seismic observation stations are deployed near Earth's surface, so the distribution of seismic ray paths is inhomogeneous. Therefore, LET commonly suffers from unstable estimation of structural parameters, or overfitting. Regularization of the inversion mitigates such instability and overfitting. In LET, Laplacian regularization, one of the regularization methods based on \({l}_{2}\)-norms, has often been used to stabilize crustal seismic velocity structure estimations (e.g., Lees and Crosson 1989; Zhang et al. 1998; Moran et al. 1999). The penalty for dissimilar velocity via \({l}_{2}\)-norms yields smooth fluctuations in seismic velocity. The smoothed estimates are reasonably acceptable in seismology, because spatial pressure and temperature variations, which are key factors affecting the seismic velocity structure, are generally gradual. However, such regularization often ignores an important component of crustal structures: a velocity discontinuity, i.e., a rapid change in seismic velocity that often represents either a geological boundary or a solid–liquid contact. The Conrad and Mohorovičić (Moho) discontinuities are well known and have been incorporated into one-dimensional (1-D) velocity models, such as the PREM (Dziewonski and Anderson 1981) and IASP91 (Kennett and Engdahl 1991) models. Furthermore, there may also be local discontinuities in the crust that are difficult to image, such as the boundary between a sedimentary basin and basement rocks. It is, therefore, desirable to obtain stable estimates of both steep and gradual changes in seismic velocity.
One way to overcome the pitfall of \({l}_{2}\)-norm-based regularization is to place grid points along a discontinuity (Zhao et al. 1992; Moran et al. 1999), but this approach requires accurate prior knowledge of the discontinuity. Another approach is to adapt the grid to the observations (e.g., Thurber and Eberhart-Phillips 1999), or to assign a grid that is spatially fine enough to image a discontinuity using dense seismic observations (e.g., Kato et al. 2010, 2021). In our study, we estimate the 3-D velocity structure and automatically detect unidentified velocity discontinuities while keeping the locations of the grid points fixed.

To realize reliable estimation of velocity structures that include discontinuities within the framework of tomographic analysis, we develop a new regularization method for 3-D LET that can handle both sharp and gradual changes in the seismic velocity structure of the crust. Specifically, we utilize a combination of the following two penalty terms in a geophysical inverse problem: (i) an \({l}_{1}\)-sum-type penalty on the \({l}_{2}\)-norm of the second-order differences between horizontal units in the vertical (depth) direction; and (ii) an \({l}_{2}\)-sum-type penalty on the first-order differences of the horizontal directions. By combining the two types of penalty term, our proposed method detects steep velocity gradients along the depth direction while allowing horizontal variations in seismic velocity. In particular, penalty term (i) serves to detect unknown structural changes, such as the Conrad discontinuity, without prior knowledge.

Penalty term (i) imposed in the vertical (depth) direction is a version of \({l}_{1}\) trend filtering (Kim et al. 2009), which is a sparse estimation technique. Sparse estimations with \({l}_{1}\)-type penalties yield estimates with zero values and work well in balancing the tradeoffs of mitigating overfitting and obtaining estimation accuracy when the estimand has a sparse representation (e.g., Tibshirani 1996; Schmidt et al. 2007). Recently, sparse estimations have been utilized in seismology, such as the inference of fault segments (Klinger 2010), the slip distribution of long-term slow slip events (Nakata et al. 2017), and the spatial distribution of changes in seismic scattering properties from small data sets (Hirose et al. 2020). In our case, the vertical-direction penalty causes the distribution of resulting velocities averaged on the horizontal unit to be piecewise linear in the vertical direction; that is, our proposed method enhances sharp structural changes of seismic velocities at the depths where they occur. The horizontal-direction penalty term (ii) produces velocity distributions with smooth fluctuations in the horizontal direction that fit the common understanding of velocity structures in Earth's interior, as horizontal variations in velocity are generally mild compared with vertical variations. We determine all values of hyperparameters (regularization parameters) via cross-validation.

This paper is organized as follows. We first outline the basis of LET and introduce our proposed approach. We then conduct synthetic tests to demonstrate that the proposed method works better than conventional methods in estimating velocity structures with sharp vertical changes. In addition, we apply the proposed method to real seismic data. Results of the analysis indicate the ability of our LET method with structural regularization to clarify structural discontinuities in the crust with 3-D velocity structure, even when the number of available observational data is not large. Additional details on the mathematical formulations and numerical experiments are described in the Appendix.

Mathematical formulation

In this section, we provide the LET mathematical formulation, focusing on estimations of 3-D velocity structures. We focus only on cases using compressional-wave (P-wave) arrivals, as the description does not depend on a specific seismic phase.

LET fundamental framework

We first design 3-D grid points to model subsurface velocity structures. Let \({v}_{x,y,z}\) be a seismic velocity parameter at grid point \(\left(x,y,z\right)\). In this paper, the \(z\) axis indicates depth, and the \(x{-}y\) plane indicates horizontal location. We assume that the grid points are arranged at uniform intervals in the horizontal and vertical directions, respectively. Hereafter, we refer to a plane consisting of grid points that are located at the same depth (\(z\)) as a “layer”. We then calculate the velocity \({V}_{\tilde{x },\tilde{y },\tilde{z }}\) at an arbitrary point \(\left(\tilde{x },\tilde{y },\tilde{z }\right)\) by linear interpolation using the values of the velocity parameters at the nearest eight grid points. This point \(\left(\tilde{x },\tilde{y },\tilde{z }\right)\) does not necessarily coincide with any grid point. We denote the set of velocity parameters at the grid points by \(v=\{{v}_{x,y,z}\}\).
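The interpolation step can be illustrated with a minimal sketch of trilinear interpolation on a uniform grid. The function name `interpolate_velocity`, the array layout `v[ix, iy, iz]`, and the `origin` and `spacing` arguments are hypothetical choices made for this illustration and are not taken from the paper or from any LET software.

```python
import numpy as np

def interpolate_velocity(v, origin, spacing, point):
    """Trilinear interpolation of grid velocities v[ix, iy, iz] at an arbitrary point.

    Assumes a uniform grid with at least two nodes along every axis.
    """
    v = np.asarray(v, dtype=float)
    # Fractional grid coordinates of the query point along x, y, and z.
    frac = (np.asarray(point, dtype=float) - np.asarray(origin, dtype=float)) \
        / np.asarray(spacing, dtype=float)
    # Lower corner of the enclosing cell, clipped so the cell stays inside the grid.
    upper = np.array(v.shape) - 2
    i0 = np.clip(np.floor(frac).astype(int), 0, upper)
    # Weights toward the upper corner of the cell along each axis.
    w = np.clip(frac - i0, 0.0, 1.0)
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                weight = ((w[0] if dx else 1.0 - w[0])
                          * (w[1] if dy else 1.0 - w[1])
                          * (w[2] if dz else 1.0 - w[2]))
                value += weight * v[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return value
```

Because the weights of the eight surrounding grid points sum to one, a velocity field that varies linearly with depth is reproduced exactly between layers.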

An arrival time contains information on the following factors: the origin time \({\tau }_{i}\) of earthquake \(i\); the hypocenter location \({h}_{i}\) of earthquake \(i\); the velocity parameter \({v}_{x,y,z}\) at grid point \(\left(x,y,z\right)\); and the ray path from \({h}_{i}\) to seismic station \({s}_{j}\). These factors are combined using ray theory to provide the predicted arrival time \({T}_{i,j}^{\left({\text{cal}}\right)}\) from hypocenter \({h}_{i}\) to seismic station \({s}_{j}\):

$${T}_{i,j}^{\left({\text{cal}}\right)} = {\tau }_{i}+{\int }_{{h}_{i}}^{{s}_{j}}\frac{1}{{V}_{\tilde{x },\tilde{y },\tilde{z }}} d\rho ,$$

where \(d\rho\) denotes the element of the path length. Estimations of velocity parameters in LET are usually conducted based on the (damped or regularized) least-squares method. The objective function to be minimized is an additive form of the residual sum of squares (RSS) between the calculated arrival times \({T}_{i,j}^{\left({\text{cal}}\right)}\) and the observed arrival times \({T}_{i,j}^{\left({\text{obs}}\right)}\) for all of the available earthquake–station pairs, and penalty terms:

$$\begin{array}{c}\tilde{f } = \sum \nolimits_{i\in I}\sum \nolimits_{j\in J}{\left({T}_{i,j}^{\left({\text{obs}}\right)}-{T}_{i,j}^{\left({\text{cal}}\right)}\right)}^{2}+D\left(v,h\right)+P\left(v\right) ,\end{array}$$

where \(h\) is the set consisting of the hypocenter locations and origin times, and \(I\) and \(J\) are the sets of observed earthquakes and available observation stations, respectively. The second and third terms (\(D(v, h)\) and \(P(v)\)) represent the penalties on \((v, h)\) and \(v\), respectively; they are explained below. After setting the initial values of the model parameters and specifying \(D(v, h)\) and \(P(v)\), we obtain estimates of the objective factors (velocity and hypocentral parameters, and ray paths) by iterative computation. In each iteration, the velocity and hypocentral parameters are updated jointly, and the ray path is re-evaluated for every earthquake–station pair. This update procedure is repeated until the desired accuracy of the tomography is achieved.
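To make the data-misfit term concrete, the following sketch evaluates the predicted arrival time by numerically integrating slowness along a path and then forms the RSS. It assumes a straight ray between hypocenter and station purely for illustration, whereas actual LET codes trace curved rays. The function names and the `velocity_at` callable are hypothetical.

```python
import numpy as np

def predicted_arrival_time(origin_time, hypocenter, station, velocity_at, n_segments=200):
    """Origin time plus the path integral of 1/V along a straight ray (midpoint rule)."""
    h = np.asarray(hypocenter, dtype=float)
    s = np.asarray(station, dtype=float)
    # Midpoints of equal-length segments between hypocenter and station.
    t = (np.arange(n_segments) + 0.5) / n_segments
    midpoints = h + t[:, None] * (s - h)
    d_rho = np.linalg.norm(s - h) / n_segments  # path-length element
    slowness = np.array([1.0 / velocity_at(p) for p in midpoints])
    return origin_time + float(np.sum(slowness) * d_rho)

def residual_sum_of_squares(t_obs, t_cal):
    """First term of the objective function: squared arrival-time residuals."""
    r = np.asarray(t_obs, dtype=float) - np.asarray(t_cal, dtype=float)
    return float(np.sum(r ** 2))
```

For a uniform 4.0 km/s medium and a 20-km path, the travel time is 5.0 s, so an origin time of 1.0 s gives a predicted arrival at 6.0 s.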

The second term, \(D\left(v,h\right)\), in Eq. (1) is the damping term, which consists of the squared norm of the change in the model parameters from their initial values. In earthquake tomography, estimation with such a damping term (the damped least-squares method, DLS) has been utilized (e.g., Aki and Lee 1976; Thurber 1983). Although such a damping term helps avoid unstable estimation and overfitting, it generally does not take the spatial information of grid points into account. The third term, \(P\left(v\right)\), in Eq. (1) represents an additional penalty that depends only on the velocity parameters and incorporates the spatial information. In DLS, \(P\left(v\right)\) is not employed. For \(P\left(v\right)\), regularization based on \({l}_{2}\)-smoothness has often been used (e.g., Lees and Crosson 1989; Zhang et al. 1998). For example, the following terms can be employed as \(P\left(v\right)\):

$$\begin{aligned}&{\lambda }_{1}{\sum }_{x,y,z}{\sum }_{\left({x}^{{\prime}},{y}^{{\prime}},{z}^{{\prime}}\right)\sim \left(x,y,z\right)}{\left({v}_{x,y,z}-{v}_{{x}^{{\prime}},{y}^{{\prime}},{z}^{{\prime}}}\right)}^{2} , \quad \\ &{\lambda }_{2}{\sum }_{x,y,z}|| \Delta {v}_{x,y,z}|{|}_{2}^{2} ,\end{aligned}$$

where the relation \(\left({x}^{{\prime}},{y}^{{\prime}},{z}^{{\prime}}\right)\sim \left(x,y,z\right)\) means that the two grid points are adjacent to each other, \(\Delta\) indicates the Laplacian operator, and \(||\cdot |{|}_{2}\) represents the \({l}_{2}\)-norm. \({\lambda }_{1}\) and \({\lambda }_{2}\) are non-negative regularization parameters. The former term penalizes dissimilarity among adjacent grid points, and the latter term shrinks the variation in velocity gradients in the three directions. By employing the above penalty terms as \(P(v)\), we can smooth fluctuations in velocity parameters, as well as suppress the destabilization of estimated parameters. Such regularization methods mitigate overfitting to some extent, yet they often obscure discontinuities, since the resulting estimates smooth them out. Thus, we propose another penalty as \(P(v)\), described in the next subsection, to obtain more accurate estimations of 3-D velocity structures involving discontinuities using the LET framework.
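The two conventional \({l}_{2}\)-type penalties above can be sketched as follows for a velocity array indexed as `v[ix, iy, iz]`. This is an illustrative discretization under assumed unit grid spacing: the Laplacian is approximated by the standard second-difference stencil on interior points only, and the regularization parameters are omitted. The function names are hypothetical.

```python
import numpy as np

def adjacent_difference_penalty(v):
    """Sum of squared velocity differences over all adjacent grid-point pairs."""
    v = np.asarray(v, dtype=float)
    total = 0.0
    for axis in range(v.ndim):
        total += np.sum(np.diff(v, axis=axis) ** 2)
    # The double sum in the formula visits each adjacent pair twice (once from each side).
    return float(2.0 * total)

def laplacian_penalty(v):
    """Sum of squared discrete Laplacians, evaluated at interior grid points."""
    v = np.asarray(v, dtype=float)
    interior = tuple(slice(1, -1) for _ in range(v.ndim))
    lap = np.zeros(tuple(n - 2 for n in v.shape))
    for axis in range(v.ndim):
        lo = list(interior)
        hi = list(interior)
        lo[axis] = slice(0, -2)    # neighbor at index - 1 along this axis
        hi[axis] = slice(2, None)  # neighbor at index + 1 along this axis
        lap += v[tuple(lo)] - 2.0 * v[interior] + v[tuple(hi)]
    return float(np.sum(lap ** 2))
```

A velocity field that varies linearly in space has a zero Laplacian penalty but a non-zero adjacent-difference penalty, which reflects the different kinds of smoothness the two terms favor.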

Proposed method: structured regularization for 3-D LET

Here we propose a structured regularization approach to accurately image two different velocity changes: sharp velocity changes in the vertical direction at discontinuities; and relatively gradual velocity changes in the horizontal directions. Our objective function, which is minimized to estimate the optimal model parameters, has the form introduced in Eq. (1) employing two additional penalty terms (the vertical-direction regularization term \({\Omega }_{\text{ver}}\) and the horizontal-direction regularization term \({\Omega }_{\text{hor}}\)) as follows:

$$\begin{array}{c}P\left(v\right) = {\lambda }_{\text{ver}}{\Omega }_{\text{ver}}\left(v\right)+\frac{1}{2}{\lambda }_{\text{hor}}{\Omega }_{\text{hor}}\left(v\right) ,\end{array}$$

where \({\lambda }_{\text{ver}}\) and \({\lambda }_{\text{hor}}\) are non-negative regularization parameters. We multiply \({\Omega }_{\text{hor}}\left(v\right)\) by \(1/2\) for the convenience of computation (also see the Appendix). We obtain estimates of velocity parameters by applying iterative calculations based on the alternating direction method of multipliers (ADMM, Glowinski and Marroco 1975; Gabay and Mercier 1976) to this nonlinear and nonconvex problem. The detailed estimation procedure is described in the Appendix.

The vertical penalty \({\Omega }_{\text{ver}}\) takes the form:

$$\begin{array}{c}{\Omega }_{\text{ver}}\left(v\right) = {\sum }_{z}\sqrt{{g}_{z}\left(v\right)} ,\end{array}$$
$$\begin{array}{c}{g}_{z}\left(v\right) = {\sum }_{x,y}{\left\{\left({v}_{x,y,z-1}-{v}_{x,y,z}\right)-\left({v}_{x,y,z}-{v}_{x,y,z+1}\right)\right\}}^{2} .\end{array}$$

The vertical-direction penalty \({\Omega }_{\text{ver}}\) is the sum of the square roots of \({g}_{z}\), each of which is the \({l}_{2}\)-norm of the second-order differences between the horizontal layers at different depths. This form is a version of \({l}_{1}\) trend filtering (Kim et al. 2009) that has been applied in various research fields (e.g., Tibshirani 2014; Selvin et al. 2016; Guntuboyina et al. 2020). This method is known to be suitable for capturing underlying piecewise linear trends. A notable advantage of \({l}_{1}\) trend filtering is that it reduces penalized elements exactly to zero, in contrast to \({l}_{2}\)-type regularization, which does not provide this reduction (e.g., Wang et al. 2016). In this study, we utilize this approach to detect and adapt to velocity discontinuities by focusing on suppression of the variation in velocity gradients. The values of \({g}_{z}\), the penalized elements, become zero when the vertical velocity gradient is constant across the \(z-1\), \(z\), and \(z+1\) th layers (at every horizontal grid point, since \({g}_{z}\) sums the squared second-order differences over \(x\) and \(y\)), and thus \({\Omega }_{\text{ver}}\) forces the minimizer of Eq. (2) to be piecewise linear in the vertical direction. In general, seismic velocity changes sharply around material boundaries, such as the Conrad and Moho discontinuities. Our penalty term enhances the depths at which such sharp changes occur, by detecting the change points of the velocity gradient.
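As a minimal sketch of Eqs. (3) and (4), the functions below compute \({g}_{z}\) and \({\Omega }_{\text{ver}}\) for a velocity array indexed as `v[ix, iy, iz]` with depth as the last axis. The array layout and the function names are assumptions made for illustration. The first and last layers are skipped because their second-order differences are undefined.

```python
import numpy as np

def g_values(v):
    """g_z of Eq. (4) for the interior layers z = 1, ..., nz - 2."""
    v = np.asarray(v, dtype=float)
    # (v[z-1] - v[z]) - (v[z] - v[z+1]) = v[z-1] - 2 v[z] + v[z+1]
    second_diff = v[:, :, :-2] - 2.0 * v[:, :, 1:-1] + v[:, :, 2:]
    return np.sum(second_diff ** 2, axis=(0, 1))

def omega_ver(v):
    """Vertical penalty of Eq. (3): the l1-sum (plain sum) of the l2-norms sqrt(g_z)."""
    return float(np.sum(np.sqrt(g_values(v))))
```

For a laterally uniform \(6\times 6\times 26\) model with velocities of 4.0, 4.5, and 5.0 km/s in Layers 0–11, 12, and 13–25, respectively, only Layers 11 and 13 have non-zero \({g}_{z}\) (the gradient is constant across Layers 11–13), so the penalty is sparse in depth.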

Next, the horizontal penalty \({\Omega }_{\text{hor}}\) is given by

$$\begin{array}{c}{\Omega }_{\text{hor}}\left(v\right) = {\sum }_{x,y,z}{\sum }_{\left({x}^{{\prime}},{y}^{{\prime}},z\right)\sim \left(x,y,z\right)}{\left({v}_{x,y,z}-{v}_{{x}^{{\prime}},{y}^{{\prime}},z}\right)}^{2} .\end{array}$$

The horizontal-direction penalty builds upon the first-order velocity differences between adjacent grid points at the same depth. The term \({\Omega }_{\text{hor}}\) is the \({l}_{2}\)-type penalty, which allows the resultant velocity parameters to vary smoothly. The penalty terms in Eqs. (3) and (5) need to be divided by the corresponding grid intervals if the grid points are not arranged at respective uniform intervals.
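The horizontal penalty of Eq. (5) can be sketched in the same array convention (`v[ix, iy, iz]`, uniform grid). Each grid point is assumed to have up to four horizontal neighbors within its layer. The function name is hypothetical, and whether a pair is counted once or twice only rescales \({\lambda }_{\text{hor}}\).

```python
import numpy as np

def omega_hor(v):
    """Horizontal penalty: squared first-order differences between in-layer neighbors."""
    v = np.asarray(v, dtype=float)
    dx = np.diff(v, axis=0)  # differences between neighbors along x
    dy = np.diff(v, axis=1)  # differences between neighbors along y
    # The double sum in Eq. (5) visits each adjacent pair twice.
    return float(2.0 * (np.sum(dx ** 2) + np.sum(dy ** 2)))

# Example: a single 6 x 6 layer with a +/-5% checkerboard around 4.0 km/s.
ix, iy = np.meshgrid(np.arange(6), np.arange(6), indexing="ij")
layer = 4.0 * (1.0 + 0.05 * np.where((ix + iy) % 2 == 0, 1.0, -1.0))
checkerboard_penalty = omega_hor(layer[:, :, None])
uniform_penalty = omega_hor(np.full((6, 6, 1), 4.0))
```

The uniform layer incurs no penalty, whereas the checkerboard layer is penalized on all 60 of its adjacent pairs. This is why a large \({\lambda }_{\text{hor}}\) drives the estimate toward laterally uniform layers.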

Figure 1 illustrates how the penalty terms work in the vertical and horizontal directions. Our proposed vertical-direction penalty \({\Omega }_{\text{ver}}\) is based on the \({l}_{1}\)-sum of \({l}_{2}\)-norms (sum of the square roots of \({g}_{z}\)), which suppresses variations in the average velocity gradient in the vertical direction. Using the proposed approach, we can adapt to sharp velocity changes due to geological discontinuities at depth without prior information on the location of the discontinuity. The regularization parameters \({\lambda }_{\text{ver}}>0\) and \({\lambda }_{\text{hor}}>0\) in Eq. (2) control the smoothness of the resulting velocity structure in the vertical and horizontal directions, respectively. When \({\lambda }_{\text{ver}}\) is large, variations in the velocity gradient are strongly suppressed, and the resulting layer-averaged velocity profile therefore tends to have few changes in gradient. In contrast, when \({\lambda }_{\text{ver}}\) is close to zero, the resulting layer-averaged velocity profile becomes unsmooth, since the variations of velocity gradients are hardly taken into account. In the horizontal direction, a large \({\lambda }_{\text{hor}}\) tends to make the estimated velocity parameters uniform in each layer, and a small \({\lambda }_{\text{hor}}\) permits unsmooth variations. If both \({\lambda }_{\text{ver}}\) and \({\lambda }_{\text{hor}}\) are close to zero, the proposed estimation method is almost identical to the DLS method.

Fig. 1

Schematic diagrams of the penalty terms used in the proposed method. a Vertical-direction penalty. The sum of the second-order velocity differences among adjacent layers of depth is penalized. b Horizontal-direction penalty. The first-order velocity differences among adjacent grid points on the same layer are penalized. The central grid point in this figure has four adjacent grid points

Numerical experiment

We evaluate the performances of our proposed regularization method and two conventional methods—the DLS method and regularization via \({l}_{2}\)-norm-based smoothing—to determine the effectiveness of our proposed method in reproducing the seismic velocity structure of a given region. The additional penalty term \(P\left(v\right)\) in Eq. (1) is zero when we estimate parameters via DLS. Smoothing methods based on \({l}_{2}\)-norm have often been used in LET, as mentioned in the Introduction. For the \({l}_{2}\)-norm-based smoothing, in this experiment we employed the following \({l}_{2}\)-norm-based penalty term as \(P(v)\) in Eq. (1):

$$\begin{array}{c}{P}^{{l}_{2}}\left(v\right) = {\lambda }_{\text{ver}}^{{l}_{2}}{\Omega }_{\text{ver}}^{{l}_{2}}\left(v\right)+\frac{1}{2}{\lambda }_{\text{hor}}^{{l}_{2}}{\Omega }_{\text{hor}}^{{l}_{2}}\left(v\right) ,\end{array}$$


$$\begin{aligned}{\Omega }_{\text{ver}}^{{l}_{2}}\left(v\right) &= {\sum }_{z} \,{g}_{z}\left(v\right) \\ &= {\sum }_{x,y,z}{\left\{\left({v}_{x,y,z-1}-{v}_{x,y,z}\right)\right.} {\left.-\left({v}_{x,y,z}-{v}_{x,y,z+1}\right)\right\}}^{2} ,\end{aligned}$$
$${\Omega }_{\text{hor}}^{{l}_{2}}\left(v\right) = {\Omega }_{\text{hor}}\left(v\right) ,$$

where \(\left({\lambda }_{\text{ver}}^{{l}_{2}},{\lambda }_{\text{hor}}^{{l}_{2}}\right)\) are non-negative regularization parameters. Both this method and our proposed method impose the same penalty in the horizontal direction, because we focus on the effect of sparse estimation on accuracy. We hereafter refer to this method as “\({l}_{2}\)-smoothness regularization” (or “L2”) for notational simplicity. A key difference between \({l}_{2}\)-smoothness regularization and our proposed method is the norm employed in the vertical-direction penalty on the velocity structure; the former uses the \({l}_{2}\)-sum of the \({l}_{2}\)-norm (sum of \({g}_{z}\); Eq. (7)), and the latter uses the \({l}_{1}\)-sum of the \({l}_{2}\)-norm (sum of square roots of \({g}_{z}\); Eq. (3)). In this experiment, we used the same procedures for all estimation methods except for the estimation of velocity parameters, so that the accuracy of the imaged velocity structures could be compared. We applied the algorithm in SIMULPS12 (Thurber 1993), a LET-based software package, to determine the hypocenter locations and perform the 3-D ray-tracing calculations through the resultant velocity model for each method.
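The practical difference between the two vertical penalties can be seen in a small one-column example. The profiles below are hypothetical and are not taken from the experiment. Both go from a zero gradient to a unit gradient, but one does so at a single layer (a sharp kink) and the other spreads the change over four layers.

```python
import numpy as np

def vertical_penalties(profile):
    """Return (sum of sqrt(g_z), sum of g_z) for a single-column depth profile."""
    d2 = np.diff(np.asarray(profile, dtype=float), n=2)  # second-order differences
    g = d2 ** 2  # g_z when the layer contains one horizontal grid point
    return float(np.sum(np.sqrt(g))), float(np.sum(g))

# Layer-to-layer gradients: the same total change, concentrated versus spread out.
slopes_sharp = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=float)
slopes_smooth = np.array([0, 0, 0, 0, 0.25, 0.5, 0.75, 1, 1, 1, 1], dtype=float)
sharp = np.concatenate([[0.0], np.cumsum(slopes_sharp)])
smooth = np.concatenate([[0.0], np.cumsum(slopes_smooth)])

l1_sharp, l2_sharp = vertical_penalties(sharp)
l1_smooth, l2_smooth = vertical_penalties(smooth)
```

Here the \({l}_{1}\)-sum is 1.0 for both profiles, whereas the \({l}_{2}\)-sum is 1.0 for the sharp kink and 0.25 for the smoothed one. The \({l}_{2}\)-sum therefore favors rounding off a kink, while the \({l}_{1}\)-sum does not penalize concentrating the gradient change at one depth.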

We determined the regularization parameters via cross-validation. We first split the data set into training and validation data sets. We then estimated the velocity parameters from the training data set for each combination of regularization parameter values in a prepared set of candidates. Finally, we selected the candidate values that minimized the root mean square error (RMSE) of the predicted arrival times for the validation data set.
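The selection procedure can be sketched as a grid search over candidate pairs. The `fit` and `predict` callables stand in for the tomographic inversion and the forward arrival-time calculation, respectively. They are placeholders assumed for this illustration, as is the dictionary layout of the validation data.

```python
import itertools
import numpy as np

def rmse(t_obs, t_pred):
    """Root mean square error between observed and predicted arrival times."""
    r = np.asarray(t_obs, dtype=float) - np.asarray(t_pred, dtype=float)
    return float(np.sqrt(np.mean(r ** 2)))

def select_regularization(candidates_ver, candidates_hor, fit, predict, train, valid):
    """Return (best RMSE, lambda_ver, lambda_hor) over all candidate pairs.

    fit(train, lam_ver, lam_hor) -> model and predict(model, valid) -> arrival times
    are user-supplied placeholders; valid["t_obs"] holds the validation arrival times.
    """
    best = None
    for lam_ver, lam_hor in itertools.product(candidates_ver, candidates_hor):
        model = fit(train, lam_ver, lam_hor)  # estimate on the training data only
        score = rmse(valid["t_obs"], predict(model, valid))
        if best is None or score < best[0]:
            best = (score, lam_ver, lam_hor)
    return best
```

The validation arrival times are never passed to `fit`, so the selected pair reflects predictive accuracy rather than the fit to the training data.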

Synthetic data

We conducted synthetic tests using the Japan Meteorological Agency (JMA) unified earthquake catalog to investigate the performances of the approaches with different regularizations. The location of the study area is shown in Fig. 2. The data set consists of 199 earthquakes that occurred in central Japan. We obtained 3954 P-wave arrival times from 68 seismic stations in the target region. The arrival times were divided into training and validation data sets, with 2965 arrival times for estimating the velocity parameters and 989 arrival times for validating method accuracy.

Fig. 2

(Left) Map of the target region (bold rectangle) in Japan. (Right) Distributions of epicenters and observation stations that were used in our synthetic test. Circles represent earthquake epicenters, and grey inverted triangles represent observation stations. Epicenters are color-coded by depth

We constructed a 26-layer model that extended from 0.0 km (surface) to 25.0 km depth at 1.0-km intervals. We denote the surface layer as “Layer 0” and the layers with grid points at \(d\) km depth as “Layer \(d\)”. We then placed 36 (\(6\times 6\)) grid points in each layer at an 8.0-km horizontal interval, with the center of the grid points at \({35.25}^{\circ }\) N, \({138.25}^{\circ }\) E. We set outer points, which surround the main target region and have fixed velocities, because some of the hypocenters and stations were located outside the target region. We placed the outer points 220 km from the outermost grid points of each layer in the horizontal direction, and set the “outer layer” at 200 km depth in the vertical direction, to suppress the influence of the velocities at the outer grid points.

We calculated the synthetic P-wave arrival times as follows. We first defined the baseline velocity of each layer, as shown in Fig. 3a. We assumed a 1-D velocity model with a sharp increase in velocity at around Layer 12. We then generated “true” velocities at the grid points by adding \(\pm 5\)% anomalies to the baseline velocities to produce a checkerboard pattern for each layer, as shown in Fig. 3b. Finally, we calculated synthetic arrival times for the available earthquake–station pairs using the true 3-D velocity structure, and added Gaussian noise with a standard deviation of 0.1 s to them.
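The construction of the synthetic model can be sketched as follows. The baseline velocities follow the caption of Fig. 3a. The checkerboard cell size (one grid point) and the use of the same pattern in every layer are assumptions made for this illustration, as are the variable names and the random seed.

```python
import numpy as np

n_x, n_y, n_layers = 6, 6, 26

# Baseline 1-D profile: 4.0 km/s (Layers 0-11), 4.5 km/s (Layer 12), 5.0 km/s (Layers 13-25).
baseline = np.full(n_layers, 4.0)
baseline[12] = 4.5
baseline[13:] = 5.0

# +/-5% checkerboard in each layer (assumed cell size: one grid point).
ix, iy = np.meshgrid(np.arange(n_x), np.arange(n_y), indexing="ij")
sign = np.where((ix + iy) % 2 == 0, 1.0, -1.0)
true_velocity = baseline[None, None, :] * (1.0 + 0.05 * sign[:, :, None])

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility

def add_observation_noise(arrival_times, sigma=0.1):
    """Add zero-mean Gaussian noise (standard deviation sigma, in seconds)."""
    arrival_times = np.asarray(arrival_times, dtype=float)
    return arrival_times + rng.normal(0.0, sigma, size=arrival_times.shape)
```

Because each layer contains equal numbers of positive and negative anomalies in this sketch, the layer-averaged velocity equals the baseline profile.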

Fig. 3

a Baseline velocity–depth profile for the 26-layer model: 4.0 km/s in Layers 0–11, 4.5 km/s in Layer 12, and 5.0 km/s in Layers 13–25. b Introduced velocity anomalies of the true velocity structure from the baseline velocity in each layer. Dots represent grid point locations. Blue and red indicate that the estimated velocities are faster and slower than the baseline velocity, respectively


Figure 4 shows the average velocity–depth profile for each method. When averaging, we used the estimated values of velocity parameters, except for the outer grid points. The initial value of velocity parameter in each grid point was assumed to be 4.0 km/s in this synthetic test. The following regularization parameter values were determined via cross-validation: \(\left({\lambda }_{\text{ver}}^{{l}_{2}},{\lambda }_{\text{hor}}^{{l}_{2}}\right)=\left(0.50, 0.06\right)\) in the \({l}_{2}\)-smoothness regularization (Eq. (6)), and \(\left({\lambda }_{\text{ver}},{\lambda }_{\text{hor}}\right)=\left(0.10, 0.10\right)\) in the proposed method (Eq. (2)). The averaged velocities obtained via the estimation methods all capture the sharp increases in velocity around Layer 12, as shown in Fig. 4a. However, the DLS method output shows obvious fluctuations in its estimated velocity structure. This may be due to the 1-km grid interval in the vertical direction being finer than the grid interval that LET studies have generally employed when using data from the nationwide seismic network in Japan (e.g., Matsubara et al. 2017). These unstable DLS estimations indicate that it is difficult to adapt to sharp changes in the velocity structure and avoid ill-posed estimations of the velocity structure without using information on the spatial arrangement of the grid points. \({l}_{2}\)-smoothness regularization outperformed DLS due to the employed regularization, which reduced the fluctuations in averaged velocities. However, the \({l}_{2}\)-smoothness regularization estimates at grid points in the layers near and below the velocity jump (Layers 13–25) were unable to reproduce the true velocity structure. Conversely, our proposed method recovered the true average velocities reasonably well, including the sharp increase in velocity around Layer 12. We quantitatively compared the accuracy of estimation of each method by calculating the mean absolute error (MAE):

Fig. 4

a Comparison of the estimated average velocity structures. Estimates from DLS (green line), \({l}_{2}\)-smoothness regularization (L2; blue line), and the proposed method (red line) are shown. Black and grey lines illustrate the true and initial velocity structures, respectively. b–d \({g}_{z}\) values (colored circles), which are computed from the estimated average velocity structure for each method (grey lines)

$${\text{MAE}}=\frac{1}{{N}_{g}}{\sum }_{x,y,z}\left|{v}_{x,y,z}^{\left({\text{estimates}}\right)} - {v}_{x,y,z}^{\left({\text{true}}\right)}\right| ,$$

where \({N}_{g}\) is the number of grid points, and \({v}_{x,y,z}^{\left({\text{estimates}}\right)}\) and \({v}_{x,y,z}^{\left({\text{true}}\right)}\) are the estimated and true velocity parameters at each grid point \((x, y, z)\), respectively. The MAE values of DLS, \({l}_{2}\)-smoothness regularization, and our proposed method were 0.383, 0.080, and 0.040, respectively.
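For reference, the MAE defined above amounts to a single array operation. The function name and the toy arrays below are hypothetical and are unrelated to the reported values.

```python
import numpy as np

def mean_absolute_error(v_est, v_true):
    """Mean absolute difference between estimated and true grid velocities (km/s)."""
    v_est = np.asarray(v_est, dtype=float)
    v_true = np.asarray(v_true, dtype=float)
    return float(np.mean(np.abs(v_est - v_true)))

# Toy example: a uniform 0.05 km/s overestimate at every grid point gives MAE = 0.05.
v_true = np.full((6, 6, 26), 4.0)
v_est = v_true + 0.05
toy_mae = mean_absolute_error(v_est, v_true)
```

Unlike a squared-error measure, the MAE weights every grid point linearly, so a few poorly resolved grid points do not dominate the score.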

The norm of the penalty in the vertical direction differs between the \({l}_{2}\)-smoothness regularization and the proposed method, as shown in Eqs. (3) and (7). The \({l}_{2}\)-smoothness regularization employs the \({l}_{2}\)-sum of the \({l}_{2}\)-norm as the vertical-direction penalty, whereas the proposed method employs the \({l}_{1}\)-sum of the \({l}_{2}\)-norm. The \({g}_{z}\) values (Eq. (4)), which were evaluated using the obtained velocity structure for each method, are illustrated in Fig. 4b. The \({g}_{z}\) values should be zero for most of the layers, except for those around Layer 12, where there is a sudden velocity change, because the true velocity structure was generated from only three baseline velocities (Fig. 3a): 4.0 km/s in Layers 0–11, 4.5 km/s in Layer 12, and 5.0 km/s in Layers 13–25. Most of the computed \({g}_{z}\) values for the DLS-estimated velocity structure were far from zero, as shown in Fig. 4b. Although the \({l}_{2}\)-smoothness regularization-estimated \({g}_{z}\) values were relatively small compared with the DLS-estimated values, the penalty terms of \({l}_{2}\)-smoothness regularization did not reduce \({g}_{z}\) to zero. In contrast, most of the \({g}_{z}\) values estimated by our proposed method were almost exactly zero, as the penalty terms in this method produce a piecewise linear velocity structure.
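One way to see why an \({l}_{1}\)-sum of \({l}_{2}\)-norms can give exactly zero \({g}_{z}\) values is through the proximal operators of the two penalties. Block soft-thresholding is a standard building block of ADMM-type solvers for group-norm penalties. Whether the estimation procedure in the Appendix uses exactly this step is an assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np

def block_soft_threshold(x, lam):
    """Proximal operator of lam * ||x||_2: sets the whole block to zero if ||x||_2 <= lam."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x)
    if norm <= lam:
        return np.zeros_like(x)
    return (1.0 - lam / norm) * x

def squared_norm_shrink(x, lam):
    """Proximal operator of lam * ||x||_2^2: rescales x but never zeroes a non-zero block."""
    return np.asarray(x, dtype=float) / (1.0 + 2.0 * lam)

# A small block of second-order differences (hypothetical values).
small_block = np.array([0.03, -0.04])  # l2-norm is 0.05
sparse_result = block_soft_threshold(small_block, lam=0.1)
smooth_result = squared_norm_shrink(small_block, lam=0.1)
```

The group-norm operator maps the small block exactly to zero, whereas the squared-norm operator only shrinks it. This matches the contrast between the near-zero and merely small \({g}_{z}\) values described above.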

We now focus on spatial variations in the estimated velocities in the horizontal units. The checkerboard anomalies imposed on the true velocity structure and the estimated velocity perturbations, both of which were computed from the baseline velocities at each grid point in Layers 1, 12, 20, and 25, are shown in Fig. 5a. DLS tends to estimate anomaly amplitudes that are more than 5% smaller or larger than those of the assumed true structure in this experiment. Both the \({l}_{2}\)-smoothness regularization and proposed method reproduced the checkerboard pattern in the shallower layers (Layers 0–10). However, \({l}_{2}\)-smoothness regularization failed to reproduce the assumed checkerboard pattern in the deeper layers (Layers 18–25), whereas the proposed method successfully restored the true structure in most areas (see “Layer 20” and “Layer 25” in Fig. 5a). These results suggest that we can also improve the estimation accuracy of the horizontal-direction variations by capturing the vertical-direction structural changes via the sparse regularization term. Note that, as the spatial locations of hypocenters and stations are non-uniform in the target region (Fig. 2), the number of ray paths differs according to location (e.g., there are relatively few hypocenters in the southern part of the target region). Nevertheless, the proposed method succeeded in recovering the true structure from the spatially biased data.

Fig. 5

a Anomalies of the estimated velocities from the baseline velocities in several layers for the estimation methods. Dots represent grid points. b Locations of relocated hypocenters in map-view (left) and cross-sectional view along a north–south profile (right). Grey and red circles represent the true (initial) and relocated hypocenters, respectively. East, north, and down directions are positive. The origin of the coordinates is \({35.25}^{\circ }\) N, \({138.25}^{\circ }\) E, and 0 km depth

Figure 5b illustrates the initial and relocated hypocenters obtained when using our proposed method for velocity estimation. The mean, median, and standard deviation of the estimated errors were 1.48, 1.25, and 1.15 km, respectively. When using the conventional methods for velocity estimation, we obtained similar relocation results: the means of estimated errors using the DLS and \({l}_{2}\)-smoothness regularization methods were 1.63 and 1.49 km, respectively. These results suggest that our velocity estimation with structured regularization does not adversely affect hypocenter determination. In addition, we compared the proposed method with other regularization methods (Laplacian regularization and other sparse regularization methods via \({l}_{1}\)-norm) through several experiments, as detailed in the Appendix.


Size of the velocity jump at a discontinuity

We examined the sensitivity of the three estimation methods (DLS, \({l}_{2}\)-smoothness regularization, and our proposed method) to the amplitude of a velocity jump in the vertical direction. The estimation accuracy of the velocity parameters is expected to deteriorate as the size of the velocity jump becomes larger. The initial value of the velocity parameter at each grid point was the same as in the main experiment in the previous section (uniform velocity of 4.0 km/s). Figure 6 shows the averages of the true and estimated velocities at each layer, and the MAEs for each tested velocity jump. The results of this sensitivity test are shown in Figs. 6a–c and 4 (the latter being the case for which the size of the velocity jump is 1.0 km/s). All methods yielded comparable estimation accuracies when there was no velocity jump (constant velocity with depth; Fig. 6a). We also found that the performance of the DLS method degraded gradually as the size of the velocity jump increased. \({l}_{2}\)-smoothness regularization performed better than DLS based on the MAEs, but it failed to reproduce the linear trend in Layers 13–25 (Fig. 6b, c).
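The two summary quantities used in these comparisons, the layer-averaged velocity and the MAE, can be sketched as follows. This is a minimal illustration only: it assumes that the MAE is the mean absolute difference between estimated and true velocities over all grid points, and the velocity values, the offset of the "estimate", and the function names are hypothetical rather than taken from the experiments.

```python
import numpy as np

def layer_averages(v):
    """Average velocity of each layer; v has shape (n_x, n_y, n_z)."""
    return v.mean(axis=(0, 1))

def mean_absolute_error(v_est, v_true):
    """Mean absolute difference over all grid points (assumed MAE definition)."""
    return float(np.mean(np.abs(v_est - v_true)))

# Hypothetical 6 x 6 x 26 grid (936 parameters, as in the synthetic tests).
n_x, n_y, n_z = 6, 6, 26
depth_profile = 4.0 + 0.05 * np.arange(n_z)   # illustrative linear trend (km/s)
depth_profile[13:] += 0.5                     # illustrative jump entering Layer 13
v_true = np.broadcast_to(depth_profile, (n_x, n_y, n_z)).copy()

# A made-up "estimate": the true model shifted by a constant 0.1 km/s.
v_est = v_true + 0.1

avg_true = layer_averages(v_true)
avg_est = layer_averages(v_est)
mae = mean_absolute_error(v_est, v_true)
print(avg_true.shape, round(mae, 3))
```

Plotting `avg_true` and `avg_est` against the layer index gives a depth profile of the kind compared in the figures, while `mae` condenses the misfit of the full grid into a single number.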

Fig. 6

Recovery of the estimated average velocities by changing the size of the velocity jump: a 0.0 km/s; b 0.2 km/s; and c 0.5 km/s. d MAE variations for each of the tested velocity jumps and methods (DLS, \({l}_{2}\)-smoothness regularization (L2), and the proposed method). Colored lines in a–c are the same as those in Fig. 4

This occurs because the penalty term of the \({l}_{2}\)-smoothness regularization does not strictly hold the average velocity gradient constant, as it is composed of the \({l}_{2}\)-sum of the \({l}_{2}\)-norm (\({g}_{z}\); Eq. (4)). In contrast, the loss of precision of the proposed method is relatively small, especially in the layers below the velocity jump (Layers 13–25), because its penalty term, which consists of the \({l}_{1}\)-sum of the \({l}_{2}\)-norm (sum of the square roots of \({g}_{z}\)), expresses the piecewise linear trend of the true velocity structure. We confirmed that the proposed method estimated velocity parameters more stably for each of the tested velocity jumps compared with the conventional methods (Fig. 6d). These results suggest that the proposed method can recover a range of velocity changes (small to large amplitudes) that may be associated with discontinuities.
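The difference between the two vertical penalties can be illustrated numerically. The sketch below assumes, as the text implies, that \({g}_{z}\) is the squared \({l}_{2}\)-norm of the second-order differences of the horizontal unit at layer \(z\); the two depth profiles, their slopes, and all function names are hypothetical. One profile concentrates its gradient change at a single layer (a kink), whereas the other spreads the same total gradient change over four layers.

```python
import numpy as np

def second_differences(u):
    """Second-order differences along depth; u has shape (n_z, n_h)."""
    return u[:-2] - 2.0 * u[1:-1] + u[2:]

def g_z(u):
    """Squared l2-norm of the second-order difference at each interior layer."""
    return np.sum(second_differences(u) ** 2, axis=1)

def penalty_l2_sum(u):
    """l2-type penalty: sum of g_z over layers."""
    return float(np.sum(g_z(u)))

def penalty_l1_sum(u):
    """l1-sum of l2-norms: sum of the square roots of g_z over layers."""
    return float(np.sum(np.sqrt(g_z(u))))

def profile_from_slopes(slopes, n_h, v0=4.0):
    """Build an (n_z, n_h) model whose columns share one depth profile."""
    depth = v0 + np.concatenate(([0.0], np.cumsum(slopes)))
    return np.tile(depth[:, None], (1, n_h))

n_h = 36  # grid points per horizontal unit (6 x 6)

# Kinked profile: the gradient drops from 0.10 to 0.02 at a single layer.
slopes_kink = np.where(np.arange(25) < 12, 0.10, 0.02)

# Smooth profile: the same total drop, spread over four layers.
slopes_smooth = slopes_kink.copy()
slopes_smooth[10:13] = [0.08, 0.06, 0.04]

u_kink = profile_from_slopes(slopes_kink, n_h)
u_smooth = profile_from_slopes(slopes_smooth, n_h)

for name, u in (("kink", u_kink), ("smooth", u_smooth)):
    print(name, penalty_l2_sum(u), penalty_l1_sum(u))
```

In this toy setting the squared (\({l}_{2}\)-type) penalty charges the kinked profile four times as much as the smoothed one, so it favors spreading a gradient change over several layers. The \({l}_{1}\)-sum charges both profiles equally, so it does not discourage a gradient change concentrated at one depth.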

Initial model dependence

We conducted additional experiments to investigate the influence of the initial model on the estimated velocity structure for each of the estimation methods, which may arise from the nonlinearity of the objective function. The main experiment adopted an initial velocity structure of 4.0 km/s at all of the grid points (Fig. 4); our additional experiments tested initial velocities of 4.5 and 5.0 km/s. We configured all of the other settings to be the same as those in the main experiment. The averages of the true and estimated velocities in each layer, and the associated MAEs for different initial velocities are shown in Fig. 7. Note that the proposed method yields the smallest MAEs among the estimation methods for each of the three initial velocities (Fig. 7c), indicating that our structured regularizations provide stable estimations of the velocity structure, regardless of the initial velocity model. It is generally more difficult to estimate the velocity parameters in the deeper layers compared with the shallow layers because of the sparsity of the seismic ray paths at depth. The estimated average velocity more closely reproduced the true velocity in all cases when the initial velocity was set to 5.0 km/s, which is close to the true velocity of the deep layers.

Fig. 7

Recovery of the estimated average velocities for different initial velocities: a 4.5 km/s and b 5.0 km/s. c MAE variations arising from the initial velocities for the different estimation methods. Colored lines in a, b are the same as those in Fig. 4

The relationship between method accuracy and sample size

We investigated the accuracy of each method for different sample sizes (the number of arrival time data). The sample size was controlled by either decreasing or increasing the number of available seismic stations that were analyzed to extract the P-wave arrival times. We used the same amplitude of the velocity jump and initial velocity parameters as those in the main experiment. Figure 8 shows the averages of the true and estimated velocities in each layer, and the MAEs for each sample size. We confirmed that the proposed method performed the best in each of the tested settings based on its MAE values (Fig. 8d). The number of velocity parameters was as large as 936 (\(6\times 6\times 26\)) in our experiments, inevitably making it difficult to estimate the velocity structure without regularization considering the spatial information when the sample size was small (Fig. 8a). Although all methods performed well for a large sample size (Fig. 8b, c), the methods with regularization showed better accuracies than DLS. The \({l}_{2}\)-smoothness regularization and proposed methods yielded relatively stable accuracies, even when the number of arrival time data was small, as regularization methods are generally capable of avoiding overfitting and performing well when there are many parameters to estimate (e.g., Negahban et al. 2012; Hastie et al. 2015). Furthermore, the proposed method reproduced the sharp change in the average velocity structure the best among the estimation methods. Throughout the experiments in this and the previous subsections, \({l}_{2}\)-smoothness regularization showed a tendency to make biased estimates at depths below the change point (Layers 13–25). In contrast, our proposed method provided less biased and more stable estimates.

Fig. 8

Recovery of the estimated average velocities by changing the number of arrival time data used in the analysis: a 1250; b 4393; and c 6414, which are about one-half, twice, and three times the number used in the main results, respectively. d MAE variations in the sample sizes for the estimation methods. The colored lines in a–c are the same as those in Fig. 4

Dipping interface

We assumed that the structure was composed of horizontal (flat) interfaces in the previous experiments, but interfaces are not always horizontal in the more complex structures of the Earth. Here, we conducted an additional numerical experiment assuming a dipping interface at crustal depths. We used the same settings as in the main experiment, but assigned a west–east dipping interface in Layers 8–14, shown in the left panel of Fig. 9, as the true velocity structure. We configured the true average velocity to be piecewise linear: the gradient of the average velocity was constant within each of Layers 0–7, 8–14, and 15–25. The number of observed arrival times was 4563. Figure 9 also illustrates the west–east vertical cross sections obtained using the three estimation methods (\({35.43}^{\circ }\) N). The proposed method recorded the best (smallest) MAE of the estimation methods: the MAE values of DLS, \({l}_{2}\)-smoothness regularization, and our proposed method were 0.333, 0.087, and 0.064, respectively. As demonstrated in the previous section and subsections, the proposed method can enhance flat discontinuities; on the basis of this experiment, we confirm that it can also be applied to dipping interfaces.

Fig. 9

Vertical and horizontal variations in the true and estimated velocity structures obtained using DLS, \({l}_{2}\)-smoothness regularization (L2), and the proposed method

Application to real seismic data

We applied the proposed method to real seismic data, using seismic waveforms from 211 earthquakes that were observed by the high-sensitivity seismograph network in Japan (Hi-net, National Research Institute for Earth Science and Disaster Resilience 2019) during the 2005–2014 period. We used 2042 P-wave arrival times from the waveforms, and divided the arrival times into 1701 training data and 341 validation data for cross-validation. The target region of this experiment is shown in Fig. 2. We employed the same grid points as those in the synthetic test. We set a velocity of 6.0 km/s at all of the grid points in the study area for the initial velocity model, and fixed the JMA2001 1-D velocity model (Ueno et al. 2002) values, which have been commonly employed for routine hypocenter determinations throughout Japan, at the outer points (outside the target region).
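The division of the arrival times into training and validation sets can be sketched as below. This is only an illustration: the text gives the set sizes (1701 and 341 out of 2042) but not how the arrival times were assigned, so the random permutation, the seed, and the variable names are assumptions.

```python
import numpy as np

n_total = 2042   # P-wave arrival times (from the text)
n_valid = 341    # size of the validation set (from the text)

# Assumed assignment rule: a random permutation with a fixed seed.
rng = np.random.default_rng(0)
perm = rng.permutation(n_total)

valid_idx = np.sort(perm[:n_valid])   # indices held out for validation
train_idx = np.sort(perm[n_valid:])   # indices used to fit the model

print(len(train_idx), len(valid_idx))
```

Holding out roughly one-sixth of the arrival times in this way leaves a set of data that does not enter the fit and can therefore be used to score candidate regularization parameters.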

The resultant P-wave velocity–depth profiles for the methods are shown in Fig. 10a, and Fig. 10b, c illustrate the cross-sectional variations. The proposed method estimated a notable change of averaged velocity in the target region, with monotonically increasing averaged velocities to approximately 16 km depth (Layer 16) and a nearly constant velocity at greater depths (red line in Fig. 10a). Figure 10 shows a change in the velocity gradient at a depth of around 16 km (arrows in Fig. 10b, c). The obtained average velocities at depths greater than 16 km were approximately 6.71 km/s, coinciding with those determined by the reflection and wide-angle refraction survey (Iidaka et al. 2003). Since the Conrad discontinuity in the target region has been imaged at approximately 15–20 km depth (e.g., Iidaka et al. 2003; Katsumata 2010), we interpret the change in the average velocity gradient at depths of around 16 km as possibly related to this discontinuity.

Fig. 10

a Estimated average P-wave velocities. Results of the proposed method, which are derived from real observational data, are shown by the red line. The grey line indicates the initial velocity model. Green and blue lines are the estimated P-wave velocity structures determined via the DLS and \({l}_{2}\)-smoothness regularization (L2) methods, respectively. b, c Vertical and horizontal variations in P-wave velocity. Arrows indicate 16 km depth. Cross-sectional locations are indicated in the inset map

The proposed method estimated an eastward dipping interface in the shallower part (Fig. 10b, c). The obtained mean and standard deviation of the P-wave velocity in the western part at 0–5 km depth were 5.66 and 0.20 km/s, respectively, which are comparable with those determined by Iidaka et al. (2003), whose survey lines crossed the western part of the target area. Meanwhile, the obtained mean and standard deviation of the P-wave velocity in the eastern part at 0–5 km depth were 4.83 and 0.54 km/s, respectively. Similar near-surface low-velocity zones have been imaged by Matsubara et al. (2019) around the eastern part of the target area, supporting our results. Since we took the depth average including the low-velocity region, the obtained average velocity increased gradually down to depths of about 16 km, and thus the change of average velocity in the deeper portion appears somewhat continuous.

Therefore, the applicability of the proposed method in elucidating sharp velocity discontinuities is supported by its success in detecting the structural change, which is defined by this sudden change in the velocity gradient within the target region. There were large fluctuations in the DLS average velocities (green line in Fig. 10a), whereas the \({l}_{2}\)-smoothness regularization vertical fluctuations were smoothed (blue line in Fig. 10a). The average velocity gradient of the \({l}_{2}\)-smoothness regularization shows some changes in the Layers 13–16 range, as does that of the proposed method. However, the P-wave velocity obtained by the \({l}_{2}\)-smoothness regularization method was 6.28 km/s at depths greater than 16 km, which was clearly smaller than those retrieved by the reflection and wide-angle refraction survey, 6.6–6.8 km/s (Iidaka et al. 2003), and the proposed method, 6.71 km/s. The small number of arrival time data used here may cause the \({l}_{2}\)-smoothness regularization method to underestimate the average velocities.

The regularization parameters for the proposed method were \(\left({\lambda }_{\text{ver}}=0.45, {\lambda }_{\text{hor}}=0.95\right)\). RMSE values for each pair of regularization parameters \(\left({\lambda }_{\text{ver}},{\lambda }_{\text{hor}}\right)\) are represented by a heat map (Fig. 11a, b). We also show the estimated average velocities and values of \({g}_{z}\) (Eq. (4)) for each layer for some pairs of \(\left({\lambda }_{\text{ver}},{\lambda }_{\text{hor}}\right)\), in Fig. 11c–e. The RMSE for the optimal pair of regularization parameters was 0.17 s. When using values of \({\lambda }_{\text{ver}}\) and \({\lambda }_{\text{hor}}\) that are too small, the corresponding estimation procedure is similar to that of DLS, and thus it becomes difficult to adapt to a sharp change in the velocity structure (Fig. 11c). In contrast, when using values of \({\lambda }_{\text{ver}}\) and \({\lambda }_{\text{hor}}\) that are too large, variations in velocity gradients and adjacent velocity parameters are suppressed excessively, and the resultant velocity structure tends to be too smooth (Fig. 11e).
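Selecting a pair of regularization parameters by validation RMSE can be sketched as a grid search. The example below is a toy under several assumptions, not the estimation procedure of this study: the forward problem is a random linear system rather than ray tracing, both penalties are quadratic stand-ins (so that each fit is a single linear solve, unlike the \({l}_{1}\)-sum of \({l}_{2}\)-norms used by the proposed method), and the grid size, candidate values, and function names are hypothetical.

```python
import numpy as np

def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def difference_matrix(n, order):
    """Finite-difference operator of the given order acting on n points."""
    D = np.eye(n)
    for _ in range(order):
        D = D[1:] - D[:-1]
    return D

# Toy model grid: n_z layers, n_h points per layer, flattened layer by layer.
n_z, n_h = 8, 4
P_ver = np.kron(difference_matrix(n_z, 2), np.eye(n_h))  # 2nd differences in depth
P_hor = np.kron(np.eye(n_z), difference_matrix(n_h, 1))  # 1st differences within a layer

def fit(G, t, lam_ver, lam_hor):
    """Quadratically penalized least squares (stand-in for the real estimator)."""
    A = G.T @ G + lam_ver * P_ver.T @ P_ver + lam_hor * P_hor.T @ P_hor
    return np.linalg.solve(A, G.T @ t)

# Synthetic data: piecewise linear depth profile, random linear forward operator.
rng = np.random.default_rng(0)
depth = np.array([4.0, 4.2, 4.4, 4.6, 5.4, 5.5, 5.6, 5.7])
m_true = np.tile(depth[:, None], (1, n_h)).ravel()
G = rng.uniform(0.0, 1.0, size=(72, n_z * n_h))
t = G @ m_true + 0.05 * rng.standard_normal(72)
G_train, t_train = G[:60], t[:60]
G_valid, t_valid = G[60:], t[60:]

# Grid search: validation RMSE for every (lam_ver, lam_hor) pair.
lams = [0.01, 0.1, 1.0, 10.0]
table = np.empty((len(lams), len(lams)))
for i, lam_ver in enumerate(lams):
    for j, lam_hor in enumerate(lams):
        m_hat = fit(G_train, t_train, lam_ver, lam_hor)
        table[i, j] = rmse(G_valid @ m_hat, t_valid)

i_best, j_best = np.unravel_index(np.argmin(table), table.shape)
best_pair = (lams[i_best], lams[j_best])
print(best_pair, table[i_best, j_best])
```

The array `table` plays the role of a heat map of validation RMSE, and `best_pair` marks its minimum. A finer grid around the minimum, as in a magnified view, would refine the choice.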

Fig. 11

a Heat map of the RMSE value for each pair of regularization parameters \(\left({\lambda }_{\text{ver}},{\lambda }_{\text{hor}}\right)\), in logarithmic scale. b Magnified view around the optimal point, (0.45, 0.95), illustrated by a star. The drawing range is shown in a by the rectangle. c–e Estimated average velocities of each layer (red) and \({g}_{z}\) values (grey) for three pairs of regularization parameters: c (0.10, 0.10), d (0.45, 0.95), and e (2.00, 2.00). The corresponding points are shown in a by the stars

These results, which were obtained from real seismic data, suggest that the proposed method can stably detect the depth of the velocity discontinuity, even when the number of available observational data is small. Later reflected and/or converted waves have conventionally been used to investigate the depths of various velocity discontinuities, such as the Conrad and Moho discontinuities, and the subducting plate interface (e.g., Matsuzawa et al. 1986; Zhao et al. 1997). However, there are cases where such later waves are identified only in a limited number of ray paths of earthquake–station pairs, unlike direct P and S waves that are commonly and widely observed from numerous earthquakes. As mentioned in the previous section, the estimation accuracy of our method improves with increasing sample size, as with conventional methods; thus, available later arrival data will be useful for improving the estimation accuracy of the proposed method. A significant advantage of the proposed method is that it can estimate velocity structure robustly, even in cases where there is only a small number of data, by employing sparse regularization. The proposed method is expected to improve the detection of velocity discontinuities and refine imaging in regions where later waves are not widely observed.


We introduced a nonlinear inversion method with structured regularization to image the crustal structure of Earth. Our proposed LET method simultaneously estimated both smooth trends and sharp changes in crustal velocity structure, both of which are expected in Earth's interior, by combining two types of penalty terms that are applied in the vertical and horizontal directions of the model space. We employed a vertical-direction penalty term that consisted of the second-order differences in the depth-dependent velocity parameters to detect a velocity discontinuity, thereby highlighting the ability to image sharp velocity changes in the vertical direction. This vertical-direction penalty term works on the depth-averaged velocity values, and takes the form of the \({l}_{1}\)-sum of the \({l}_{2}\)-norm. This penalty makes it possible to reproduce piecewise linear trends in the velocity changes at depth, and to image the sharp structural changes. We used a horizontal-direction penalty term that consisted of first-order differences of the velocities that were based on the \({l}_{2}\)-norms. This horizontal-direction penalty smooths velocity fluctuations.

We compared the imaging capability of the proposed method with conventional LET approaches, the damped least-squares and \({l}_{2}\)-based regularization methods, via synthetic data experiments to verify the performance of the proposed method. Accordingly, we confirmed that the proposed method can adequately reproduce both sharp and gradual velocity changes. We also demonstrated that the proposed method is stable against variations in the amplitude of the velocity jump, the initial velocity structure, and the sample size, and that it has the ability to accommodate dipping structural changes in the crust. Furthermore, we applied the proposed method to real seismic data from central Japan, and successfully imaged a distinct velocity gradient change at approximately 16 km depth. Therefore, the proposed method can improve the detectability of horizontal and dipping interfaces using arrival time data. Our proposed method automatically detects the existence (or nonexistence) of discontinuities, because it does not require prior information regarding the velocity discontinuity. Results of the synthetic tests and the real data analysis highlighted the importance of sparse regularization to better estimate the subsurface velocity structure, and suggested that we can improve the imaging of existing seismic data in the framework of earthquake tomography by appropriately combining the structured regularizations.

Availability of data and materials

The data sets analysed during the current study are available from the Japan Meteorological Agency (JMA) at



ADMM: Alternating direction method of multipliers

DLS: Damped least-squares

Hi-net: High-sensitivity seismograph network

JMA: Japan Meteorological Agency

LET: Local earthquake tomography

MAE: Mean absolute error

NIED: National Research Institute for Earth Science and Disaster Resilience

RMSE: Root mean square error

RSS: Residual sum of squares


  • Aki K, Lee WHK (1976) Determination of three-dimensional velocity anomalies under a seismic array using first P arrival times from local earthquakes: 1. A homogeneous initial model. J Geophys Res 81:4381–4399

  • Alessandrini B, Filippi L, Borgia A (2001) Upper-crust tomographic structure of the Central Apennines, Italy, from local earthquakes. Tectonophysics 339:479–494

  • Bassin C (2000) The current limits of resolution for surface wave tomography in North America. Eos Trans Am Geophys Union 81:S12A-03

  • Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3:1–122

  • Dziewonski AM, Anderson DL (1981) Preliminary reference earth model. Phys Earth Planet Inter 25:297–356

  • Gabay D, Mercier B (1976) A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput Math Appl 2:17–40

  • Gavin H (2013) The Levenberg-Marquardt method for nonlinear least squares curve-fitting problems. Department of Civil & Environmental Engineering, Duke University, Durham

  • Glowinski R, Marroco A (1975) Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. ESAIM Math Model Numer Anal (Modélisation Mathématique Et Analyse Numérique) 9:41–76

  • Guntuboyina A, Lieu D, Chatterjee S, Sen B (2020) Adaptive risk bounds in univariate total variation denoising and trend filtering. Ann Stat 48:205–229

  • Hastie T, Tibshirani R, Wainwright M (2015) Statistical learning with sparsity: the lasso and generalizations. CRC Press, Boca Raton

  • Hirose T, Nakahara H, Nishimura T, Campillo M (2020) Locating spatial changes of seismic scattering property by sparse modeling of seismic ambient noise cross-correlation functions: application to the 2008 Iwate-Miyagi Nairiku (Mw 6.9), Japan, earthquake. J Geophys Res Solid Earth 125:e2019JB019307

  • Iidaka T, Iwasaki T, Takeda T, Moriya T, Kumakawa I, Kurashimo E, Kawamura T, Yamazaki F, Koike K, Aoki G (2003) Configuration of subducting Philippine Sea plate and crustal structure in the central Japan region. Geophys Res Lett 30:1219

  • Kato A, Iidaka T, Ikuta R, Yoshida Y, Katsumata K, Iwasaki T, Sakai S, Thurber C, Tsumura N, Yamaoka K, Watanabe T, Kunitomo T, Yamazaki F, Okubo M, Suzuki S, Hirata N (2010) Variations of fluid pressure within the subducting oceanic crust and slow earthquakes. Geophys Res Lett 37:L14310

  • Kato A, Sakai S, Matsumoto S, Iio Y (2021) Conjugate faulting and structural complexity on the young fault system associated with the 2000 Tottori earthquake. Commun Earth Environ 2:13

  • Katsumata A (2010) Depth of the Moho discontinuity beneath the Japanese islands estimated by traveltime analysis. J Geophys Res Solid Earth 115:B04303

  • Kennett BLN, Engdahl ER (1991) Traveltimes for global earthquake location and phase identification. Geophys J Int 105:429–465

  • Kim SJ, Koh K, Boyd S, Gorinevsky D (2009) L1 trend filtering. SIAM Rev 51:339–360

  • Klinger Y (2010) Relation between continental strike-slip earthquake segmentation and thickness of the crust. J Geophys Res Solid Earth 115:B07306

  • Lees JM, Crosson RS (1989) Tomographic inversion for three-dimensional velocity structure at Mount St. Helens using earthquake data. J Geophys Res Solid Earth 94:5716–5728

  • Levenberg K (1944) A method for the solution of certain non-linear problems in least squares. Q Appl Math 2:164–168

  • Li D, Harris JM (2018) Full waveform inversion with nonlocal similarity and model-derivative domain adaptive sparsity-promoting regularization. Geophys J Int 215:1841–1864

  • Marquardt DW (1963) An algorithm for least-squares estimation of nonlinear parameters. J Soc Ind Appl Math 11:431–441

  • Matsubara M, Sato H, Uehira K, Mochizuki M, Kanazawa T (2017) Three-dimensional seismic velocity structure beneath Japanese Islands and surroundings based on NIED seismic networks using both inland and offshore events. J Disaster Res 12:844–857

  • Matsubara M, Sato H, Uehira K, Mochizuki M, Kanazawa T, Takahashi N, Suzuki K, Kamiya S (2019) Seismic velocity structure in and around the Japanese Island Arc derived from seismic tomography including NIED MOWLAS Hi-net and S-net data. In: Seismic Waves—Probing Earth System. IntechOpen.

  • Matsuzawa T, Umino N, Hasegawa A, Takagi A (1986) Upper mantle velocity structure estimated from PS-converted wave beneath the north-eastern Japan Arc. Geophys J Int 86:767–787

  • Moran SC, Lees JM, Malone SD (1999) P wave crustal velocity structure in the greater Mount Rainier area from local earthquake tomography. J Geophys Res Solid Earth 104:10775–10786

  • Nakata R, Hino H, Kuwatani T, Yoshioka S, Okada M, Hori T (2017) Discontinuous boundaries of slow slip events beneath the Bungo Channel, southwest Japan. Sci Rep 7:6129

  • National Research Institute for Earth Science and Disaster Resilience (2019) NIED Hi-net. National Research Institute for Earth Science and Disaster Resilience.

  • Negahban SN, Ravikumar P, Wainwright MJ, Yu B (2012) A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Stat Sci 27:538–557

  • Nugraha AD, Mori J (2006) Three-dimensional velocity structure in the Bungo Channel and Shikoku area, Japan, and its relationship to low-frequency earthquakes. Geophys Res Lett 33:L24307

  • R Core Team (2020) R: A language and environment for statistical computing.

  • Schmidt M, Niculescu-Mizil A, Murphy K (2007) Learning graphical model structure using L1-regularization paths. AAAI 7:1278–1283

  • Selvin S, Ajay SG, Gowri BG, Sowmya V, Somon KP (2016) L1 trend filter for image denoising. Procedia Comput Sci 93:495–502

  • Thurber CH (1983) Earthquake locations and three-dimensional crustal structure in the Coyote Lake area, central California. J Geophys Res Solid Earth 88:8226–8236

  • Thurber CH (1993) Local earthquake tomography: velocities and Vp/Vs-theory. In: Iyer HM, Hirahara K (eds) Seismic tomography: theory and practice. Chapman and Hall, London

  • Thurber CH, Eberhart-Phillips D (1999) Local earthquake tomography with flexible gridding. Comput Geosci 25:809–818

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58:267–288

  • Tibshirani RJ (2014) Adaptive piecewise polynomial estimation via trend filtering. Ann Stat 42:285–323

  • Ueno H, Hatakeyama S, Aketagawa T, Funasaki J, Hamada N (2002) Improvement of hypocenter determination procedures in the Japan Meteorological Agency. Q J Seismol 65:123–134

  • Wang YX, Sharpnack J, Smola AJ, Tibshirani RJ (2016) Trend filtering on graphs. J Mach Learn Res 17:3651–3691

  • Wessel P, Smith WHF (1998) New, improved version of Generic Mapping Tools released. Eos Trans Am Geophys Union 79:579

  • Zhang H, Thurber CH (2003) Double-difference tomography: the method and its application to the Hayward fault, California. Bull Seismol Soc Am 93:1875–1889

  • Zhang J, ten Brink US, Toksöz MN (1998) Nonlinear refraction and reflection travel time tomography. J Geophys Res Solid Earth 103:29743–29757

  • Zhang F, Dai R, Liu H (2014) Seismic inversion based on L1-norm misfit function and total variation regularization. J Appl Geophys 109:111–118

  • Zhao D, Hasegawa A, Horiuchi S (1992) Tomographic imaging of P and S wave velocity structure beneath northeastern Japan. J Geophys Res Solid Earth 97:19909–19928

  • Zhao D, Matsuzawa T, Hasegawa A (1997) Morphology of the subducting slab boundary in the northeastern Japan arc. Phys Earth Planet Inter 102:89–104


We used data recorded on seismograph networks operated by the Japan Meteorological Agency and the National Research Institute for Earth Science and Disaster Resilience (Hi-net). GMT software package (Wessel and Smith 1998) and R (R Core Team 2020) were used for creating figures. We deeply thank the AE and two reviewers for their valuable comments and suggestions.


This work was supported by JST CREST Grant Number JPMJCR1763, MEXT Grant Number JPJ010217, and JSPS KAKENHI Grant Numbers JP20K19753 and JP21H05205.

Author information

Authors and Affiliations



YY conceptualized this study, supported by SK, KY, and FK. YY and SK carried out the analyses, and SK validated the results. SK drafted the manuscript, supported by KY and TS. TS and AK contributed to the implications of the results. FK and AK supervised this work. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sumito Kurata.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Additional figures.


Appendix: An optimization with ADMM

Here, we introduce the estimation procedure of our proposed method. The RSS (denoted by \(R\left(v,h\right)\) in this section) and the damping term in the objective function in Eq. (1) depend on the hypocentral parameters \(h\) in addition to the velocity parameters \(v\). Thus, we first approximate \(R\left(v,h\right)+D\left(v,h\right)\) by a quadratic form. We then use QR decomposition to separate the quadratic form into the sum of two terms, one depending only on the velocity parameters (hereafter denoted by \(l\left(v\right)\)) and the other only on the hypocentral parameters. Thus, the objective function of the proposed method is

$$f\left(v\right) = l\left(v\right)+{\lambda }_{\text{ver}}{\Omega }_{\text{ver}}\left(v\right)+\frac{1}{2}{\lambda }_{\text{hor}}{\Omega }_{\text{hor}}\left(v\right) .$$

Now, we obtain the following optimization problem via the duality principle:

$$\underset{v,w}{\text{minimize}} \; \left[\frac{1}{{\lambda }_{\text{ver}}}\left\{l\left(v\right)+\frac{{\lambda }_{\text{hor}}}{2}{\Omega }_{\text{hor}}\left(v\right)\right\}+{\sum }_{z}\|{w}_{z}\|_{2}\right] ,$$
$$\text{subject to} \quad {A}_{z}v-{w}_{z} = 0 \quad \left(z=2, \ldots ,{n}_{z}-1\right) ,$$

where \(||\cdot |{|}_{2}\) represents the \({l}_{2}\)-norm, and

$$\begin{aligned} w &= {\left({w}_{2}, \ldots ,{w}_{{n}_{z}-1}\right)}^{T} , \\ {w}_{z} &= {u}_{z-1}-2 {u}_{z}+{u}_{z+1} \quad \left(z=2,\ldots ,{n}_{z}-1\right) ,\end{aligned}$$
$$\begin{aligned} {u}_{z} = & \left({v}_{1,1,z},\dots ,{v}_{1,{n}_{y},z},{v}_{2,1,z},\dots ,{v}_{2,{n}_{y},z} , \dots , \right. \\ & \left. {v}_{{n}_{x},1,z},\dots , {v}_{{n}_{x},{n}_{y},z}\right)^{T} \quad \left(z=1,\dots ,{n}_{z}\right) ,\end{aligned} $$

where \({A}_{z}\) is a matrix satisfying \({A}_{z}v={w}_{z}\) (\(z=2,\dots ,{n}_{z}-1\)), and \({n}_{x}\), \({n}_{y}\), and \({n}_{z}\) are the numbers of grid points in the \(x\)-, \(y\)-, and \(z\)-directions, respectively. We then apply the augmented Lagrange multiplier method, and consider the following function, in which \({\alpha }_{z}\) are the Lagrange multipliers and \(\eta >0\) is the penalty parameter:

$$\begin{aligned} {L}_{\eta }\left(v,w,\alpha \right) = &\frac{1}{{\lambda }_{\text{ver}}}\left\{l\left(v\right) + \frac{{\lambda }_{\text{hor}}}{2}\,{\Omega }_{\text{hor}}\left(v\right)\right\} + {\sum }_{z}\|{w}_{z}\|_{2} \\ &+ {\sum }_{z} \left\{{\alpha }_{z}^{T}\left({A}_{z}v-{w}_{z}\right) + \frac{\eta }{2}\,\|{A}_{z}v-{w}_{z}\|_{2}^{2}\right\} . \end{aligned} $$

We obtain estimates by applying the ADMM (Glowinski and Marroco 1975; Gabay and Mercier 1976), a widely used algorithm that is well suited for solving distributed convex optimization problems (e.g., Boyd et al. 2011; Li and Harris 2018):

$$\begin{aligned} {v}^{\left(t+1\right)} &= {\text{argmin}}_{v}\hspace{0.17em} {L}_{\eta }\left(v,{w}^{\left(t\right)},{\alpha }^{\left(t\right)}\right) \\ &= {\text{argmin}}_{v}\hspace{0.17em} \left[\frac{1}{{\lambda }_{\text{ver}}} \left\{l\left(v\right) + \frac{{\lambda }_{\text{hor}}}{2}\hspace{0.17em}{\Omega }_{\text{hor}}\left(v\right)\right\} \right. \\ & \left. \qquad + \frac{\eta }{2}\hspace{0.17em}{\sum }_{z}|{|A}_{z}\hspace{0.17em}v-{w}_{z}^{\left(t\right)}+\frac{1}{\eta }\hspace{0.17em}{\alpha }_{z}^{\left(t\right)}{||}_{2}^{2}\right] , \end{aligned} $$
$$\begin{aligned} {w}^{\left(t+1\right)} &= {\mathrm{argmin}}_{w}\,{L}_{\eta }\left({v}^{\left(t+1\right)},w,{\alpha }^{\left(t\right)}\right) \\ &= {\mathrm{argmin}}_{w}\,{\sum }_{z}\left\{\|{w}_{z}\|_{2} + \frac{\eta }{2}\,\left\|{A}_{z}{v}^{\left(t+1\right)} - {w}_{z}+\frac{1}{\eta }\,{\alpha }_{z}^{\left(t\right)}\right\|_{2}^{2}\right\} , \end{aligned} $$
$${w}_{z}^{\left(t+1\right)} = {\mathrm{prox}}_{\frac{1}{\eta }||\cdot |{|}_{2}}\left({A}_{z}\hspace{0.17em}{v}^{\left(t+1\right)}+\frac{1}{\eta }\hspace{0.17em}{\alpha }_{z}^{\left(t\right)}\right) ,$$
$$\begin{aligned} & {\alpha }_{z}^{\left(t+1\right)} = {\alpha }_{z}^{\left(t\right)}+\eta \hspace{0.17em}\left({A}_{z}\hspace{0.17em}{v}^{\left(t+1\right)}-{w}_{z}^{\left(t+1\right)}\right) \\ & \quad \left(z\hspace{0.17em}=\hspace{0.17em}2\hspace{0.17em},\hspace{0.17em}\dots \hspace{0.17em},\hspace{0.17em}{n}_{z}-1\right) , \end{aligned} $$

where “prox” is the proximal operator. Note that the objective function of the \(v\)-update is nonlinear and nonconvex, but that it takes an additive form of sums of squares under iterative quadratic approximation. We therefore solve this subproblem using the Levenberg–Marquardt method (e.g., Levenberg 1944; Marquardt 1963; Gavin 2013).
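The update rules above can be illustrated with a minimal sketch. The proximal operator of \(\frac{1}{\eta }||\cdot |{|}_{2}\) has the well-known closed form of block soft-thresholding: the input vector is shrunk toward zero by \(1/\eta\) in norm and set exactly to zero when its norm does not exceed \(1/\eta\). The sketch below is not the implementation used in this study. It makes two simplifying assumptions: the travel-time loss \(l\left(v\right)\) is replaced by a quadratic loss \(\frac{1}{2}||y-v|{|}_{2}^{2}\) on hypothetical data `y`, so that the \(v\)-update becomes a linear solve instead of a Levenberg–Marquardt step, and the horizontal penalty is omitted (\({\lambda }_{\text{hor}}=0\)). The names `prox_l2` and `admm_toy` are hypothetical.

```python
import numpy as np


def prox_l2(u, tau):
    """Proximal operator of tau * ||.||_2 (block soft-thresholding)."""
    norm = np.linalg.norm(u)
    if norm <= tau:
        return np.zeros_like(u)
    return (1.0 - tau / norm) * u


def admm_toy(y, lam_ver=1.0, eta=1.0, n_iter=200):
    """Toy ADMM for a quadratic loss (assumption) plus the l1-sum of
    l2-norms of vertical second-order differences.

    y has shape (n_z, n_h): n_z depth layers, n_h horizontal grid points.
    """
    n_z = y.shape[0]
    # Second-order difference operator in depth; each row gives
    # v[z] - 2 v[z+1] + v[z+2], so (D @ v)[z] plays the role of A_z v.
    D = np.diff(np.eye(n_z), n=2, axis=0)
    v = y.copy()
    w = D @ v
    alpha = np.zeros_like(w)
    # For the quadratic loss, the v-update solves
    # (I / lam_ver + eta D^T D) v = y / lam_ver + D^T (eta w - alpha).
    M = np.eye(n_z) / lam_ver + eta * D.T @ D
    for _ in range(n_iter):
        v = np.linalg.solve(M, y / lam_ver + D.T @ (eta * w - alpha))
        Dv = D @ v
        # w-update: one block soft-thresholding per depth index z.
        for z in range(Dv.shape[0]):
            w[z] = prox_l2(Dv[z] + alpha[z] / eta, 1.0 / eta)
        # Multiplier update.
        alpha = alpha + eta * (Dv - w)
    return v, w
```

Depth indices at which `w[z]` is exactly zero correspond to locally linear segments of the depth profile; the remaining indices mark candidate kinks, which is how the penalty produces a piecewise linear profile.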

Comparison of the proposed method with various regularization methods

In this section, we additionally compare our proposed method with other regularization methods: the Laplacian regularization and two sparse regularization methods. Laplacian-based regularization has been used in LET (e.g., Lees and Crosson 1989; Zhang et al. 1998; Moran et al. 1999). The penalty term, \(P\left(v\right)\) in Eq. (1), is given as

$$\begin{array}{c}{P}^{\mathrm{Lap}}\left(v\right) = {\lambda }_{\mathrm{Lap}}\hspace{0.17em}{\sum }_{x,y,z}|| \Delta {v}_{x,y,z}|{|}_{2}^{2} ,\end{array}$$

where \({\lambda }_{\mathrm{Lap}}\) is a non-negative regularization parameter. Here, we refer to this Laplacian regularization method as “Lap” for notational simplicity. Moreover, in earthquake tomography, some studies have applied sparse regularization methods via the \({l}_{1}\)-norm (e.g., Zhang et al. 2014). As described in the Introduction, \({l}_{1}\)-type regularizations yield sparse estimates by shrinking less important features (penalized elements) to zero. Here, we employ a gridwise regularization using the \({l}_{1}\)-norm that penalizes the first-order differences among adjacent grid points. We term this method “L1first” for notational simplicity. In L1first, \(P\left(v\right)\) is given as

$$\begin{array}{c}{P}^{\mathrm{L}1\mathrm{first}}\left(v\right) = {\lambda }_{\mathrm{L}1\mathrm{first}}\hspace{0.17em}{\sum }_{x,y,z} {\sum }_{\left({x}^{{\prime}},{y}^{{\prime}},{z}^{{\prime}}\right)\sim \left(x,y,z\right)}\left|{v}_{x,y,z}-{v}_{{x}^{{\prime}},{y}^{{\prime}},{z}^{{\prime}}}\right| ,\end{array}$$

where \({\lambda }_{\mathrm{L}1\mathrm{first}}\) is a non-negative regularization parameter. We also examine another regularization method that penalizes the second-order differences among adjacent grid points via the \({l}_{1}\)-norm, referred to as “L1second”. \(P\left(v\right)\) in L1second is given as

$$\begin{array}{c}{P}^{\mathrm{L}1\mathrm{second}}\left(v\right) = {\lambda }_{\mathrm{L}1\mathrm{second}}\hspace{0.17em}{\sum }_{x,y,z}||{\Delta v}_{x,y,z}|{|}_{1} ,\end{array}$$

where \(||\cdot |{|}_{1}\) represents the \({l}_{1}\)-norm, and \({\lambda }_{\mathrm{L}1\mathrm{second}}\) is a non-negative regularization parameter. The relationship between L1second and Lap is similar to that between the proposed method and \({l}_{2}\)-smoothness regularization; L1second reduces penalized elements (second-order differences among velocity parameters of adjacent grid points) to exactly zero, whereas Lap does not. Note that Lap, L1first, and L1second each employ the same type of penalty term for both the vertical and horizontal directions, unlike \({l}_{2}\)-smoothness regularization and our proposed method, which use different types of penalty for the vertical and horizontal directions to account for the characteristics of Earth’s seismic velocity structure.
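The three penalty terms can be sketched for a velocity model stored as a 3-D array. The sketch rests on assumptions that the text does not specify: \(\Delta {v}_{x,y,z}\) is taken to be the standard seven-point discrete Laplacian evaluated at interior grid points only, the neighbor relation \(\sim\) is taken to mean the six axis-aligned neighbors, and each neighboring pair is counted once (a double count would only rescale \({\lambda }_{\mathrm{L}1\mathrm{first}}\)). The regularization parameters are omitted, and the function names are hypothetical.

```python
import numpy as np


def discrete_laplacian(v):
    """Seven-point discrete Laplacian at interior grid points (assumption)."""
    center = v[1:-1, 1:-1, 1:-1]
    return (v[2:, 1:-1, 1:-1] + v[:-2, 1:-1, 1:-1]
            + v[1:-1, 2:, 1:-1] + v[1:-1, :-2, 1:-1]
            + v[1:-1, 1:-1, 2:] + v[1:-1, 1:-1, :-2]
            - 6.0 * center)


def penalties(v):
    """Unweighted Lap, L1first, and L1second penalties for v[x, y, z]."""
    lap = discrete_laplacian(v)
    # First-order differences between axis-aligned neighbors,
    # each unordered pair counted once.
    first = [np.diff(v, n=1, axis=axis) for axis in range(3)]
    return {
        "Lap": float(np.sum(lap ** 2)),           # squared l2 of the Laplacian
        "L1first": float(sum(np.abs(d).sum() for d in first)),
        "L1second": float(np.sum(np.abs(lap))),   # l1 of the Laplacian
    }
```

Under these assumptions, a model whose velocity increases linearly with depth incurs zero Lap and L1second penalties but a nonzero L1first penalty, which is consistent with L1first favoring piecewise constant structure.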

Results of the main experiment using Laplacian regularization and sparse regularizations via \({{{l}}}_{1}\)-norm

We evaluated the estimation accuracy of the above-mentioned methods in the main experiment described in the “Numerical experiment” section of the main text. Additional file 1: Fig. S1 presents the estimated average velocities and horizontal-direction anomalies from the baseline velocities for each estimation method. The MAE values of Lap, L1first, and L1second were 0.185, 0.161, and 0.135, respectively. Although the three methods outperformed DLS in terms of MAE (the MAE value of DLS was 0.383), none of them achieved a better (smaller) MAE value than those of \({l}_{2}\)-smoothness regularization (0.080) and our proposed method (0.040). From Additional file 1: Fig. S1, we can see that Lap produced estimates far from the true structure at many grid points, and L1first smoothed the checkerboard anomalies too much by reducing the variation in velocity parameters among adjacent grid points. L1second reproduced the checkerboard anomalies better than Lap and L1first, but it estimated opposite polarities of velocity anomalies at some grid points (e.g., Layers 1 and 20 in Additional file 1: Fig. S1).

Synthetic test for a three-dimensional checkerboard pattern

In the main experiment, we produced the checkerboard pattern only in the horizontal direction to verify the performance of the horizontal-direction penalty terms. Here, we add a checkerboard pattern of velocity perturbations to a uniform velocity structure in both the horizontal and vertical directions. The number of arrival time data was 4056. In this experiment, we used \(6\times 6\times 24\) grid points and generated a 3-D checkerboard velocity model from a uniform velocity structure of 4.0 km/s using perturbations of \(\pm 5\)%: we reversed the sign of the velocity perturbation at every grid point in the horizontal direction and every four grid points (layers) in the vertical direction. Additional file 1: Fig. S2 illustrates the EW depth profile of the true and recovered velocity structures (\({35.21}^{\circ }\) N). Although some velocity anomalies in deep layers were not well reproduced, the proposed method achieved the lowest MAE in this experiment: the values of MAE for DLS, Lap, L1first, L1second, \({l}_{2}\)-smoothness regularization, and our proposed method were 0.084, 0.075, 0.085, 0.088, 0.071, and 0.070, respectively.
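The construction of the 3-D checkerboard model described above can be sketched as follows. The grid size, base velocity, perturbation amplitude, and sign-reversal intervals follow the text; the polarity of the first cell and the array layout `v[x, y, z]` are assumptions. The `mae` helper assumes that MAE denotes the mean absolute difference between true and estimated velocities over all grid points, and both function names are hypothetical.

```python
import numpy as np


def checkerboard_3d(shape=(6, 6, 24), v0=4.0, pert=0.05, nz_block=4):
    """Uniform model v0 (km/s) with +/-pert checkerboard perturbations.

    The sign flips at every grid point horizontally and every
    nz_block layers vertically. First-cell polarity is an assumption.
    """
    ix, iy, iz = np.indices(shape)
    sign = (-1.0) ** (ix + iy + iz // nz_block)
    return v0 * (1.0 + pert * sign)


def mae(v_true, v_est):
    """Mean absolute error over all grid points (assumed definition)."""
    return float(np.mean(np.abs(v_true - v_est)))


v_true = checkerboard_3d()
# Error of the unperturbed starting model relative to the checkerboard.
baseline_error = mae(v_true, np.full(v_true.shape, 4.0))
```

With these settings every grid point takes a velocity of either 3.8 or 4.2 km/s, so the unperturbed 4.0 km/s model differs from the checkerboard by 0.2 km/s everywhere.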

The case for which there are multiple horizontal layers at depth

As the performances of DLS and Lap were worse than those of the other methods, we focus on comparison of L1first, L1second, and \({l}_{2}\)-smoothness regularization (abbreviated as “L2”) with our proposed method, in this and the next subsection.

In this subsection, we describe an experiment for the case in which there are multiple horizontal layers at depth. The number of arrival time data was 5025. In this experiment, we assumed that all values of the velocity parameter were uniform within each layer. The other settings were the same as those in the main experiment. Additional file 1: Fig. S3 shows the true average velocity structure and the results estimated by each method. The MAE values of L1first, L1second, \({l}_{2}\)-smoothness regularization, and our proposed method were 0.042, 0.045, 0.031, and 0.015, respectively. The results of this experiment suggest that the proposed method can closely reproduce a velocity profile containing multiple horizontal layers.

The case for which there is a high-velocity zone in the target region

We conducted an experiment assuming a high-velocity anomaly of 5.0 km/s embedded in a homogeneous velocity medium of 4.0 km/s to verify the robustness of the estimation methods. The number of arrival time data was 4529. The other settings were the same as those in the main experiment. The upper-left part of Additional file 1: Fig. S4 shows the vertical and horizontal variations (south–north profile) in the true velocity structure (\({138.21}^{\circ }\) E). As the earthquake distribution is biased toward the northern part of the target region (also see Fig. 2), many ray paths pass through the high-velocity zone; in contrast, relatively few ray paths pass through the southern part. Additional file 1: Fig. S4 also illustrates the structure estimated by each method. The MAE values of L1first, L1second, \({l}_{2}\)-smoothness regularization, and our proposed method were 0.120, 0.124, 0.115, and 0.088, respectively. Although the accuracy of estimation around the structural boundary in this setting was somewhat worse than in the other experiments due to the biased distribution of ray paths, the proposed method performed the most robustly with respect to the high-velocity anomaly.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Yamanaka, Y., Kurata, S., Yano, K. et al. Structured regularization based velocity structure estimation in local earthquake tomography for the adaptation to velocity discontinuities. Earth Planets Space 74, 43 (2022).


  • Local earthquake tomography
  • Computational seismology
  • Statistical methods
  • Japan