 Full paper
 Open Access
3D inversion of magnetic data based on the L1–L2 norm regularization
Earth, Planets and Space, volume 71, Article number: 73 (2019)
Abstract
Magnetic inversion is one of the popular methods used to obtain information about subsurface structure. However, many of the conventional methods have a serious problem: the linear equations to be solved are ill-posed and underdetermined, and thus the uniqueness of the solution is not guaranteed. As a result, several different models fit the observed magnetic data with the same accuracy. To reduce the non-uniqueness of the model, conventional studies introduced regularization methods based on a quadratic solution norm. However, such regularization imposes a certain level of smoothness, and as a result, the resultant model is likely to be blurred. To obtain a focused magnetic model, I introduce L1 norm regularization. As is widely known, L1 norm regularization promotes sparseness of the model, so the resulting model is expected to be constructed only from the features truly required to reconstruct the data; consequently, a simple and focused model is obtained. However, using L1 norm regularization alone yields an excessively concentrated model, owing to the nature of the L1 norm regularization and a lack of linear independence of the magnetic equations. To overcome this problem, I use a combination of L1 and L2 norm regularization. To choose a feasible regularization parameter, I introduce a regularization parameter selection method based on the L-curve criterion with the mixing ratio of the L1 and L2 norm penalties fixed. This inversion method is applied to real magnetic anomaly data observed on Hokkaido Island, northern Japan, and reveals the subsurface magnetic structure of this area.
Introduction
The inversion of geomagnetic field data has been considered by many studies that aim to determine the property and geometry of subsurface magnetic structures. One of the major approaches is magnetic property inversion, which automatically retrieves the distribution of subsurface magnetization or magnetic susceptibility from observed magnetic data. In these studies, the subsurface space is divided into a number of small grid cells assuming the susceptibility of each grid cell is homogeneous. In this situation, the equation to be solved becomes linear and the susceptibility of each cell is obtained by the inversion minimizing a specific model objective function.
The main problem of this approach is the ambiguity of the solution caused by the inherent non-uniqueness of the potential field. Furthermore, in the case of the 3D magnetic inverse problem, this ambiguity is emphasized because the problem is ill-posed in most cases. Accordingly, it is possible for several different models to fit the observed magnetic data with the same accuracy.
One promising mathematical approach to overcome this difficulty is to use an appropriate regularization method. Regularization is, simply speaking, a method to restrict the model space in which we seek the solution to a subspace of a specific class of models that have designated characteristics.
For magnetic and other geophysical inverse problems, one of the traditional regularization methods is Tikhonov regularization (Tikhonov and Arsenin 1977). In this method, to reduce the ambiguity of the problem and to stabilize the solution, a quadratic penalty related to the solution norm is introduced into the objective function. Li and Oldenburg (1996) used an L2 norm of the model and of its first-order spatial derivatives as the penalty. Pilkington (1997) introduced a (depth-weighted) L2 norm penalty for magnetic inversion. In electromagnetic studies, for example, Minami et al. (2018) introduced a smoothness penalty into 3D resistivity inversion and succeeded in detecting the temporal change of the subsurface resistivity structure related to volcanic activity. Furthermore, Tikhonov regularized inverse problems have often been solved within the framework of the Bayesian approach in the field of magnetic (e.g., Zeyen and Pous 1991; Tsunakawa et al. 2010; Honsho et al. 2012) and electromagnetic studies (e.g., Ogawa and Uchida 1996). These studies show that the stability and the robustness of the solution are improved. However, because the quadratic solution norm penalty imposes a certain level of smoothness on the model (Hansen 1992), the obtained model tends to be blurred and unfocused. Especially in the case of magnetic inversion, such blurred features sometimes make the model look geologically unrealistic.
To obtain a focused model, some studies introduced sparsity regularization, which promotes the so-called sparseness of the model. Last and Kubik (1983) introduced a minimum-support penalty into gravity inversion to recover sharp boundaries of the subsurface density structure. Portniaguine and Zhdanov (1999, 2002) and Zhdanov et al. (2004) proposed focusing 3D inversions of magnetic data and gravity gradiometry data, introducing a minimum-support or minimum-gradient-support penalty. Zhdanov et al. (2000), Zhang et al. (2015), and Xiang et al. (2017) also introduced the minimum gradient support into electromagnetic inversion to recover a resistivity structure with sharp boundaries. Pilkington (2009) proposed to use a Cauchy norm penalty (Sacchi and Ulrych 1995) for 3D magnetic inversion. A Cauchy norm penalty has also been used in some recent 3D magnetic and gravity inversion studies (e.g., Uieda and Barbosa 2012; Abedi et al. 2015; Wang et al. 2015; Rezaie et al. 2016). These penalties realize sparseness of the model by minimizing the number of nonzero components of the model, or of the gradient of the model. In simulation studies with synthetic data, these studies demonstrated that a model very close to the true model can be reconstructed, showing the effectiveness of sparse regularization for potential-field inverse problems. This means that sparse regularization reduces the ambiguity due to the non-uniqueness of the potential field and the ill-posedness of the problem and provides a focused model with high resolution.
For the sparse regularization, Tibshirani (1996) proposed an L1 norm regularization method named LASSO (least absolute shrinkage and selection operator) in a statistical study. L1 norm regularization minimizes an objective function which contains a penalty based on the L1 norm of the solution vector. This regularization method is known to have a tendency to choose a sparse model and has, therefore, been immensely popular in various research fields in recent years. In geophysical research, L1 norm regularization has also been used, for example, in seismic tomography studies (e.g., Loris et al. 2007; Wang 2011; Fang and Zhang 2014; Liu et al. 2015).
The basic idea of sparse regularization is L0 norm minimization, which limits the number of nonzero model elements to a minimum. However, the L0 norm problem is not convex; it is known to be NP-hard, and no method to solve it efficiently has been found. Therefore, in general, the L0 norm problem is replaced by an alternative problem obtained by relaxing the constraints on the solution. Regularized inversion using the minimum-support or Cauchy norm is one such alternative, and L1 norm regularization is another.
In this paper, a new sparse magnetic inversion method based on L1 norm regularization is proposed. For this purpose, however, we have to address some problems specific to magnetic data inversion. One problem is the lack of depth resolution due to the rapid decay of the magnetic field. As a consequence, magnetic inversion is likely to produce an unrealistic model which is excessively concentrated in the shallow region. To tackle this problem, an appropriate weighting function which counteracts the field decay has to be introduced. Among conventional studies, Li and Oldenburg (1996) proposed a depth weighting function and Li and Oldenburg (2000) used a sensitivity-based weighting function. On the other hand, Tibshirani (1996) dealt with a normalized regression problem, which is equivalent to using the square of the sensitivity-based weighting function of Li and Oldenburg (2000). In this paper, some synthetic tests are performed and the most suitable weighting function for L1 norm regularized magnetic inversion is discussed.
Another problem is that magnetic data inversion is an underdetermined problem in most cases; that is, the number of observations (N) is much lower than the number of unknown model parameters (M). In such an \(N\ll M\) problem, it is known that L1 norm regularization has some critical drawbacks, which lead to an overly sparse solution (Zou and Hastie 2005). To overcome this problem, this paper proposes a regularization method with an L1 and L2 norm combined penalty, which is the same as the “Elastic Net” proposed by Zou and Hastie (2005). With this modification, however, we have to introduce two regularization parameters for the L1 and L2 penalties, and how to choose feasible regularization parameters becomes a crucial problem. Therefore, this paper also proposes a regularization parameter selection method based on an L-curve criterion with the mixing ratio of the L1 and L2 norm penalties fixed a priori. The effectiveness of the proposed inversion method as well as the parameter selection method is discussed through some synthetic tests and real field data.
Observation equations
By assuming that there is no remanent magnetization and that induced magnetization is dominant in our study area, the magnetization distribution in the volume V is written as

\(\varvec{m}(\varvec{r}') = \kappa (\varvec{r}')\varvec{H}_0 = \beta ^*(\varvec{r}')\,\varvec{l},\)
where \(\varvec{H}_0\) is the Earth’s geomagnetic field, and the vector \(\varvec{l}\) is a unit vector parallel to \(\varvec{H}_0\). \(\kappa (\varvec{r}')\) and \(\beta ^*(\varvec{r}')\) are the susceptibility and the intensity of the induced magnetization at \(\varvec{r}'\in V\), respectively. The total magnetic field F resulting from this induced magnetization can be written as a Fredholm integral equation:
where \(\mu _0\) is the magnetic permeability of vacuum, \(\varvec{r}\not \in V\) is the observation point, and \(\nabla _{\varvec{r}}\) and \(\nabla _{\varvec{r}'}\) are the gradient operators with respect to \(\varvec{r}\) and \(\varvec{r}'\), respectively. \(|\varvec{r}-\varvec{r}'|\) is the Euclidean distance between \(\varvec{r}\) and \(\varvec{r}'\):

\(|\varvec{r}-\varvec{r}'| = \sqrt{(x-x')^2+(y-y')^2+(z-z')^2}.\)
To solve the integral equation of Eq. (1) numerically, V is divided into a 3D grid of rectangular block cells \((\Delta V_1,\ldots ,\Delta V_M)\). Now, let us denote the magnetic total field produced by a grid cell \(\Delta V_j\) with unit induced magnetization by
Supposing the susceptibility in each grid cell is constant, the magnetization also becomes constant; that is, \(\beta ^*(\varvec{r}'_j)=\beta _j^*\) for \(\varvec{r}'_j\in \Delta V_j\). Then, Eq. (1) is rewritten as

\(F(\varvec{r}) = \sum _{j=1}^{M} \beta _j^*\, K_j(\varvec{r}).\)
When we have a data set of a magnetic anomaly observed at \((\varvec{r}_1,\ldots ,\varvec{r}_N)\), Eq. (1) can be discretized as

\(F(\varvec{r}_i) = \sum _{j=1}^{M} K_j(\varvec{r}_i)\,\beta _j^*, \quad i=1,\ldots ,N.\)
This equation can be rewritten in vector–matrix form:

\(\varvec{f} = \varvec{K}\varvec{\beta }^*,\)
where \(f_i=F(\varvec{r}_i)\), and the (i, j)th element of matrix \(\varvec{K}\) is \(K_{ij}=K_j(\varvec{r}_i)\) where the explicit form of \(K_{ij}\) is provided by Bhattacharyya (1964). The jth column vector of \(\varvec{K}\), \(\varvec{k}_j\), is the total field over the discrete observation points \((\varvec{r}_1,\ldots ,\varvec{r}_N)\) produced by \(\Delta V_j\) with unit induced magnetization.
In order to obtain a magnetic structure with high resolution, it is necessary to finely subdivide V in the lateral directions as well as the depth direction. Consequently, in most cases, the number of grid cells M exceeds the number of observation points N, and the magnetic inverse problem of Eq. (3) becomes an underdetermined and ill-posed problem, which means a unique solution does not exist. A conventional way to solve such an ill-posed problem is to rely on regularization methods. In a regularization method, the problem to be solved is replaced by the minimization of the following objective function:

\(\mathcal {L}(\varvec{\beta }^*) = \frac{1}{2}\left\| \varvec{f}-\varvec{K}\varvec{\beta }^*\right\| ^2 + \lambda P(\varvec{\beta }^*),\)
where \(P(\varvec{\beta }^*)\) is a penalty function and the constant \(\lambda\) \((>0)\) is a regularization parameter that controls the strength of the penalty. The explicit form of P differs according to each regularization method, and each method provides different qualities in the solution.
Further, in conventional studies of magnetic inversion, a weighting procedure is commonly introduced into the penalty. Because the amplitude of the total field \(\varvec{k}_j\) produced by deeper cells decays rapidly, the data are only weakly sensitive to those cells. Thus, the resultant model tends to concentrate strongly in a very shallow region. In order to compensate for this tendency, we have to introduce a weighting into the penalty to counteract the magnetic field decay:

\(\mathcal {L}(\varvec{\beta }^*) = \frac{1}{2}\left\| \varvec{f}-\varvec{K}\varvec{\beta }^*\right\| ^2 + \lambda P(\varvec{W}\varvec{\beta }^*),\)
where \(\varvec{W}\) is a diagonal matrix,

\(\varvec{W} = \mathrm{diag}\left( w_1, w_2, \ldots , w_M\right),\)
and its diagonal elements \(w_j\) are the weighting functions. This problem is equivalent to minimizing the objective function:

\(\mathcal {L}(\varvec{\beta }) = \frac{1}{2}\left\| \varvec{f}-\varvec{X}\varvec{\beta }\right\| ^2 + \lambda P(\varvec{\beta }),\)
where \(\varvec{X}=\varvec{K}\varvec{W}^{-1}\) and \(\varvec{\beta }=\varvec{W}\varvec{\beta }^*\). The optimal \(\hat{\varvec{\beta }}^*\) is obtained by

\(\hat{\varvec{\beta }}^* = \varvec{W}^{-1}\hat{\varvec{\beta }},\)
where \(\hat{\varvec{\beta }}\) is the solution which minimizes Eq. (7). Finally, the susceptibility model is obtained by

\(\hat{\kappa }_j = \hat{\beta }_j^* / |\varvec{H}_0|.\)
Weighting function
For the weighting function of Eq. (6), Li and Oldenburg (2000) introduced sensitivity-based weighting, defining the integrated sensitivity of matrix \(\varvec{K}\):

\(S_j = \left\| \varvec{k}_j\right\| = \sqrt{\sum _{i=1}^{N} K_{ij}^2}.\)
They introduced the following sensitivity-based weighting function to reduce the disparity in the sensitivities of the columns of \(\varvec{K}\):

\(w_j^{S_\gamma } = S_j^{\gamma /2}.\)
Li and Oldenburg (2000) and Portniaguine and Zhdanov (2002) used this function with \(\gamma =1\) as the weighting function:

\(w_j^{S_1} = S_j^{1/2}.\)
Alternatively, LASSO proposed by Tibshirani (1996) deals with a normalized regression problem; that is, each column vector \(\varvec{k}_j\) is assumed to be normalized. Obviously, this data setting is equivalent to using the weighting of Eq. (8) with \(\gamma =2\):

\(w_j^{S_2} = S_j.\)
In a later section, some synthetic tests are performed and the performance of the above weighting functions is discussed.
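As a concrete sketch (my own illustration, not the paper's code), the weightings above can be computed as follows, assuming the explicit form \(w_j^{S_\gamma }=S_j^{\gamma /2}\) with \(S_j=\Vert \varvec{k}_j\Vert\), which is consistent with \(\gamma =2\) reproducing LASSO's column normalization:

```python
import numpy as np

def sensitivity_weights(K, gamma):
    """Sensitivity-based weighting w_j = S_j^(gamma/2), where the integrated
    sensitivity S_j is the L2 norm of the j-th column of the kernel K.
    gamma=1 corresponds to w^{S_1}; gamma=2 normalizes the columns of the
    weighted kernel X = K W^{-1} to unit length (the LASSO setting)."""
    S = np.linalg.norm(K, axis=0)          # integrated sensitivities S_j
    return S ** (gamma / 2.0)

# toy kernel whose column norms decay with a pseudo-depth index
rng = np.random.default_rng(0)
K = rng.normal(size=(20, 50)) / (1.0 + np.arange(50)) ** 3

w = sensitivity_weights(K, gamma=2)
X = K / w                                  # X = K W^{-1} for diagonal W
col_norms = np.linalg.norm(X, axis=0)      # all equal to 1 for gamma=2
```

With \(\gamma =2\), every column of \(\varvec{X}\) has unit norm, so the sensitivity disparity between shallow and deep cells disappears entirely.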
L1 norm regularization
L1 norm regularization minimizes an objective function \(\mathcal {L}\) which involves an L1 norm penalty of the solution vector \(\varvec{\beta }\):

\(\mathcal {L}(\varvec{\beta }) = \frac{1}{2}\left\| \varvec{f}-\varvec{X}\varvec{\beta }\right\| ^2 + \lambda \left\| \varvec{\beta }\right\| _1, \quad \left\| \varvec{\beta }\right\| _1 = \sum _{j=1}^{M}|\beta _j|.\)
By introducing the L1 norm penalty, sparseness of the model is promoted. To see how L1 norm regularization introduces sparseness into the model, consider the following simplified single-variable problem. Suppose that the matrix \(\varvec{X}\) has only one column; that is, \(\varvec{X}\) is a column vector \(\varvec{x}\) and \(\varvec{\beta }\) is a scalar \(\beta\). In this case, Eq. (11) becomes

\(\mathcal {L}(\beta ) = \frac{1}{2}\left\| \varvec{f}-\varvec{x}\beta \right\| ^2 + \lambda |\beta |.\)
If \(\beta >0\), we can differentiate Eq. (12) to get

\(\frac{\partial \mathcal {L}}{\partial \beta } = -\varvec{x}^T\left( \varvec{f}-\varvec{x}\beta \right) + \lambda = 0,\)
where a superscript T indicates the transpose. Thus, the \(\beta\) that minimizes Eq. (12) is obtained as

\(\beta = \frac{\varvec{x}^T\varvec{f} - \lambda }{\varvec{x}^T\varvec{x}}.\)
However, because we are now considering the case of \(\beta >0\), this yields the following result: \(\hat{\beta } = \left( \varvec{x}^T\varvec{f}-\lambda \right) /\varvec{x}^T\varvec{x}\) if \(\varvec{x}^T\varvec{f} > \lambda\), and \(\hat{\beta } = 0\) otherwise.
By a similar calculation in the case of \(\beta < 0\), we can obtain the following integrated result:

\(\hat{\beta }_{\lambda } = \frac{\mathcal {S}\left( \varvec{x}^T\varvec{f},\,\lambda \right) }{\varvec{x}^T\varvec{x}},\)
where \(\mathcal {S}\) is the following soft-thresholding operator:

\(\mathcal {S}(z, \lambda ) = \mathrm{sign}(z)\max \left( |z|-\lambda ,\, 0\right).\)
The plot of Eq. (13) is shown in Fig. 1. The solid line shows \(\hat{\beta }_{\lambda }\) of Eq. (13). The dashed line indicates \(\hat{\beta }_0=\varvec{x}^T\varvec{f}/\varvec{x}^T\varvec{x}\), which is Eq. (13) with \(\lambda =0\), that is, the non-regularized least-squares solution of Eq. (12). As can be seen in this figure, \(\hat{\beta }_{\lambda }\) is \(\hat{\beta }_0\) with a bias of \(\pm \,\lambda /\varvec{x}^T\varvec{x}\) added, and a small \(\hat{\beta }_0\), corresponding to \(|\varvec{x}^T\varvec{f}|\le \lambda\), shrinks to exactly 0. Through this behavior of the L1 norm regularization, small model elements, which tend to make only a weak contribution to the reproduction of the data, are likely to be shrunk to zero. Consequently, the sparse nature of the model is promoted, and the resultant model is constructed from only the truly relevant model elements.
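The single-variable solution of Eqs. (13) and (14) can be illustrated numerically; the following is a minimal sketch with synthetic data (variable names are my own):

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator: shrinks z toward zero by lam and sets
    values with |z| <= lam to exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# single-variable L1 problem: beta_hat = S(x^T f, lam) / (x^T x)
rng = np.random.default_rng(1)
x = rng.normal(size=100)
f = 0.5 * x + 0.1 * rng.normal(size=100)   # data with true beta = 0.5

beta_ls  = x @ f / (x @ x)                          # lam = 0: least squares
beta_l1  = soft_threshold(x @ f, 20.0) / (x @ x)    # biased toward zero
beta_big = soft_threshold(x @ f, 1e4) / (x @ x)     # large lam: exactly zero
```

The moderate threshold biases the estimate toward zero, while a sufficiently large one removes the element altogether, which is the mechanism that produces sparsity.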
Coordinate descent algorithm for an L1-regularized problem
In the previous subsection, we saw that the single-variable problem of Eq. (12) can be solved analytically. However, the multiple-variable problem of Eq. (11) cannot be solved directly, and we have to solve it iteratively.
To solve an L1 norm regularized problem iteratively, Friedman et al. (2007) proposed the coordinate descent algorithm (CDA). As described in what follows, CDA is a simple algorithm and is very easy to implement. Further, CDA can work on very large data sets (Friedman et al. 2010), so CDA is used in this paper for magnetic inversion, which is also a very large-scale problem.
CDA iteratively searches for an optimal solution that minimizes the objective function through a sequence of one-dimensional optimizations. When \(\beta _j\not = 0\), we can differentiate \(\mathcal {L}\) of Eq. (11) with respect to \(\beta _j\) and obtain the following stationary point condition:

\(\varvec{x}_j^T\varvec{x}_j\,\beta _j + \lambda \,\mathrm{sign}(\beta _j) = \varvec{x}_j^T\varvec{r}_{-j},\)
where

\(\varvec{r}_{-j} = \varvec{f} - \varvec{X}\varvec{\beta }_{-j},\)
and \(\varvec{\beta }_{-j}\) is the vector obtained from \(\varvec{\beta }\) by the replacement \(\beta _j=0\), that is,

\(\varvec{\beta }_{-j} = \varvec{\beta } - \beta _j\varvec{e}_j,\)
and vector \(\varvec{e}_j\) is the jth basis column vector.
Suppose we have obtained a solution \(\hat{\varvec{\beta }}^{(k)}\) by the kth iteration of CDA. On the next iteration, \(\hat{\beta }_j^{(k)}\) is updated according to Eq. (15):

\(\varvec{x}_j^T\varvec{x}_j\,\beta _j + \lambda \,\mathrm{sign}(\beta _j) = \varvec{x}_j^T\hat{\varvec{r}}_{-j}^{(k)},\)
where \(\hat{\varvec{r}}_{-j}^{(k)}\) represents the “partial residuals” with respect to the jth cell:

\(\hat{\varvec{r}}_{-j}^{(k)} = \varvec{f} - \varvec{X}\hat{\varvec{\beta }}_{-j}^{(k)}.\)
Using the same calculation that led to Eq. (13), we can easily obtain the result

\(\hat{\beta }_j^{(k+1)} = \frac{\mathcal {S}\left( \varvec{x}_j^T\hat{\varvec{r}}_{-j}^{(k)},\,\lambda \right) }{\varvec{x}_j^T\varvec{x}_j}.\)
CDA updates \(\hat{\beta }_j\) for all \(j=1,2,\ldots ,M\) and repeats this cycle iteratively until \(\hat{\varvec{\beta }}\) converges. The iteration is stopped when \(\Vert \hat{\varvec{\beta }}^{(k+1)}-\hat{\varvec{\beta }}^{(k)}\Vert /\Vert \hat{\varvec{\beta }}^{(k)}\Vert <\epsilon\), where \(\epsilon =10^{-5}\) is used in this paper.
In the case of a large-scale problem such as 3D inversion, how to store the large kernel matrix in computer memory sometimes becomes a problem. However, the update equation of Eq. (16) consists only of the vector–vector products \(\varvec{x}_j^T\varvec{x}_j\) and \(\varvec{x}_j^T\hat{\varvec{r}}_{-j}\). To calculate \(\hat{\varvec{r}}^{(k+1)}_{-j}\), we need to know the residuals \(\varvec{r}^{(k+1)}=\varvec{f}-\varvec{X}\varvec{\beta }^{(k+1)}\), which contain a multiplication of the matrix \(\varvec{X}\) and the vector \(\varvec{\beta }\). However, because \(\beta _j^{(k+1)}\) is obtained sequentially by Eq. (16), the residuals can also be updated as follows:

\(\varvec{r} \leftarrow \varvec{r} - \varvec{x}_j\left( \hat{\beta }_j^{(k+1)} - \hat{\beta }_j^{(k)}\right).\)
Consequently, to update the model by CDA iteration, it is not required to store the full \(\varvec{X}\) all at once, and we can save the computer memory required for the calculation.
Friedman et al. (2010) suggested that, to obtain an optimal solution for a specified regularization parameter by CDA, it is computationally efficient to iteratively compute the solutions for a sequence \(\varvec{\lambda }\) decreasing, on the log scale, down to the specified value. First, start with a large \(\lambda\) and calculate a solution by CDA until convergence. Next, decrease \(\lambda\) and run CDA until convergence using the previous solution as an initial guess, and continue this procedure until \(\lambda\) decreases to the specified value. This scheme is referred to as CDA with warm start. When \(\lambda\) is very large, all nonzero model elements are shrunk to zero by the soft-thresholding operator of Eq. (14), and as a result, \(\hat{\varvec{\beta }}=\varvec{0}\) is obtained. The minimum \(\lambda\) which leads to \(\hat{\varvec{\beta }}=\varvec{0}\) is as follows (Friedman et al. 2010):

\(\lambda ^{\max } = \max _j \left| \varvec{x}_j^T\varvec{f}\right| .\)
By using a decreasing sequence \(\varvec{\lambda }\) starting from \(\lambda ^{\max }\), it is not necessary to consider the initial guess of \(\varvec{\beta }\) because \(\hat{\varvec{\beta }}\) is always \(\varvec{0}\) for \(\lambda ^{\max }\).
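Putting Eqs. (14), (16), and (17) together with the warm start, a minimal CDA sketch might look as follows (an illustration under my own variable names, not the author's implementation):

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def cda_lasso(X, f, lam, beta, r, tol=1e-5, max_iter=500):
    """One L1-regularized solve by coordinate descent. beta and the residual
    r = f - X beta are updated incrementally, so the full residual vector
    never has to be recomputed from scratch."""
    xtx = np.sum(X * X, axis=0)                  # x_j^T x_j, precomputed
    for _ in range(max_iter):
        beta_old = beta.copy()
        for j in range(X.shape[1]):
            r_j = r + X[:, j] * beta[j]          # partial residual: remove cell j
            beta[j] = soft_threshold(X[:, j] @ r_j, lam) / xtx[j]
            r = r_j - X[:, j] * beta[j]          # restore with updated beta_j
        if np.linalg.norm(beta - beta_old) <= tol * (np.linalg.norm(beta_old) + 1e-12):
            break
    return beta, r

def lasso_path(X, f, lam_min, n_steps=30):
    """Warm start: begin at lam_max (where beta = 0 exactly) and reuse each
    solution as the initial guess for the next, smaller lambda."""
    lam_max = np.max(np.abs(X.T @ f))
    lams = np.logspace(np.log10(lam_max), np.log10(lam_min), n_steps)
    beta, r = np.zeros(X.shape[1]), f.copy()
    path = []
    for lam in lams:
        beta, r = cda_lasso(X, f, lam, beta, r)
        path.append(beta.copy())
    return lams, path

# demo: a tiny underdetermined problem (N = 15 < M = 40) with a sparse truth
rng = np.random.default_rng(2)
X = rng.normal(size=(15, 40))
beta_true = np.zeros(40)
beta_true[[3, 17]] = 1.0, -2.0
f = X @ beta_true
lams, path = lasso_path(X, f, lam_min=1e-3)
```

The first solution on the path is exactly zero, and the data misfit shrinks monotonically in practice as \(\lambda\) decreases.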
Regarding the convergence of CDA, Tseng (2001) studied the following objective function:

\(\mathcal {L}(\varvec{\beta }) = G(\varvec{\beta }) + \lambda P(\varvec{\beta }).\)
He showed that CDA reaches the optimal solution if \(G(\varvec{\beta })\) is differentiable and convex, and the penalty \(P(\varvec{\beta })\) is separable, that is, it can be represented by a sum of functions of the individual parameters, \(P(\varvec{\beta })=\sum _j p_j(\beta _j)\), where each \(p_j(\beta _j)\) is convex even if it is not smooth. In the case of Eq. (11), \(G(\varvec{\beta })=\Vert \varvec{f}-\varvec{X}\varvec{\beta }\Vert ^2/2\) is differentiable and convex, and the penalty \(P(\varvec{\beta })=\sum _j|\beta _j|\) is separable, with each \(p_j(\beta _j)=|\beta _j|\) convex although it is not smooth at \(\beta _j=0\). Thus, it is guaranteed that CDA provides the optimal solution.
Regularization parameter selection
Because model features will change according to the regularization parameter \(\lambda\), we have to determine an optimum \(\hat{\lambda }\) in some way. To choose a suitable regularization parameter, the Lcurve criterion (Hansen 2001) is widely used.
An L-curve is a log–log plot with the penalty term of the regularized solution on the ordinate and the residual norm on the abscissa. This plot has a characteristic shape like the letter ‘L’, so it is referred to as an L-curve. Obviously, these terms are a decreasing and an increasing function of the regularization parameter, respectively. A large regularization parameter results in a small solution norm, and thus a small penalty term, and a large residual norm; in this case, the residual norm is very sensitive to changes of the regularization parameter while the penalty term is almost constant. Conversely, a small regularization parameter results in a large penalty term and a small residual norm, and a small change of the regularization parameter causes a large change in the penalty term while the change of the residual norm is very small. These points are plotted on the horizontal and vertical branches of the L-curve, respectively. The corner point, at which the curvature reaches a maximum, gives the best balance between the residual norm and the penalty term, and the regularization parameter corresponding to this point achieves the best trade-off between minimizing the residuals and minimizing the model complexity.
Because CDA with warm start provides the solutions for the sequence of regularization parameters, the discrete L-curve is obtained collaterally, and the \(\hat{\lambda }\) that maximizes the curvature of the L-curve is determined with the aid of cubic splines.
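A sketch of the corner search follows. The paper fits the discrete L-curve with cubic splines; here, as a simple stand-in of my own, the derivatives of the log–log curve are approximated by finite differences, and the input curve is synthetic:

```python
import numpy as np

def lcurve_corner(lams, res_norms, pen_norms):
    """Select lambda at the L-curve corner: the point of maximum curvature
    kappa = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2) of the log-log plot of
    penalty versus residual norm, parametrized by t = log10(lambda)."""
    t = np.log10(lams)
    x, y = np.log10(res_norms), np.log10(pen_norms)
    x1, y1 = np.gradient(x, t), np.gradient(y, t)     # first derivatives
    x2, y2 = np.gradient(x1, t), np.gradient(y1, t)   # second derivatives
    kappa = (x1 * y2 - y1 * x2) / (x1**2 + y1**2) ** 1.5
    return lams[np.argmax(kappa)]

# demo on a synthetic L-shaped curve whose corner lies near lambda = 1
lams = np.logspace(3, -3, 61)                 # decreasing, as from warm start
t = np.log10(lams)
res = 10.0 ** np.logaddexp(0, t)              # residual grows for large lambda
pen = 10.0 ** np.logaddexp(0, -t)             # penalty grows for small lambda
lam_hat = lcurve_corner(lams, res, pen)
```

For this symmetric test curve the maximum-curvature point sits at the bend, so the selected \(\lambda\) is close to 1.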
L1–L2 norm regularized magnetic inversion
Combination of L1 and L2 norm regularization
In this section, we consider a synthetic 3D magnetic model as shown in Fig. 2, which consists of three magnetized blocks. The model region is 1 km north–south and 1 km east–west, with depth of up to 0.5 km from the ground surface (\(z=0\,\hbox {km}\)), and this region is divided into \(80 \times 80 \times 40\) regular grid cells. The dimensions of the two shallow blocks are \(75 \times 75 \times 75\,\hbox {m}^3\); they are centered on (\(-\)0.25 km, 0 km, \(-\)0.075 km) and (0.25 km, 0 km, \(-\)0.075 km), respectively, and the deep block has dimensions of \(100 \times 100 \times 100\,\hbox {m}^3\), centered on (0 km, 0 km, \(-\)0.25 km). The perspective view of this model is displayed in Fig. 2a. The directions of the ambient geomagnetic field are assumed to be \({I}=50^{\circ }\) and \({D}=7^{\circ }\), and the induced magnetization of these three blocks is assumed to be 2 A/m. The magnetic total field anomaly was computed over \(80 \times 80\) observation points at an altitude of 50 m above the surface. Figure 2b shows the synthetic anomaly, which is contaminated with uncorrelated Gaussian noise with zero mean and a standard deviation of \(\sigma _0 = 1.0\,\hbox {nT}\), which is about 2% of the anomaly magnitude. The distribution of noise is displayed in Fig. 2c, and Fig. 3 shows a cross-section of this model along the \(x=0\) km profile.
Using this anomaly as input, the optimal model was obtained by L1 norm regularized inversion with CDA. The CDA was applied to a decreasing sequence \(\varvec{\lambda }\) from \(\lambda ^{\max }=10^3\) to \(\lambda ^{\min }=10^{-1}\) with an interval of \(\Delta \log _{10}\lambda =0.1\) on the log scale. By the L-curve method, \(\hat{\lambda }\) was estimated as 5.2 (Fig. 4).
Figure 5a shows the obtained model. In this calculation, the conventional weighting function \(w^{S_1}\) was used. This figure shows that the estimated causative bodies are excessively concentrated and that the actual magnetization and dimensions of the blocks are not well represented.
As described in the previous section, it is known that L1 norm regularization has some critical drawbacks in the case where the number of model parameters (M) greatly exceeds the number of observed data (N), which is the situation commonly seen in 3D magnetic inversion.
One major drawback is that the number of nonzero elements of \(\hat{\varvec{\beta }}\) obtained by L1 norm regularization cannot exceed the number of observations N. For a theoretical explanation of this feature, see Tibshirani (2013) and references therein. Another major drawback of L1 norm regularization is that, if \(N\ll M\) and some of the columns of \(\varvec{X}\) are highly correlated with each other, L1 norm regularization provides an extremely concentrated model (Fan and Li 2001). As described in the previous section, \(\varvec{x}_j\) stores the (weighted) magnetic anomaly produced by a grid cell \(\Delta V_j\) with unit induced magnetization. Thus, when the subsurface space is divided into very fine grid cells, the pattern of \(\varvec{x}_j\) is similar to that of the magnetic anomaly produced by the neighboring cells, and \(\varvec{x}_j\) tends to be highly correlated with its neighboring columns. As a result of these drawbacks, an overly sparse nature is promoted in the case of magnetic inversion and an excessively concentrated model is provided, as shown in Fig. 5a.
This excessively concentrated feature can also be seen in other sparse magnetic inversions that, like L1 norm regularization, are based on alternatives to L0 norm regularization. Indeed, the result in Figure 1e of Portniaguine and Zhdanov (1999), obtained by minimum-support inversion, is very similar to that in Fig. 5a. To avoid this problem, they used upper (and lower) limits on the model parameters. By introducing these bound constraints, they showed that, if an appropriate bound constraint is set, minimum-support inversion can reproduce the true model very well.
In the case of CDA, we can also introduce bound constraints. When the upper limit \(\hat{\beta }_j\le \beta _{\max }\) is applied, because Eq. (11) is convex, we can easily see that \(\hat{\beta }_j^{(k+1)}\) of Eq. (16) is replaced by

\(\hat{\beta }_j^{(k+1)} \leftarrow \min \left( \hat{\beta }_j^{(k+1)},\, \beta _{\max }\right),\)
and, in the same manner, when the lower limit \(\beta _{\min }\le \hat{\beta }_j\) is applied, \(\hat{\beta }_j^{(k+1)}\) is replaced by

\(\hat{\beta }_j^{(k+1)} \leftarrow \max \left( \hat{\beta }_j^{(k+1)},\, \beta _{\min }\right).\)
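The clipped update can be sketched as a single coordinate step (a helper of my own, assuming the unconstrained update of Eq. (16)):

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def cda_update_bounded(x_j, r_j, lam, beta_min=None, beta_max=None):
    """One CDA coordinate update with optional bound constraints: since the
    objective is convex in beta_j, the constrained one-dimensional optimum
    is the unconstrained update clipped to the interval [beta_min, beta_max]."""
    beta_j = soft_threshold(x_j @ r_j, lam) / (x_j @ x_j)
    if beta_max is not None:
        beta_j = min(beta_j, beta_max)
    if beta_min is not None:
        beta_j = max(beta_j, beta_min)
    return beta_j

# an unconstrained update of 5.0 is clipped to the upper bound of 2.0
x = np.array([1.0, 1.0])
r = np.array([5.0, 5.0])
clipped = cda_update_bounded(x, r, lam=0.0, beta_max=2.0)
```

Because the one-dimensional objective is convex, projecting the unconstrained minimizer onto the feasible interval gives the constrained minimizer exactly, which is why the simple clip suffices.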
Figure 5b shows the result of the L1 norm regularization with the bound constraint \(\beta _j\le 2\). From this figure, we can see that the resultant model represents the true model very well.
On the other hand, Fig. 5c shows the result with the bound constraint \(\beta _j\le 4\), which is twice the true value. From this figure, we can see that the magnetization of the causative bodies was estimated to be larger (\(\simeq\) 4 A/m) and the size of the blocks to be smaller.
These results suggest that, in the case of L1 norm regularization with a bound constraint, the value of \(\beta _{\max }\) is critical to the shape and magnetization of the derived model. However, when the subsurface magnetic structure of the study area is complex, it will often be difficult to choose an appropriate \(\beta _{\max }\).
Therefore, instead of setting an upper limit on the model, I introduce the following combination of L1 and L2 norm penalties to reduce the overly concentrated feature of the L1 norm regularization:

\(\lambda P(\varvec{\beta }) = \lambda \left\{ \alpha \left\| \varvec{\beta }\right\| _1 + \frac{1-\alpha }{2}\left\| \varvec{\beta }\right\| _2^2\right\} ,\)
where \(\lambda >0\) is a regularization parameter, and \(0\le \alpha \le 1\) is a hyperparameter which represents the mixing ratio of the L1 and L2 norm regularization. This modification is the same as the “naive Elastic Net” proposed by Zou and Hastie (2005), and the drawbacks of the L1 norm regularization described above are reduced, as explained theoretically in what follows.
By Eq. (21), the model objective function of Eq. (11) is replaced by

\(\mathcal {L}(\varvec{\beta }) = \frac{1}{2}\left\| \varvec{f}-\varvec{X}\varvec{\beta }\right\| ^2 + \lambda \left\{ \alpha \left\| \varvec{\beta }\right\| _1 + \frac{1-\alpha }{2}\left\| \varvec{\beta }\right\| _2^2\right\} .\)
Obviously, when \(\alpha =0\), this functional becomes

\(\mathcal {L}(\varvec{\beta }) = \frac{1}{2}\left\| \varvec{f}-\varvec{X}\varvec{\beta }\right\| ^2 + \frac{\lambda }{2}\left\| \varvec{\beta }\right\| _2^2,\)
and the problem of minimizing Eq. (23) is a Lagrange version of the ordinary Tikhonov regularized problem of order zero:

\(\min _{\varvec{\beta }}\ \frac{1}{2}\left\| \varvec{b}-\tilde{\varvec{X}}\varvec{\beta }\right\| ^2,\)
where

\(\tilde{\varvec{X}} = \begin{pmatrix} \varvec{X} \\ \sqrt{\lambda (1-\alpha )}\,\varvec{I} \end{pmatrix}, \quad \varvec{b} = \begin{pmatrix} \varvec{f} \\ \varvec{0} \end{pmatrix},\)
\(\varvec{I}\) is the \(M \times M\) identity matrix, and \(\varvec{0}\) is an M-dimensional null column vector. So, the problem corresponding to Eq. (22) is a Tikhonov problem with an L1 norm constraint:

\(\min _{\varvec{\beta }}\ \frac{1}{2}\left\| \varvec{b}-\tilde{\varvec{X}}\varvec{\beta }\right\| ^2 \quad \text {subject to} \quad \left\| \varvec{\beta }\right\| _1 \le t_{\alpha \lambda },\)
where \(t_{\alpha \lambda }\) is the threshold corresponding to \(\alpha \lambda\). Now in Eq. (25), the dimension of the “observed data” \(\varvec{b}\) is \(N+M\), which exceeds the number of unknown parameters M. Thus, every element of \(\varvec{\beta }\) can have a nonzero value and the first drawback of LASSO is resolved.
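The augmented-system identity behind this argument is easy to verify numerically. The following sketch (with arbitrary synthetic values of my own) checks that the quadratic part of the elastic-net objective equals an ordinary least-squares misfit on the stacked system:

```python
import numpy as np

# Stacking sqrt(lam*(1-alpha))*I under X and zeros under f absorbs the L2
# penalty into the misfit, turning the elastic net into a plain L1 problem
# whose "observation" vector b has dimension N + M.
rng = np.random.default_rng(3)
N, M = 10, 25
X = rng.normal(size=(N, M))
f = rng.normal(size=N)
beta = rng.normal(size=M)                     # an arbitrary trial model
lam, alpha = 0.7, 0.8

X_aug = np.vstack([X, np.sqrt(lam * (1 - alpha)) * np.eye(M)])
b = np.concatenate([f, np.zeros(M)])

lhs = 0.5 * np.sum((b - X_aug @ beta) ** 2)
rhs = (0.5 * np.sum((f - X @ beta) ** 2)
       + 0.5 * lam * (1 - alpha) * np.sum(beta ** 2))
```

The two quantities agree for any trial \(\varvec{\beta }\), so the stacked misfit and the L2-penalized misfit define the same objective.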
In addition, Zou and Hastie (2005) showed that L1–L2 norm regularization encourages a “grouping effect”, which means that the elements of the solution corresponding to the highly correlated columns of \(\varvec{X}\) have similar estimated values. By this enforcing of the grouping effect, the second drawback is mitigated. For the details of this point, please refer to section 2.3 of Zou and Hastie (2005).
As a result of L1–L2 norm regularization, the tendency of the L1 norm regularized solution to be overly concentrated is suppressed by the L2 norm constraint, and at the same time, the sparse nature of the model is still promoted by the L1 norm constraint.
In the next section, some synthetic tests are performed and the effectiveness of the L1–L2 norm regularization for magnetic inversion is discussed. Prior to that, the CDA algorithm and the regularization parameter selection method for L1–L2 norm regularization are briefly described in the following subsections.
Coordinate descent algorithm for L1–L2 norm regularized problem
For the objective function of Eq. (22), the stationary point condition for \(\beta _j\) is

\(\left( \varvec{x}_j^T\varvec{x}_j + \lambda (1-\alpha )\right) \beta _j + \lambda \alpha \,\mathrm{sign}(\beta _j) = \varvec{x}_j^T\hat{\varvec{r}}_{-j},\)
Then, the update equation for \(\hat{\beta }_j^{(k+1)}\) is modified to

\(\hat{\beta }_j^{(k+1)} = \frac{\mathcal {S}\left( \varvec{x}_j^T\hat{\varvec{r}}_{-j}^{(k)},\,\lambda \alpha \right) }{\varvec{x}_j^T\varvec{x}_j + \lambda (1-\alpha )}.\)
In the case of the L1–L2 norm regularized problem of Eq. (22), \(G(\varvec{\beta })\) of Eq. (18) is

\(G(\varvec{\beta }) = \frac{1}{2}\left\| \varvec{f}-\varvec{X}\varvec{\beta }\right\| ^2 + \frac{\lambda (1-\alpha )}{2}\left\| \varvec{\beta }\right\| _2^2,\)
and this function is differentiable, with Hessian matrix \(\varvec{H}=\varvec{X}^T\varvec{X}+\lambda (1-\alpha )\varvec{I}\). In general, the product of a real matrix and its transpose, \(\varvec{X}^T\varvec{X}\), is positive semidefinite (Cambini and Martein 2009); that is, all its eigenvalues \(e_j\) \((j=1,2,\ldots ,M)\) satisfy \(e_j\ge 0\), where the number of zero eigenvalues is the dimension of the null space of \(\varvec{X}\). So, when the L2 norm penalty is applied together with the L1 norm penalty, that is, when \(0\le \alpha < 1\), the eigenvalues of \(\varvec{H}\) are \(e_j+\lambda (1-\alpha )>0\) for an arbitrary \(\lambda >0\), and \(\varvec{H}\) is positive definite. Thus, \(G(\varvec{\beta })\) of Eq. (22) is strictly convex for any \(\lambda\), and it is guaranteed that CDA provides the optimal solution of Eq. (25) according to Tseng (2001).
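For reference, the modified coordinate update can be sketched as follows (a minimal helper of my own; with \(\alpha =1\) it reduces to the pure L1 update of Eq. (16)):

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def cda_update_elastic_net(x_j, r_j, lam, alpha):
    """Coordinate update for the L1-L2 combined penalty: relative to the
    pure-L1 update, the threshold becomes lam*alpha and the denominator
    gains the ridge term lam*(1 - alpha), which keeps it strictly positive
    for alpha < 1 even when x_j^T x_j is (numerically) zero."""
    return (soft_threshold(x_j @ r_j, lam * alpha)
            / (x_j @ x_j + lam * (1.0 - alpha)))

# alpha = 1 recovers the pure L1 update; alpha < 1 shrinks further
x = np.array([1.0, 0.0])
r = np.array([3.0, 0.0])
b_l1 = cda_update_elastic_net(x, r, lam=1.0, alpha=1.0)
b_en = cda_update_elastic_net(x, r, lam=1.0, alpha=0.5)
```

The ridge term in the denominator is exactly the \(\lambda (1-\alpha )\) that makes the Hessian positive definite above, so every coordinate subproblem has a unique minimizer.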
Regularization parameter selection for L1–L2 norm regularized problem
In Eq. (22), we have to specify \(\lambda\) and \(\alpha\) for the inversion. Friedman et al. (2010) presented the calculation results of linear regression problems using several small data sets for various \(\lambda\) with fixed \(\alpha\). While they suggested that an optimal \(\lambda\) can be selected by the cross-validation method, this method is too costly for a large-scale problem such as magnetic inversion. Therefore, in this paper, the L-curve method is used to select an optimal \(\lambda\) after fixing \(\alpha\) a priori. The L-curve is plotted for the L2 residual norm versus the penalty:

\(P(\hat{\varvec{\beta }}) = \alpha \Vert \hat{\varvec{\beta }}\Vert _1 + \frac{1-\alpha }{2}\Vert \hat{\varvec{\beta }}\Vert _2^2.\)
The \(\hat{\lambda }\) is selected as \(\lambda\) where the curvature of this Lcurve reaches a maximum. However, in this procedure, there remains a problem of what value should be given to \(\alpha\). This point is discussed in the next section.
Synthetic examples
Synthetic test based on 3 blocks model
In this subsection, synthetic tests are performed using the proposed inversion method with the combined L1–L2 norm regularization. Through these synthetic tests, an appropriate value of \(\alpha\) is discussed, as well as a suitable weighting function.
Figure 6 shows a cross-section along the \(x=0\,\hbox {km}\) profile of the models obtained by the proposed inversion method using Fig. 2b as input data. In this calculation, the conventional weighting function \(\varvec{w}^{S_1}\) of Eq. (9) was used. The value of \(\alpha\) is fixed to (a) 0.0, (b) 1.0, and (c) 0.8, respectively, and the model in Fig. 6b is the same as that in Fig. 5a. The CDA was performed for a sequence \(\varvec{\lambda }\) decreasing from \(\lambda ^{\max }=10^3\) to \(\lambda ^{\min }=10^{-1}\) with an interval of \(\log _{10}(\Delta \lambda )=0.1\) on the log scale. Using the L-curve method, \(\hat{\lambda }\) was estimated as (a) 17.3, (b) 5.2, and (c) 3.2, respectively.
The model in Fig. 6a with \(\alpha =0.0\) is the result of the conventional Tikhonov regularization. In this model, while there are high-magnetization regions corresponding to the two shallow blocks of the true model, they are strongly blurred, and their shape differs from that of the true model. Further, there is no distinct high-magnetization region corresponding to the deep block. The magnetization of the resultant model also differs from that of the true model and is estimated to be smaller. Because the model of Fig. 6a has blurred features, the volume of the magnetized regions is overestimated, and the magnetization is correspondingly underestimated.
On the other hand, the model in Fig. 6b with \(\alpha =1\), which is the L1 norm regularized case, shows the excessively concentrated features described in the previous section. While the magnetized region corresponding to the deep block can be recognized, the shapes and magnetizations of the blocks are completely different from those of the true model, and the inversion failed to reproduce the true model.
Conversely, the model in Fig. 6c with \(\alpha =0.8\) is closer to the true model, and the location, magnetization, and shape of the resultant model are comparable to those of the true model. From these results, we can see that L1–L2 norm regularization with \(\varvec{w}^{S_1}\) improves the reconstruction of the model when an appropriate \(\alpha\) is selected.
Next, to obtain an optimal \(\alpha\), L1–L2 norm regularized inversion was performed again while varying the value of \(\alpha\), and the following residual norm was calculated for each derived model:
where \(\hat{\varvec{\beta }}_{\alpha }\) is the optimal model with the hyperparameter \(\alpha\), and \(\varvec{\beta }_{\mathrm{true}}\) is the true model shown in Fig. 2. The \(\hat{\alpha }\) that minimizes \(\Delta\) can be regarded as the optimum \(\alpha\), deriving the model closest to the true model in terms of the model residuals. However, it is possible that the value of \(\hat{\alpha }\) changes with the amplitude of the noise. Thus, \(\Delta\) was also calculated for \(\sigma _0 = 0.5\), 1.0, 5.0, and 10 nT, respectively.
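The scan over \(\alpha\) amounts to evaluating \(\Delta\) on a grid and taking the minimizer. A sketch under that reading, where `fit` is a hypothetical stand-in for the full L-curve-tuned L1–L2 inversion returning \(\hat{\varvec{\beta }}_{\alpha }\):

```python
import numpy as np

def select_alpha(alphas, fit, beta_true):
    """Scan the mixing ratio alpha and return the value minimizing the
    model residual norm Delta = ||beta_hat(alpha) - beta_true||,
    together with the full list of Delta values."""
    deltas = [np.linalg.norm(fit(a) - beta_true) for a in alphas]
    return alphas[int(np.argmin(deltas))], deltas
```

This selection is only possible in synthetic tests, where \(\varvec{\beta }_{\mathrm{true}}\) is known.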
Figure 7 shows the plot of \(\Delta\) versus \(\alpha\) in the range of 0.6 to 0.99 with an interval of 0.01. In this figure, solid triangles, circles, squares, and diamonds indicate the cases of \(\sigma _0 = 0.5\), 1.0, 5.0, and 10.0 nT, respectively. From this figure, we can observe that \(\hat{\alpha }^{S_1}\), that is, the optimum \(\alpha\) at which \(\Delta\) takes its minimum when \(\varvec{w}^{S_1}\) is used, takes a value around 0.96 regardless of the value of \(\sigma _0\). Figure 8 shows the optimal model derived with \(\hat{\alpha }^{S_1}=0.96\) using \(\varvec{w}^{S_1}\). The \(\hat{\lambda }\) was estimated as 3.07 by the L-curve method. In this figure, panel (a) shows the perspective view of the model, and (b) and (c) show the recovered anomaly and the histogram of the residuals, respectively. The standard deviation of the residuals is 0.98 nT, which is comparable to the true standard deviation of \(\sigma _0 = 1.0\,\hbox {nT}\).
Next, I tried \(\varvec{w}^{S_2}\) of Eq. (10) as the weighting function. Figure 9 shows the results with (a) \(\alpha =0.0\), (b) \(\alpha =1.0\), and (c) \(\alpha =0.8\); \(\hat{\lambda }\) was estimated as (a) 5.7, (b) 4.2, and (c) 4.0, respectively. Like the model in Fig. 6a, the model of Fig. 9a is also strongly blurred, and a high-magnetization region corresponding to the deep block cannot be recognized. Although high-magnetization regions corresponding to the shallow blocks can be seen, they extend strongly in the depth direction, unlike the true model, and the difference from the true model is greater than in Fig. 6a. This makes sense, because Li and Oldenburg (2000) pointed out that \(\varvec{w}^{S_1}\) is suitable for smooth inversion. Figure 9b is the model with \(\alpha =1.0\), and the derived model is excessively concentrated like that in Fig. 6b, failing to reproduce the true model. On the contrary, the model in Fig. 9c is much closer to the true model, as in the case of Fig. 6c.
Figure 10 shows the plot of \(\Delta\) versus \(\alpha\) when \(\varvec{w}^{S_2}\) is used. In this figure, solid triangles, circles, squares, and diamonds indicate the cases of \(\sigma _0 = 0.5\), 1.0, 5.0, and 10.0 nT, respectively. From this figure, we can observe that \(\hat{\alpha }^{S_2}\), the optimum \(\alpha\) when \(\varvec{w}^{S_2}\) is used, takes a value around 0.90.
Figure 11 shows the result with \(\hat{\alpha }^{S_2}=0.90\); panels (a), (b), and (c) show the perspective view of the model, the recovered anomaly, and the histogram of the residuals, respectively. The standard deviation of the residuals is 1.0 nT, which is the same as the true standard deviation.
Figure 12 shows the cross-sections along the \(x=0\,\hbox {km}\) profile of the models of (a) Fig. 8a and (b) Fig. 11a, respectively, and the values of \(\Delta\) are also displayed. From this figure, we can see that the shape, location, and magnetization of the three magnetized blocks of the true model were reproduced well in either case.
As described before, L1–L2 norm regularization provides both sparsity and smoothness in the model, and these contradictory features compete. As \(\alpha\) decreases, the L2 norm regularization acts more strongly, and the resulting model becomes smoother and more blurred. On the contrary, when \(\alpha\) approaches 1, the intensity of the L1 norm regularization increases and the magnetization tends to concentrate at the center of each magnetized region. Thus, Fig. 12 suggests that, at values around \(\hat{\alpha }^{S_1}=0.96\) and \(\hat{\alpha }^{S_2}=0.90\), a trade-off between the contradictory features of the L1 and L2 norm regularization is realized, and a model that is appropriately focused but not excessively concentrated is obtained. Further, at this point, the model norm difference between the optimal model and the true model reaches its minimum.
Next, let us focus on the performance of \(\varvec{w}^{S_1}\) and \(\varvec{w}^{S_2}\). Comparing the results of Fig. 12a and b in detail, the model in Fig. 12a, which was derived using \(\varvec{w}^{S_1}\), is slightly blurred. In particular, the depth of the deep block is estimated to be too shallow, and its magnetization too small. Indeed, \(\Delta\) of the model in Fig. 12b takes a smaller value (\(\Delta =33.4\)) than that of the model in Fig. 12a (\(\Delta =46.3\)), which means the model in Fig. 12b is closer to the true model.
Figure 13 shows the optimal models with \(\hat{\alpha }^{S_1}=0.96\) and \(\hat{\alpha }^{S_2}=0.90\) for various \(\sigma _0\). Figure 13a–c shows the optimal models of \(\hat{\alpha }^{S_1}=0.96\) with \(\sigma _0\) of (a) 0.5, (b) 5.0, and (c) 10.0 nT, and Fig. 13d–f shows the models of \(\hat{\alpha }^{S_2}=0.90\) with \(\sigma _0\) of (d) 0.5, (e) 5.0, and (f) 10.0 nT, respectively. From Fig. 13a–c, we can see that the shape of the estimated magnetized blocks becomes blurred and the depth of the deep block is estimated to be shallower as \(\sigma _0\) increases. On the contrary, in the models of Fig. 13d–f, the depth and shape of the magnetized blocks are well reproduced and are comparable to the true model regardless of the noise amplitude. These results show that, by using \(\varvec{w}^{S_2}\) as the weighting function, we can obtain an appropriate model robustly regardless of the noise amplitude; \(\varvec{w}^{S_2}\) thus outperforms \(\varvec{w}^{S_1}\) and is suitable for the proposed L1–L2 norm regularized magnetic inversion.
Synthetic test based on subducting slab model
In order to further discuss the optimal \(\alpha\) and the performance of \(\varvec{w}^{S_1}\) and \(\varvec{w}^{S_2}\), more synthetic tests were performed.
Figure 14 shows a model of a subducting slab, consisting of 13 plates that have induced magnetization, with dimensions of NS 175 m \(\times\) EW 75 m and a thickness of 25 m. The inclination of this slab is \(45^{\circ }\) down to a depth of 2.5 km, increasing to \(60^{\circ }\) in the deeper part. The direction of the ambient geomagnetic field is assumed to be \({I}=75^{\circ }\) and \({D}=25^{\circ }\), and the induced magnetization of this slab is assumed to be 2 A/m.
Figure 14a shows a perspective view of this model, and Fig. 14b shows the magnetic anomaly produced by this slab, contaminated with Gaussian noise with zero mean and a standard deviation \(\sigma _0\) of 5.0 nT, which is about 2% of the maximum amplitude of the anomaly. The histogram of the noise is shown in Fig. 14c. Figure 15 shows a cross-section along the \(x=0\,\hbox {km}\) profile.
Figure 16 shows the plots of \(\Delta\) versus \(\alpha\) in the range of 0.6 to 0.99 using the weighting functions (a) \(\varvec{w}^{S_1}\) and (b) \(\varvec{w}^{S_2}\). From these results, we can see that \(\hat{\alpha }^{S_1}\) and \(\hat{\alpha }^{S_2}\) are about (a) 0.96 and (b) 0.90, respectively, as in the case of the three-block model of the previous subsection. This result suggests that the values of \(\hat{\alpha }^{S_1}\) and \(\hat{\alpha }^{S_2}\) stably take the same values regardless of the model.
Figures 17 and 18 show the optimal models derived with \(\hat{\alpha }^{S_1}=0.96\) and \(\hat{\alpha }^{S_2}=0.90\), for which \(\hat{\lambda }\) was estimated as 16.7 and 23.1, respectively. In these figures, panel (a) shows the perspective view of the model, and (b) and (c) show the recovered anomaly and the histogram of the residuals, respectively. The standard deviations of the residuals are 4.92 nT (Fig. 17c) and 5.06 nT (Fig. 18c), which are comparable to the true standard deviation of \(\sigma _0=5.0\,\hbox {nT}\). Figure 19 shows the cross-sections along the \(x=0\,\hbox {km}\) profile of the models in (a) Fig. 17 and (b) Fig. 18, respectively. The model of Fig. 19a reconstructed the shallow part of the true model, but in the deeper part its shape differs from that of the true model and fails to reproduce the slope change of the slab. On the contrary, the shape of the model in Fig. 19b is much closer to the true model, and the slope change of the slab is well reproduced. Further, \(\Delta\) of the model in Fig. 19b takes a smaller value (\(\Delta =46.4\)) than that of the model in Fig. 19a (\(\Delta =57.4\)), which suggests the model in Fig. 19b is closer to the true model.
From the results described in this section, it is suggested that, in the proposed inversion framework, that is, the L1–L2 norm combined regularized inversion with the CDA, the intensity of the weights of the conventional \(\varvec{w}^{S_1}\) is not sufficient in the deep part to reproduce the true model, and the weighting function \(\varvec{w}^{S_2}\) outperforms \(\varvec{w}^{S_1}\). By introducing the weighting \(\varvec{w}^{S_2}\), the product \(\mathbf {x}_j^T\hat{\mathbf {r}}_{j}^{(k)} =(\mathbf {k}_j^T/\Vert \mathbf {k}_j\Vert )\hat{\mathbf {r}}_{j}^{(k)}\) indicates the correlation between the partial residuals \(\hat{\mathbf {r}}_{j}^{(k)}\) and the magnetic anomaly \(\mathbf {k}_j\). Therefore, the shrinkage of the soft-thresholding operator \(\mathcal {S}\) is performed with respect to the correlation between \(\hat{\mathbf {r}}_{j}^{(k)}\) and \(\mathbf {k}_j\), and the inversion becomes a correlation-based inversion.
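To illustrate this interpretation, if the \(\varvec{w}^{S_2}\) weighting amounts to normalizing each kernel column by its Euclidean norm, \(\mathbf {x}_j = \mathbf {k}_j/\Vert \mathbf {k}_j\Vert\), then \(\mathbf {x}_j^T\mathbf {r}\) measures the correlation between the residual and the anomaly pattern of cell \(j\) up to the factor \(\Vert \mathbf {r}\Vert\). A minimal sketch under that assumption (not the paper's code):

```python
import numpy as np

def normalize_columns(K):
    """Divide each kernel column k_j by ||k_j||, so that the inner
    product x_j^T r equals ||r|| times the cosine of the angle between
    r and k_j, i.e., a correlation-like quantity."""
    norms = np.linalg.norm(K, axis=0)
    return K / norms, norms
```

With unit-norm columns, a residual perfectly aligned with one cell's anomaly pattern yields \(\mathbf {x}_j^T\mathbf {r}=\Vert \mathbf {r}\Vert\), and an orthogonal residual yields zero, which is why the soft-thresholding then acts on a correlation measure.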
Real data study
In this section, the proposed method was applied to aeromagnetic data observed on the northern part of Hokkaido Island, Japan.
In the central part of Hokkaido Island, the Kamuikotan tectonic belt extends from north to south for a distance of about 300 km. This tectonic belt is interpreted as a Cretaceous subduction complex that formed on the old plate boundary between the Eurasian and Kula-Izanagi plates (e.g., Maruyama and Seno 1986).
Figure 20 is a simplified geological map of a \(10\,\hbox {km} \times 10\,\hbox {km}\) area in the northern part of Hokkaido Island, Japan. This area contains part of the Kamuikotan tectonic belt, which consists mainly of volcanic rocks, volcaniclastic turbidites, and sedimentary rocks. In addition, two intrusive serpentine belts extend in the north–south direction. Within the Kamuikotan tectonic belt, serpentine blocks are seen intermittently, which is one of the characteristics of this tectonic belt.
Morijiri and Nakagawa (2005) studied the magnetic properties of serpentine rocks sampled from an outcrop in the southern part of the Kamuikotan tectonic belt. They reported that, while these serpentine samples have large and stable remanent magnetization, their directions are randomly scattered. Since the scattered magnetizations cancel each other, the mean remanent magnetization of the serpentine rocks in the Kamuikotan tectonic belt is considered to be small. In the area of Fig. 20, the other rocks around the serpentine belts are also expected to have small remanent magnetization because they are mainly sedimentary rocks. Thus, I assumed that the main component of the magnetization in this area is the induced magnetization.
In the northern part of Hokkaido Island, an aeromagnetic survey was conducted by the Geological Survey of Japan (GSJ) in 1974 for the purpose of oil and natural gas resource evaluation. These data, together with other aeromagnetic data acquired by the GSJ and the New Energy Development Organization of Japan (NEDO) in and around the Japan area, were compiled and published as the “Aeromagnetic Anomalies Database of Japan” by the GSJ (Nakatsuka and Okuma 2005). The data for the area of Fig. 20 contained in this database are a projection of the IGRF residuals onto a regular grid of \(200\,\hbox {m} \times 200\,\hbox {m}\) at a uniform altitude of 1,524 m above sea level using upward continuation.
In this paper, the shallow magnetic structure of the area of Fig. 20 was investigated by applying the proposed magnetic inversion method. The study area was divided into \(50 \times 50 \times 35\) regular grid cells down to a depth of 3.5 km from the surface. The dimension of each grid cell is \(200 \times 200 \times 100\,{\mathrm {m}}^3\), and its horizontal size is the same as the spacing of the data points.
Before performing the inversion, the regional component was removed by applying trend surface analysis (Borcard et al. 1992). The regional anomaly \(\varvec{t}\) was assumed to be expressible by the linear trend \(\varvec{t} = c_0 + c_1 \varvec{x} + c_2 \varvec{y}\),
where \(\varvec{x}\) and \(\varvec{y}\) are the coordinates of the observation points. The coefficients \(c_0\), \(c_1\), and \(c_2\) were estimated by least-squares fitting. Figure 21a shows the trend-removed anomaly obtained by subtracting \(\varvec{t}\) from the data of our study area. These data were used as the input of our inversion.
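The trend removal reduces to an ordinary least-squares fit of a plane to the observations. A sketch in Python (the function name and interface are illustrative; the design matrix follows the linear trend \(c_0 + c_1 x + c_2 y\) defined above):

```python
import numpy as np

def remove_linear_trend(x, y, d):
    """Fit the regional trend t = c0 + c1*x + c2*y by least squares and
    return the detrended anomaly d - t together with the coefficients."""
    G = np.column_stack([np.ones_like(x), x, y])   # design matrix [1, x, y]
    c, *_ = np.linalg.lstsq(G, d, rcond=None)      # c = (c0, c1, c2)
    return d - G @ c, c
```

Applied to the gridded anomaly values, the returned residual corresponds to the trend-removed data used as the inversion input.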
According to Ueda et al. (2012), the magnetic inclination, declination, and intensity of the total field are assumed to be \(I=59.1^\circ\), \(D=N10.6^\circ \hbox {W}\), and 51,000 nT, respectively.
The L1–L2 norm regularized inversion with \(\alpha =0.9\) was applied for a sequence \(\varvec{\lambda }\) decreasing from \(\lambda ^{\max }=10^3\) to \(\lambda ^{\min }=10^{-1}\) with an interval of \(\log _{10}(\Delta \lambda )=0.1\).
Figure 22 shows the L-curve and its curvature; from this result, \(\hat{\lambda }\) is estimated as 31.6. Figure 23 shows slices of the optimal model, and the anomaly recovered by this model is shown in Fig. 21b. The depths of the slices in Fig. 23 are (a) 200 m, (b) 400 m, (c) 600 m, and (d) 800 m, respectively. In each panel, the locations of the exposed serpentine belts are displayed as black solid lines. As shown in this figure, a focused model was obtained, exhibiting two high-magnetization areas extending north to south.
From Fig. 23a, we can see that these two high-magnetization areas correspond well to the locations where the two serpentine belts are exposed at the surface, which suggests that the estimation of the subsurface magnetic structure was conducted successfully.
Next, focusing on the magnetization intensity, the estimated magnetization of this area has a maximum of about 10 A/m. In this area, Okazaki et al. (2011) measured the susceptibility of major rock samples near the surface and reported that the average susceptibility of the serpentine belt is \(20\times 10^{-3}\) SI units. Thus, if the geomagnetic intensity is 51,000 nT, the induced magnetization is about 1 A/m, which is 10 times smaller than the estimated value. Therefore, it can be considered that the magnetic susceptibility of the samples near the ground surface was weakened by weathering. Alternatively, while the rock magnetization of this area was assumed to be purely induced, a remanent magnetization component may also be included.
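The induced-magnetization estimate quoted above follows from \(M=\chi H=\chi B/\mu _0\). A quick numerical check using the quoted susceptibility and field intensity:

```python
import math

MU0 = 4 * math.pi * 1e-7   # vacuum permeability [H/m]
chi = 20e-3                # average susceptibility of the serpentine belt [SI]
B = 51000e-9               # geomagnetic total intensity, 51,000 nT in [T]
H = B / MU0                # magnetic field strength [A/m]
M = chi * H                # induced magnetization [A/m], roughly 0.8 A/m
```

The result, on the order of 1 A/m, is about one-tenth of the roughly 10 A/m maximum recovered by the inversion, which motivates the weathering and remanence arguments above.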
Discussion
In this paper, a magnetic inversion combining L1 and L2 norm regularization was proposed. However, it is also possible to consider some variants of the penalty. For example, instead of the L2 solution norm, first-order or second-order differential solution norms could be used. Using these penalties combined with the L1 norm penalty, the problem becomes equivalent to a first-order or second-order Tikhonov problem with an L1 norm constraint. By introducing a differential norm penalty, the resultant model will have a smoother nature and tend to be more blurred. However, Fedi et al. (2005) suggest that higher-order Tikhonov regularization has higher depth resolution than zero-order Tikhonov regularization in some situations. The evaluation of the properties and validity of these penalty combinations for magnetic inversion is left for future work.
In this paper, a regularization parameter selection method based on the L-curve criterion was also proposed. However, other regularization parameter selection criteria could be used. For example, one of the major competitors of the L-curve is generalized cross-validation (GCV) (Wahba 1990). GCV is also widely used for parameter selection in both L1 and L2 norm regularized problems. While Hansen and O'Leary (1993) claimed that the L-curve outperforms GCV for the Tikhonov problem, other studies show that GCV can select feasible regularization parameters (e.g., Farquharson and Oldenburg 2004). This discrepancy arises from the different conditions of each problem, such as the dimensions of the problem, the degree of ill-posedness, and the noise level. The effectiveness of other criteria will also be considered in future work.
Conclusions
In this paper, an inversion method to obtain a 3D magnetic susceptibility distribution has been presented that incorporates a sparseness constraint based on L1 norm regularization. In order to improve the depth resolution of the model, an appropriate weighting function for the L1 norm regularized magnetic inversion was examined through synthetic data tests, and it was shown that the proposed square-of-the-sensitivity weighting function outperforms its competitors. However, the synthetic tests also revealed that applying L1 norm regularization alone is likely to yield an excessively concentrated model regardless of the dimensions of the true model. To address this problem, combined L1–L2 norm regularization has been introduced in this paper. To choose feasible regularization parameters for the L1 and L2 norm penalties, this paper proposed a selection method based on the L-curve criterion while fixing the mixing ratio of the L1 and L2 norm regularization. The synthetic tests and a real data study showed the effectiveness of the proposed inversion method.
Availability of data and materials
Not applicable.
Abbreviations
LASSO: Least absolute shrinkage and selection operator
CDA: Coordinate descent algorithm
GSJ: Geological Survey of Japan
NEDO: New Energy Development Organization of Japan
GCV: Generalized cross-validation
References
Abedi M, Siahkoohi H, Gholami A, Norouzi G (2015) 3D inversion of magnetic data through wavelet-based regularization method. Int J Min Geo-Eng 49:1–18. https://doi.org/10.22059/ijmge.2015.54360
Bhattacharyya BK (1964) Magnetic anomalies due to prism-shaped bodies with arbitrary polarization. Geophysics 29:517–531
Borcard D, Legendre P, Drapeau P (1992) Partialling out the spatial component of ecological variation. Ecology 73:1045–1055. https://doi.org/10.2307/1940179
Cambini A, Martein L (2009) Generalized convexity and optimization: theory and applications. Springer, Berlin. https://doi.org/10.1007/978-3-540-70876-6
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360. https://doi.org/10.1198/016214501753382273
Fang H, Zhang H (2014) Wavelet-based double-difference seismic tomography with sparsity regularization. Geophys J Int 199:944–955
Farquharson CG, Oldenburg DW (2004) A comparison of automatic techniques for estimating the regularization parameter in nonlinear inverse problems. Geophys J Int 156:411–425
Fedi M, Hansen PC, Paoletti V (2005) Analysis of depth resolution in potential-field inversion. Geophysics 70:1–11. https://doi.org/10.1190/1.2122408
Friedman J, Hastie T, Hofling H, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1:302–332. https://doi.org/10.1214/07-AOAS131
Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Soft 33:1–22
Hansen PC (2001) The L-curve and its use in the numerical treatment of inverse problems. In: Johnston P (ed). WIT Press, Southampton
Hansen PC (1992) Analysis of discrete ill-posed problems by means of the L-curve. SIAM Rev 34:561–580
Hansen PC, O'Leary DP (1993) The use of the L-curve in the regularization of discrete ill-posed problems. SIAM J Sci Comput 14:1487–1503
Honsho C, Ura T, Tamaki K (2012) The inversion of deep-sea magnetic anomalies using Akaike's Bayesian information criterion. J Geophys Res 117(B1). https://doi.org/10.1029/2011JB008611
Last BJ, Kubik K (1983) Compact gravity inversion. Geophysics 48:713–721
Li Y, Oldenburg DW (1996) 3D inversion of magnetic data. Geophysics 61:394–408
Li Y, Oldenburg DW (2000) Joint inversion of surface and three-component borehole magnetic data. Geophysics 65:540–552
Liu G, Song C, Lu Q, Liu Y, Feng X, Gao Y (2015) Impedance inversion based on L1 norm regularization. J Appl Geophys 120:7–13
Loris I, Nolet G, Daubechies I, Dahlen FA (2007) Tomographic inversion using L1-norm regularization of wavelet coefficients. Geophys J Int 170:359–370
Maruyama S, Seno T (1986) Orogeny and relative plate motions, example of the Japanese islands. Tectonophysics 127:305–329
Minami T, Utsugi M, Utada H, Kagiyama T, Inoue H (2018) Temporal variation in the resistivity structure of the first Nakadake crater, Aso volcano, Japan, during the magmatic eruptions from November 2014 to May 2015, as inferred by the ACTIVE electromagnetic monitoring system. Earth Planets Space 70(1):138. https://doi.org/10.1186/s40623-018-0909-2
Morijiri R, Nakagawa M (2005) Small-scale melange fabric between serpentinite block and matrix: magnetic evidence from the Mitsuishi ultramafic rock body, Hokkaido, Japan. Tectonophysics 398:33–44
Nakatsuka T, Okuma S (2005) Aeromagnetic anomalies database of Japan. Digital Geoscience Map P-6, Geological Survey of Japan
Ogawa Y, Uchida T (1996) A two-dimensional magnetotelluric inversion assuming Gaussian static shift. Geophys J Int 126(1):69–76. https://doi.org/10.1111/j.1365-246X.1996.tb05267.x
Okazaki K, Mogi T, Utsugi M, Ito Y, Kunishima H, Yamazaki T, Takahashi Y, Hashimoto T, Ymamaya Y, Ito H, Kaieda H, Tsukuda K, Yuuki Y, Jomori A (2011) Airborne electromagnetic and magnetic surveys for long tunnel construction design. Phys Chem Earth 36:1237–1246
Pilkington M (1997) 3D magnetic imaging using conjugate gradients. Geophysics 62:1132–1142
Pilkington M (2009) 3D magnetic data-space inversion with sparseness constraints. Geophysics 74:7–15
Portniaguine O, Zhdanov MS (1999) Focusing geophysical inversion images. Geophysics 64:874–887
Portniaguine O, Zhdanov MS (2002) 3D magnetic inversion with data compression and image focusing. Geophysics 67:1532–1541
Rezaie M, Moradzadeh A, Nejati KA (2016) 3D gravity data-space inversion with sparseness and bound constraints. J Min Environ 14:1–9. https://doi.org/10.22044/jme.2015.558
Sacchi MD, Ulrych TJ (1995) High resolution velocity gathers and offset space reconstruction. Geophysics 60:1169–1177
Tibshirani RJ (1996) Regression shrinkage and selection via the lasso. J R Statist Soc B 58:267–288
Tibshirani RJ (2013) The lasso problem and uniqueness. Electron J Stat 7:1456–1490. https://doi.org/10.1214/13-EJS815
Tikhonov AN, Arsenin VY (1977) Solutions of ill-posed problems. Wiley, New York
Tseng P (2001) Convergence of a block coordinate descent method for nondifferentiable minimization. J Optim Theory Appl 109:475–494
Tsunakawa H, Shibuya H, Takahashi F, Shimizu H, Matsushima M, Matsuoka A, Nakazawa S, Otake H, Iijima Y (2010) Lunar magnetic field observation and initial global mapping of lunar magnetic anomalies by MAP-LMAG onboard SELENE (Kaguya). Space Sci Rev 154(1):219–251. https://doi.org/10.1007/s11214-010-9652-0
Ueda I, Abe S, Goto K, Ebina Y, Ishikura N, Tanoue S (2012) Geomagnetic charts for the epoch 2010.0. J Geospat Inf Auth Jpn 123:9–19 (in Japanese)
Uieda L, Barbosa VCF (2012) Robust 3D gravity gradient inversion by planting anomalous densities. Geophysics 77:55–66
Wahba G (1990) Spline models for observational data. SIAM, Philadelphia, PA
Wang Y (2011) Seismic impedance inversion using L1-norm regularization and gradient descent methods. J Inverse Ill-Posed Probl 18:823–838
Wang J, Meng X, Li F (2015) A computationally efficient scheme for the inversion of large-scale potential field data: application to synthetic and real data. Comput Geosci 85:102–111
Xiang Y, Yu P, Zhang L, Feng S, Utada H (2017) Regularized magnetotelluric inversion based on a minimum-support gradient stabilizing functional. Earth Planets Space 69(1):158. https://doi.org/10.1186/s40623-017-0743-y
Zeyen H, Pous J (1991) A new 3D inversion algorithm for magnetic total field anomalies. Geophys J Int 104(3):583–591. https://doi.org/10.1111/j.1365-246X.1991.tb05703.x
Zhang L, Koyama T, Utada H, Yu P, Wang J (2015) A regularized three-dimensional magnetotelluric inversion with a minimum gradient support constraint. Geophys J Int 189:296–316
Zhdanov MS, Fang S, Hursan G (2000) Electromagnetic inversion using quasi-linear approximation. Geophysics 65:1501–1513
Zhdanov MS, Dmitriev VI, Fang S, Hursan G (2000) Quasi-analytical approximations and series in electromagnetic modeling. Geophysics 65:1746–1757
Zhdanov MS, Ellis R, Mukherjee S (2004) Three-dimensional regularized focusing inversion of gravity gradient tensor component data. Geophysics 69:925–937
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc B 67:301–320
Acknowledgements
The author thanks Paul Wessel and the University of Hawaii for providing the Generic Mapping Tools software (https://www.soest.hawaii.edu/gmt). The maps and graphs in this paper were drawn using this great tool. The author also thanks the editor and two reviewers for their valuable comments and suggestions, which improved the quality of the paper.
Funding
This work was supported by the JSPS (Japan Society for the Promotion of Science) KAKENHI Grant-in-Aid for Scientific Research (C), Grant Numbers JP26350475 and JP19K04967.
Author information
Affiliations
Contributions
This manuscript has a single author, who performed all parts of this study. The author read and approved the final manuscript.
Corresponding author
Correspondence to Mitsuru Utsugi.
Ethics declarations
Competing interests
The author declares no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Received
Accepted
Published
DOI
Keywords
 L1–L2 norm regularization
 3D magnetic inversion
 Lasso
 Elastic net