A numerical inversion test was conducted to assess how well the proposed method reproduces the original slip distribution from synthetic displacement data contaminated with noise. For comparison, we present results obtained with three evaluation functions. The first, called *smoothness*, includes only the L2 smoothness regularization term (Eq. 2). The second, called *sparsity*, includes only the L1 sparsity regularization term (Eq. 3). The third, called *SDS constraints*, is the proposed evaluation function, which combines smoothness, discontinuity, and sparsity constraints (Eq. 4).

### Smoothness constraints

The relationship between displacement on a free surface and slip on a plate interface is expressed as:

$$ {d}_k={\displaystyle \sum_{l=1}^N}{G}_{kl}{s}_l+{\varepsilon}_k $$

(1)

where \( d_k \) is the observed displacement at the *k*-th station on the Earth's surface, \( s_l \) is the dip-slip of the *l*-th subfault on the plate boundary, *N* is the number of subfaults, \( G_{kl} \) is the Green's function representing the displacement at station *k* due to unit slip on subfault *l*, and \( \varepsilon_k \) is the error (including observation noise) at the *k*-th station. We divided the plate interface into small rectangular subfaults, each of which was approximated by three triangles to calculate the angular dislocations (Comninou and Dundurs 1975). The Green's function of each subfault is the combined effect of the three angular dislocations within that subfault in an elastic, homogeneous, and isotropic half-space.
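The forward model in Eq. 1 can be sketched in a few lines. This is a minimal illustration only: the Green's function matrix below is a random placeholder (an assumption), not one computed from angular dislocations, and the function name `synthetic_displacements` and the problem sizes are hypothetical.

```python
import numpy as np

def synthetic_displacements(G, s_true, sigma, rng):
    """Eq. 1: d_k = sum_l G_kl s_l + eps_k, with eps_k drawn from N(0, sigma^2)."""
    noise = rng.normal(0.0, sigma, size=G.shape[0])
    return G @ s_true + noise

rng = np.random.default_rng(0)
K, N = 30, 16                    # K stations, N subfaults (hypothetical sizes)
G = rng.normal(size=(K, N))      # placeholder Green's functions, NOT angular dislocations
s_true = np.zeros(N)
s_true[5:9] = 1.0                # a compact patch of unit dip-slip
d = synthetic_displacements(G, s_true, sigma=0.05, rng=rng)
```

Setting `sigma=0` returns the noise-free displacements `G @ s_true`, which is a convenient sanity check when generating test data.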

If only a smoothness constraint is used, as in most previous studies involving geodetic inversion, the evaluation function can be expressed as:

$$ E\left(\boldsymbol{s};\alpha, \beta \right)=\frac{\beta }{2}{\displaystyle \sum_{k=1}^K}{\left({d}_k-{\displaystyle \sum_{l=1}^N}{G}_{kl}{s}_l\right)}^2+\frac{\alpha }{2}{\displaystyle \sum_{i\sim j}}{\left({s}_i-{s}_j\right)}^2, $$

(2)

where *β* is a precision hyperparameter, *α* is a smoothness hyperparameter, *K* is the number of observed GNSS displacements, and \( \sum_{i\sim j} \) denotes summation over all pairs of neighboring cells. The first term on the right-hand side of Eq. 2 measures how well the model parameters reproduce the observations, and the second term is the L2 regularization term, which enforces smoothness of the model parameters (*s*). Following Kuwatani et al. (2014a), we used first-derivative regularization for the smoothness constraint. The hyperparameters *α* and *β* can be selected by maximizing the marginal likelihood, as proposed by Kuwatani et al. (2014a). For the numerical inversion test, we calculated the true values of these hyperparameters from the true slip \( \boldsymbol{s}_t \). We denote the true values of *α* and *β* by \( \alpha_t = \boldsymbol{s}_t^T C \boldsymbol{s}_t / N \) and \( \beta_t = 1/\sigma^2 \), respectively, where \( \sigma^2 \) is the variance of the noise added to the synthetic observational data. The matrix *C* is an *N* × *N* symmetric matrix that sums the squared differences over all pairs of nearest-neighbor subfaults, so that \( \boldsymbol{s}^T C \boldsymbol{s} = \sum_{i\sim j}(s_i - s_j)^2 \) (see Additional file 1, and Eq. (16) in Kuwatani et al. (2014a) for details). The estimated *α* and *β* were compared with \( \alpha_t \) and \( \beta_t \) to evaluate the validity of our method for determining the hyperparameters.
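The smoothness matrix *C* and the Eq. 2 objective can be illustrated on a small grid. The sketch below assumes a rectangular grid of subfaults with 4-neighbour adjacency (an assumption; the actual mesh is given in Additional file 1) and uses the fact that, for fixed *α* and *β*, Eq. 2 is quadratic in *s*, so its minimizer solves a linear system. It does not implement the marginal-likelihood selection of Kuwatani et al. (2014a); all function names are hypothetical.

```python
import numpy as np

def neighbor_pairs(nx, ny):
    """All 4-neighbour pairs (i, j), i < j, on an nx-by-ny grid of subfaults."""
    idx = np.arange(nx * ny).reshape(ny, nx)
    pairs = [(idx[r, c], idx[r, c + 1]) for r in range(ny) for c in range(nx - 1)]
    pairs += [(idx[r, c], idx[r + 1, c]) for r in range(ny - 1) for c in range(nx)]
    return pairs

def laplacian(n, pairs):
    """Matrix C such that s @ C @ s equals the sum over pairs of (s_i - s_j)**2."""
    C = np.zeros((n, n))
    for i, j in pairs:
        C[i, i] += 1.0
        C[j, j] += 1.0
        C[i, j] -= 1.0
        C[j, i] -= 1.0
    return C

def minimize_eq2(d, G, C, alpha, beta):
    """Zero gradient of Eq. 2: (beta G^T G + alpha C) s = beta G^T d."""
    return np.linalg.solve(beta * G.T @ G + alpha * C, beta * G.T @ d)

# Synthetic example on a 4 x 4 grid (hypothetical sizes, random Green's functions).
rng = np.random.default_rng(1)
nx, ny, K = 4, 4, 30
N = nx * ny
pairs = neighbor_pairs(nx, ny)
C = laplacian(N, pairs)
G = rng.normal(size=(K, N))
s_t = np.zeros(N)
s_t[[5, 6, 9, 10]] = 1.0          # central 2 x 2 slip patch
sigma = 0.05
d = G @ s_t + rng.normal(0.0, sigma, K)

alpha_t = s_t @ C @ s_t / N       # true alpha, as defined in the text
beta_t = 1.0 / sigma**2           # true beta from the noise variance
s_hat = minimize_eq2(d, G, C, alpha_t, beta_t)
```

For the central 2 × 2 patch, eight neighbour pairs straddle the slip boundary, so \( \boldsymbol{s}_t^T C \boldsymbol{s}_t = 8 \) and \( \alpha_t = 0.5 \) for this toy grid.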

### Sparsity constraints

We incorporated two additional constraints as prior information into the analysis to accurately restore the heterogeneity of the slip distribution. The first is the sparsity constraint, which is derived from the prior information that the slip area is smaller than the zero-slip area. In general, the subfault area is set wider than the target slip area, which increases the number of model parameters in the inversion and makes the problem more underdetermined.

If only a sparsity constraint is used (e.g., Evans and Meade 2012; Honma et al. 2014), the evaluation function can be written as:

$$ E\left(\boldsymbol{s};\lambda \right)={\displaystyle \sum_{k=1}^K}{\left({d}_k-{\displaystyle \sum_{l=1}^N}{G}_{kl}{s}_l\right)}^2+\lambda {\displaystyle \sum_{l=1}^N}\left|{s}_l\right|, $$

(3)

where λ is a sparsity hyperparameter that controls the strength of the sparsity constraint. The second term on the right-hand side of Eq. 3 represents the sparsity constraint. A larger value of λ reduces the number of non-zero components, whereas λ = 0 imposes no sparsity. L1 regularization produces a compact representation of slip and may be considered an end-member alternative to smoothed, L2-regularized solutions (Evans and Meade 2012).
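The section does not state which solver was used for Eq. 3. As one standard option, shown here as a swapped-in technique rather than the authors' method, Eq. 3 can be minimized by iterative soft-thresholding (ISTA). The names `soft_threshold` and `ista_eq3` and the iteration count are hypothetical choices for this sketch.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink each entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_eq3(d, G, lam, n_iter=5000):
    """Minimize ||d - G s||^2 + lam * ||s||_1 (Eq. 3) by iterative soft-thresholding."""
    L = 2.0 * np.linalg.norm(G, 2) ** 2   # Lipschitz constant of the misfit gradient
    s = np.zeros(G.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * G.T @ (G @ s - d)    # gradient of the squared misfit
        s = soft_threshold(s - grad / L, lam / L)
    return s

# Hypothetical synthetic problem with random Green's functions.
rng = np.random.default_rng(2)
G = rng.normal(size=(30, 16))
s_true = np.zeros(16)
s_true[3:6] = 1.0
d = G @ s_true + rng.normal(0.0, 0.05, 30)
s_hat = ista_eq3(d, G, lam=5.0)
```

Soft-thresholding sets small components exactly to zero, which is how the L1 term yields the compact slip representation described above; once λ exceeds \( \max_l |2\sum_k G_{kl} d_k| \), every component is zero.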

The sparsity hyperparameter λ can be selected by leave-one-out cross-validation; we adopted the *λ* value that minimized the mean squared residual (MSR) for the evaluation function with the sparsity constraint (Eq. 3). Note, however, that cross-validation may select an inaccurate hyperparameter, because GNSS observational data may not be sufficiently independent, which can lead to overfitting.
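Leave-one-out selection of λ can be sketched as follows. This assumes that each left-out datum is a single displacement component (one row of *G*), that Eq. 3 is solved by ISTA (a swapped-in solver), and that λ is chosen from a user-supplied grid; `loocv_msr`, `select_lambda`, and the grid values are hypothetical.

```python
import numpy as np

def ista_eq3(d, G, lam, n_iter=2000):
    """Minimize ||d - G s||^2 + lam * ||s||_1 (Eq. 3) by iterative soft-thresholding."""
    L = 2.0 * np.linalg.norm(G, 2) ** 2
    s = np.zeros(G.shape[1])
    for _ in range(n_iter):
        z = s - 2.0 * G.T @ (G @ s - d) / L
        s = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return s

def loocv_msr(d, G, lam):
    """Mean squared residual of each datum predicted from a fit that excludes it."""
    K = len(d)
    sq = np.empty(K)
    for k in range(K):
        keep = np.arange(K) != k
        s_k = ista_eq3(d[keep], G[keep], lam)
        sq[k] = (d[k] - G[k] @ s_k) ** 2
    return sq.mean()

def select_lambda(d, G, lambdas):
    """Return the lambda with the smallest leave-one-out MSR, plus all MSR values."""
    msr = [loocv_msr(d, G, lam) for lam in lambdas]
    return lambdas[int(np.argmin(msr))], msr

# Hypothetical synthetic problem and lambda grid.
rng = np.random.default_rng(3)
G = rng.normal(size=(20, 10))
s_true = np.zeros(10)
s_true[2:4] = 1.0
d = G @ s_true + rng.normal(0.0, 0.05, 20)
grid = [0.1, 1.0, 10.0]
lam_best, msr = select_lambda(d, G, grid)
```

Each candidate λ costs *K* separate inversions, so in practice the grid is kept coarse; the caveat above about correlated GNSS data applies equally to this sketch.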

### Combination of smoothness, discontinuity, and sparsity constraints

As described in the Introduction, if only the smoothness constraint is assumed, the estimated slip distribution is necessarily smooth regardless of the true distribution. The evaluation function should therefore be designed so that the smoothness constraint is not applied across boundaries between non-zero-slip and zero-slip areas. Conversely, the sparsity constraint alone cannot reproduce a smooth distribution.

We therefore propose a new evaluation function that incorporates three constraints, namely smoothness, discontinuity, and sparsity, as prior information for the inversion (Fig. 1). The new evaluation function is expressed as:

$$ E\left(\boldsymbol{s};\alpha',\beta',\nu \right)=\frac{\beta'}{2}\sum_{k=1}^K\left(d_k-\sum_{l=1}^N G_{kl}s_l\right)^2+\frac{\alpha'}{2}\sum_{i\sim j}\left(1-\delta'\left(s_i,s_j\right)\right)\left(s_i-s_j\right)^2+\nu\sum_{l=1}^N\left|s_l\right| $$

(4)

where *β*’ is a precision hyperparameter, *α*’ is a smoothness hyperparameter for the non-zero-slip area, *ν* is a sparsity hyperparameter, and \( \delta'(s_i, s_j) \) behaves like a delta function and is defined as follows:

$$ \delta \hbox{'}\left({s}_i,{s}_j\right)=\left\{\begin{array}{ll}1,\hfill & \mathrm{if}\;{s}_i=0\;\mathrm{or}\;{s}_j=0\hfill \\ {}0,\hfill & \mathrm{if}\;{s}_i\ne 0\;\mathrm{and}\;{s}_j\ne 0\hfill \end{array}\right.. $$

(5)

The proposed evaluation function in Eq. 4 consists of three terms. The first term on the right-hand side measures how well the model parameters reproduce the observations, and the second term constrains the smoothness and discontinuity of the model parameters (*s*). For a pair of neighboring cells, if either or both cells have zero slip, \( \delta'(s_i, s_j) \) equals one, which removes the smoothness term for that pair. The smoothness constraint is therefore effective only when the slip in both neighboring cells is non-zero. Thus, the coefficient \( (1-\delta'(s_i, s_j)) \) acts as a cutoff for the smoothness constraint at the boundary between non-zero-slip and zero-slip areas. The third term on the right-hand side of Eq. 4 represents the sparsity constraint.
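The cutoff behaviour of Eqs. 4 and 5 can be checked on a toy example. In this sketch (function names hypothetical), a one-dimensional chain of four subfaults with slip [0, 2, 3, 0] keeps the smoothness penalty only for the single pair in which both slips are non-zero.

```python
import numpy as np

def delta_prime(si, sj):
    """Eq. 5: 1 if either slip is exactly zero, otherwise 0."""
    return 1.0 if (si == 0.0 or sj == 0.0) else 0.0

def energy_sds(s, d, G, pairs, alpha_p, beta_p, nu):
    """Eq. 4: data misfit + cut-off smoothness over neighbour pairs + L1 sparsity."""
    misfit = 0.5 * beta_p * np.sum((d - G @ s) ** 2)
    smooth = 0.5 * alpha_p * sum(
        (1.0 - delta_prime(s[i], s[j])) * (s[i] - s[j]) ** 2 for i, j in pairs
    )
    sparse = nu * np.sum(np.abs(s))
    return misfit + smooth + sparse

# Toy chain: pairs (0,1), (1,2), (2,3); zero misfit so only the priors contribute.
pairs = [(0, 1), (1, 2), (2, 3)]
s = np.array([0.0, 2.0, 3.0, 0.0])
G = np.zeros((1, 4))
d = np.zeros(1)
e_smooth_only = energy_sds(s, d, G, pairs, alpha_p=2.0, beta_p=1.0, nu=0.0)
e_with_l1 = energy_sds(s, d, G, pairs, alpha_p=2.0, beta_p=1.0, nu=1.0)
```

Only the pair (1, 2) contributes, giving a smoothness energy of 0.5 × 2 × 1 = 1; a plain L2 smoothness term would instead also penalize the two jumps to zero, giving 0.5 × 2 × (4 + 1 + 9) = 14.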

The model parameters (*s*) that minimize the evaluation function in Eq. 4 are expected to be the best solution in terms of reproducing the observations, keeping the slip continuous while allowing discontinuous boundaries, and keeping the slip area sparse. In terms of Bayesian estimation, minimizing Eq. 4 corresponds to maximizing the posterior probability \( p(\boldsymbol{s}|\boldsymbol{d};\alpha',\beta',\nu) \propto \exp(-E(\boldsymbol{s};\alpha',\beta',\nu)) \), so we sampled candidate sets of model parameters from this posterior distribution. Following Kuwatani et al. (2014b), we used the Metropolis algorithm (Metropolis et al. 1953), a type of Markov chain Monte Carlo (MCMC) method. As the solution for Eq. 4, we present the posterior mean (PM), defined as the mean of the candidate sets sampled from the posterior distribution. For all calculations, we set the initial values of the model parameters (*s*) to zero. We also checked whether the estimated values were independent of the initial values of the model parameters.
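A posterior-mean estimate by Metropolis sampling can be sketched as below. This is not the authors' implementation: it assumes (hypothetically) that slip is discretized on a lattice \( s_l = \Delta m_l \) with integer \( m_l \), so that exact zeros, which \( \delta' \) requires, remain reachable, and it uses single-site ±Δ proposals with a fixed burn-in. The names `metropolis_pm`, `step`, and `burn_in` are illustrative.

```python
import numpy as np

def energy_sds(s, d, G, pairs, alpha_p, beta_p, nu):
    """Eq. 4, keeping the smoothness term only for pairs of non-zero slips."""
    misfit = np.sum((d - G @ s) ** 2)
    smooth = sum((s[i] - s[j]) ** 2 for i, j in pairs if s[i] != 0.0 and s[j] != 0.0)
    return 0.5 * beta_p * misfit + 0.5 * alpha_p * smooth + nu * np.sum(np.abs(s))

def metropolis_pm(d, G, pairs, alpha_p, beta_p, nu, step, n_iter, burn_in, rng):
    """Posterior mean of p(s|d), proportional to exp(-E(s)), on the lattice s = step * m."""
    N = G.shape[1]
    m = np.zeros(N, dtype=int)        # integer state keeps zero slip exactly zero; starts at s = 0
    E = energy_sds(step * m, d, G, pairs, alpha_p, beta_p, nu)
    total = np.zeros(N)
    n_kept = 0
    for it in range(n_iter):
        prop = m.copy()
        l = rng.integers(N)           # choose one subfault uniformly at random
        prop[l] += 1 if rng.random() < 0.5 else -1   # symmetric proposal: +/- one step
        E_prop = energy_sds(step * prop, d, G, pairs, alpha_p, beta_p, nu)
        # Metropolis rule: accept with probability min(1, exp(-(E_prop - E)))
        if rng.random() < np.exp(min(0.0, E - E_prop)):
            m, E = prop, E_prop
        if it >= burn_in:             # accumulate samples for the posterior mean
            total += step * m
            n_kept += 1
    return total / n_kept

# Hypothetical 1-D chain of 8 subfaults with random Green's functions.
rng = np.random.default_rng(4)
N = 8
pairs = [(l, l + 1) for l in range(N - 1)]
G = rng.normal(size=(20, N))
s_true = np.array([0.0, 0.0, 1.0, 1.2, 1.0, 0.0, 0.0, 0.0])
d = G @ s_true + rng.normal(0.0, 0.05, 20)
s_pm = metropolis_pm(d, G, pairs, alpha_p=10.0, beta_p=1.0 / 0.05**2,
                     nu=50.0, step=0.1, n_iter=20000, burn_in=5000, rng=rng)
```

Because the proposal is symmetric, the acceptance ratio depends only on the energy difference; storing integer indices rather than floating-point slips avoids round-off drift that would otherwise prevent a subfault from returning exactly to zero.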

The hyperparameters *α*’, *β*’, and *ν* in Eq. 4 control the behavior of the estimated model parameters (*s*), so they must be determined appropriately before the evaluation function is minimized. Because Eq. 4 is nonlinear, maximizing the marginal likelihood with respect to all three hyperparameters is computationally expensive and unstable. We therefore selected the hyperparameters as described below.

The hyperparameters were selected in three steps. First, we selected *λ* by cross-validation; second, we determined *α*’ and *β*’; finally, we calculated *ν* from *β*’ and *λ*. To determine the smoothness and precision hyperparameters *α*’ and *β*’, we used only the non-zero model parameters (*s*) estimated with Eq. 3, so that the effect of the cutoff coefficient \( (1-\delta'(s_i, s_j)) \) can be ignored. We selected the hyperparameters assuming that *α*’ (smoothness of non-zero subfaults) and *λ* (which controls the number of zero subfaults) are almost uncorrelated. This assumption is considered to be satisfied in the present study, because *α*’ is related only to non-zero-slip subfaults, whereas *λ* is related to zero-slip subfaults. The sparsity hyperparameter *ν* was calculated by multiplying the sparsity hyperparameter *λ* of Eq. 3 by the coefficient \( \frac{\beta'}{2} \) of the reproducibility term in the proposed evaluation function (Eq. 4), that is, \( \nu = (\beta'/2)\lambda \); scaling Eq. 3 by \( \beta'/2 \) does not change its minimizer and makes its data-misfit term identical to that of Eq. 4. Analogously to \( \alpha_t \) and \( \beta_t \), we denote the true values of *α*’ and *β*’ by \( \alpha'_t \) and \( \beta'_t \), respectively. To examine the validity of the selection of the three hyperparameters (*α*’, *β*’, and *ν*), we checked whether *α*’ and *β*’ are comparable to \( \alpha'_t \) and \( \beta'_t \), respectively.
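Written out, multiplying Eq. 3 by the positive constant \( \beta'/2 \) gives

$$ \frac{\beta'}{2}E\left(\boldsymbol{s};\lambda \right)=\frac{\beta'}{2}\sum_{k=1}^K\left(d_k-\sum_{l=1}^N G_{kl}s_l\right)^2+\frac{\beta'}{2}\lambda\sum_{l=1}^N\left|s_l\right|, $$

whose first term is the data-misfit term of Eq. 4 and whose second term matches the sparsity term \( \nu\sum_{l=1}^N|s_l| \) of Eq. 4 when \( \nu = (\beta'/2)\lambda \).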