We first introduce conventional GNSS-A array positioning methods using an NLLS technique assuming a horizontally stratified sound speed structure and a horizontally sloping sound speed structure in “Introduction of the conventional NLLS-based GNSS-A positioning methods” section, and we then present the MCMC-based array positioning method in “MCMC-based GNSS-A positioning method” section. The NLLS-based array positioning methods are introduced to validate the proposed method.

### Introduction of the conventional NLLS-based GNSS-A positioning methods

Here, we introduce NLLS-based array positioning methods before formulating an MCMC-based GNSS-A positioning method. We first introduce an NLLS-based array positioning method assuming a horizontally stratified sound speed structure, and then introduce the assumption of a horizontally sloping sound speed structure.

Following Honsho and Kido (2017) and Tomita et al. (2019), the observation equation for the horizontally stratified sound speed structure for the \(n\)th shot to the \(k\)th seafloor transponder is expressed as follows:

$$\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}{T}_{k,{n}_{k}}^{\mathrm{obs}}=\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}{T}_{k,{n}_{k}}^{\mathrm{cal}}\left({\mathbf{p}}_{k}+\delta \mathbf{p};\mathbf{u}\left({t}_{{n}_{k}}\right);\boldsymbol{ }{v}_{0}\right)+{C}_{0}\left({t}_{{n}_{k}}\right)$$

(16)

with

$$M\left({\xi }_{k,{n}_{k}}\right)=\frac{1}{\mathrm{cos}{\xi }_{k,{n}_{k}}}.$$

(17)

Here, \({T}_{k,{n}_{k}}^{\mathrm{obs}}\) is the observed round-trip travel time and \({T}_{k,{n}_{k}}^{\mathrm{cal}}\) is the calculated round-trip travel time obtained as described in the “Introduction of exact travel time calculation” and “Approximate travel time calculation” sections. The initial position of the \(k\)th seafloor transponder is denoted by \({\mathbf{p}}_{k}\), \(\delta \mathbf{p}\) denotes the array displacement, \(\mathbf{u}\left({t}_{{n}_{k}}\right)\) denotes the sea surface transducer position at the shot time \({t}_{{n}_{k}}\), and \({v}_{0}\) denotes the initial vertical sound speed profile. \(M\) is a normalizing factor that depends on the shot angle \({\xi }_{k,{n}_{k}}\) and corresponds to a mapping function in GNSS positioning. The nadir total delay (NTD), \({C}_{0}\left({t}_{{n}_{k}}\right)\), expresses the temporal fluctuation of the average sound speed at the time of the \(n\)th shot. Similar to Honsho and Kido (2017), we express the temporal fluctuation of the NTD by the superposition of cubic B-spline functions, \(\Phi\). Thus, using \(J\) cubic B-spline functions, \({C}_{0}\left(t\right)\) can be expressed as

$${C}_{0}\left(t\right)=\sum_{j=1}^{J}{c}_{j}{\Phi }_{j}\left(t\right)$$

(18)

where \({c}_{j}\) is a coefficient of the \(j\)th component of the cubic B-spline function. Defining \({\tilde{T}}_{k,n}^{\mathrm{cal}}\) using Eqs. 16–18 as

$$\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}{\tilde{T}}_{k,{n}_{k}}^{\mathrm{cal}}\left(\delta \mathbf{p},\boldsymbol{ }\mathbf{c}\right)=\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}{T}_{k,{n}_{k}}^{\mathrm{cal}}\left(\delta \mathbf{p}\right)+\sum_{j=1}^{J}{c}_{j}{\Phi }_{j}\left({t}_{{n}_{k}}\right),$$

(19)

the cost function can be written as

$$\sum_{k=1}^{K}\sum_{{n}_{k}=1}^{{N}_{k}}{\left[\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}\left\{{T}_{k,{n}_{k}}^{\mathrm{obs}}-{\tilde{T}}_{k,{n}_{k}}^{\mathrm{cal}}\left(\delta \mathbf{p},\mathbf{c}\right)\right\}\right]}^{2}\to \mathrm{minimize},$$

(20)

where \({N}_{k}\) and \(K\) indicate the total number of shots for the \(k\)th transponder and the total number of seafloor transponders, respectively. Because \({T}_{k,{n}_{k}}^{\mathrm{cal}}\) is nonlinear in \(\delta \mathbf{p}\), the unknown parameters are estimated using the Gauss–Newton method, one of the most popular NLLS methods, as in Honsho and Kido (2017). To obtain reasonable solutions, \(J\) must be optimized in advance. Because the number of cubic B-spline functions controls the roughness of the modeled temporal fluctuation of the average sound speed, roughness trades off against data misfit. We therefore optimized \(J\) using BIC: we assumed that the residuals in Eq. 20 follow a Gaussian distribution and calculated the likelihood from them to obtain the BIC.
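The selection of \(J\) by BIC can be illustrated with a minimal Python sketch. It is not the implementation used in this study. It assumes independent Gaussian residuals whose variance is set to its maximum-likelihood estimate, which gives \(\mathrm{BIC}=N\mathrm{ln}\left(2\pi {s}^{2}\right)+N+k\mathrm{ln}N\) with \({s}^{2}=\mathrm{RSS}/N\). The function names, the argument `residuals_by_j`, and the count of three non-spline unknowns are illustrative assumptions.

```python
import math


def gaussian_bic(residuals, n_params):
    """BIC for i.i.d. Gaussian residuals (assumed form, see lead-in).

    The variance is replaced by its ML estimate s2 = RSS / N, so that
    -2 ln L = N * ln(2 * pi * s2) + N. The variance itself is not counted
    in n_params; this shifts every candidate equally and does not change
    which J is selected.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    s2 = rss / n
    neg2_loglik = n * math.log(2.0 * math.pi * s2) + n
    return neg2_loglik + n_params * math.log(n)


def select_num_splines(residuals_by_j, n_other_params=3):
    """Return (best_J, scores) where best_J has the smallest BIC.

    residuals_by_j : dict mapping a candidate J to the normalized residuals
                     of the NLLS solution obtained with J spline functions.
    n_other_params : number of non-spline unknowns (assumed 3 here, for a
                     three-component array displacement).
    """
    scores = {
        j: gaussian_bic(res, j + n_other_params)
        for j, res in residuals_by_j.items()
    }
    best_j = min(scores, key=scores.get)
    return best_j, scores
```

Adding spline functions always lowers the misfit term, but the penalty \(k\mathrm{ln}N\) grows with \(J\), so the minimum BIC marks the point beyond which extra roughness is no longer supported by the data.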

Following Honsho et al. (2019), the observation equation in the horizontally sloping sound speed structure for the \(n\)th shot to the \(k\)th seafloor transponder is expressed as

$$\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}{T}_{k,{n}_{k}}^{\mathrm{obs}}\,=\,\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}{T}_{k,{n}_{k}}^{\mathrm{cal}}\left({\mathbf{p}}_{k}+\delta \mathbf{p};\mathbf{u}\left({t}_{{n}_{k}}\right);\boldsymbol{ }{v}_{0}\right)+{C}_{0}\left({t}_{{n}_{k}}\right)+{\mathbf{g}}_{\mathrm{s}}({t}_{{n}_{k}}){\mathbf{u}}^{\mathrm{hor}}\left({t}_{{n}_{k}}\right)+{\mathbf{g}}_{\mathrm{d}}({t}_{{n}_{k}}){\mathbf{h}}_{k,{n}_{k}}$$

(21)

with

$${\mathbf{h}}_{k,{n}_{k}}=\left(\mathrm{tan}{\xi }_{k,{n}_{k}}\mathrm{sin}{\phi }_{k,{n}_{k}},\mathrm{tan}{\xi }_{k,{n}_{k}}\mathrm{cos}{\phi }_{k,{n}_{k}}\right)$$

(22)

where \({\mathbf{g}}_{\mathrm{s}}\) and \({\mathbf{g}}_{\mathrm{d}}\) indicate the shallow and deep gradients, respectively, each of which has EW and NS components. \({\mathbf{u}}^{\mathrm{hor}}\) indicates the horizontal component of \(\mathbf{u}\), and \(\phi\) is the azimuth of the acoustic path. Here, we assumed that the gradients were constant during the survey period (i.e., \({\mathbf{g}}_{\mathrm{s}}\left(t\right)={\tilde{\mathbf{g}}}_{\mathrm{s}}\) and \({\mathbf{g}}_{\mathrm{d}}\left(t\right)={\tilde{\mathbf{g}}}_{\mathrm{d}}\)). Moreover, as long as a single sea surface platform is used, the contribution of the shallow gradients trades off against the temporal fluctuation of NTD (Honsho et al. 2019). Therefore, we simultaneously expressed the temporal fluctuation of NTD and the contribution of the shallow gradients by the superposition of the cubic B-spline functions as follows:

$$C\left(t\right)=\sum_{j=1}^{J}{c}_{j}{\Phi }_{j}\left(t\right)\approx {C}_{0}\left(t\right)+{\tilde{\mathbf{g}}}_{\mathrm{s}}{\mathbf{u}}^{\mathrm{hor}}\left(t\right).$$

(23)

Defining \({\tilde{T}}_{k,n}^{\mathrm{cal}}\) using Eqs. 21–23 as

$$\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}{\tilde{T}}_{k,{n}_{k}}^{\mathrm{cal}}\left(\delta \mathbf{p},\boldsymbol{ }\mathbf{c},{\tilde{\mathbf{g}}}_{\mathrm{d}}\right)=\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}{T}_{k,{n}_{k}}^{\mathrm{cal}}\left(\delta \mathbf{p}\right)+\sum_{j=1}^{J}{c}_{j}{\Phi }_{j}\left({t}_{{n}_{k}}\right)+{\tilde{\mathbf{g}}}_{\mathrm{d}}{\mathbf{h}}_{k,{n}_{k}},$$

(24)

the cost function can be written as

$$\sum_{k=1}^{K}\sum_{{n}_{k}=1}^{{N}_{k}}{\left[\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}\left\{{T}_{k,{n}_{k}}^{\mathrm{obs}}-{\tilde{T}}_{k,{n}_{k}}^{\mathrm{cal}}\left(\delta \mathbf{p},\mathbf{c},{\tilde{\mathbf{g}}}_{\mathrm{d}}\right)\right\}\right]}^{2}\to \mathrm{minimize}.$$

(25)

Using the Gauss–Newton method, we estimated the array position, the temporal fluctuation of NTD, and the deep gradients. The number of cubic B-spline functions \(J\) was fixed to the value determined by BIC in the inversion assuming a horizontally stratified sound speed structure.
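The geometric factors in Eqs. 17 and 22 and the normalized residual implied by Eq. 24 can be written out as a short Python sketch. The function names are illustrative assumptions. The sketch also assumes that angles are in radians, that \(\xi\) is measured from the vertical (so that \(M=1\) for a vertical path), and that vector components are ordered (EW, NS).

```python
import math


def mapping_factor(xi):
    """M(xi) = 1 / cos(xi), Eq. 17. xi is the shot angle in radians."""
    return 1.0 / math.cos(xi)


def deep_gradient_vector(xi, phi):
    """h = (tan xi * sin phi, tan xi * cos phi), Eq. 22.

    phi is the azimuth of the acoustic path in radians. With the assumed
    (EW, NS) ordering, phi = 0 gives a vector along the NS axis.
    """
    t = math.tan(xi)
    return (t * math.sin(phi), t * math.cos(phi))


def normalized_residual(t_obs, t_cal, xi, phi, c_t, g_d):
    """Residual of Eq. 24 for one shot.

    (T_obs - T_cal) / M  -  C(t)  -  g_d . h
    c_t : value of the B-spline sum C(t) at the shot time
    g_d : deep gradient as an (EW, NS) pair
    """
    h = deep_gradient_vector(xi, phi)
    gradient_term = g_d[0] * h[0] + g_d[1] * h[1]
    return (t_obs - t_cal) / mapping_factor(xi) - c_t - gradient_term
```

Because \(\mathbf{h}\) scales with \(\mathrm{tan}\xi\), the deep gradient term vanishes for vertical paths and grows with the shot angle, so the deep gradient is constrained mainly by oblique shots.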

### MCMC-based GNSS-A positioning method

Using the fast travel time calculation technique described in “Approximate travel time calculation” section, we construct an MCMC-based GNSS-A positioning method. Following Honsho et al. (2019), we considered a case in which the deep gradient could not be directly estimated due to the data set quality (i.e., moving survey data were insufficient to directly solve the deep gradient, as described in “Introduction” section). In this case, we assumed that a shallow gradient \({\widetilde{\mathbf{g}}}_{\mathrm{s}}\) was present up to the gradient depth \(D\). According to Honsho et al. (2019), this model can provide the following relationship between the shallow and deep gradients:

$${{\tilde{\mathbf{g}}}_{\mathrm{d}}=\frac{D}{2}\tilde{\mathbf{g}}}_{\mathrm{s}}.$$

(26)

Considering Eqs. 21 and 26, we could obtain the following observation equation:

$$\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}{T}_{k,{n}_{k}}^{\mathrm{obs}}\,=\,\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}{T}_{k,{n}_{k}}^{\mathrm{cal}}\left({\mathbf{p}}_{k}+\delta \mathbf{p};\mathbf{u}\left({t}_{{n}_{k}}\right);{v}_{0}\right)+{C}_{0}\left({t}_{{n}_{k}}\right)+{\tilde{\mathbf{g}}}_{\mathrm{s}}{\mathbf{u}}^{\mathrm{hor}}\left({t}_{{n}_{k}}\right)+{\frac{D}{2}{\tilde{\mathbf{g}}}_{\mathrm{s}}}{\mathbf{h}}_{k,{n}_{k}}.$$

(27)

As indicated above, the contribution of the shallow gradient \({\tilde{\mathbf{g}}}_{\mathrm{s}}{\mathbf{u}}^{\mathrm{hor}}\) trades off against the temporal fluctuation of NTD \({C}_{0}\left(t\right)\); thus, it is difficult to directly estimate \({\tilde{\mathbf{g}}}_{\mathrm{s}}\) and \(\mathbf{c}\) when \({C}_{0}\left(t\right)\) is modeled by cubic B-spline functions (Eq. 18). To cope with this issue, Honsho et al. (2019) modeled \({C}_{0}\left(t\right)\) as a combination of polynomial functions and cubic B-spline functions, which express the long-term and short-term fluctuations of NTD, respectively. They then introduced a hyper-parameter for suppressing the short-term fluctuation of NTD and optimized it using the ABIC framework (e.g., Yabuki and Matsu’ura 1992). This approach successfully extracted the contributions of a shallow gradient. However, their method requires a large number of cubic B-spline functions, with the variation of their coefficients controlled through ABIC. In our method, the number of cubic B-spline functions is reduced in advance to the value selected by BIC, as noted in “Introduction of the conventional NLLS-based GNSS-A positioning methods” section. This reduction is quite useful for fast computation in the MCMC method, but it means that we cannot directly use the approach of Honsho et al. (2019) to determine the shallow gradient. We therefore devised a new approach similar to that of Honsho et al. (2019). To avoid introducing the hyper-parameter that constrains the short-term fluctuations of NTD in their approach, we propose the following two observation equations:

$$\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}{T}_{k,{n}_{k}}^{\mathrm{obs}}=\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}{T}_{k,{n}_{k}}^{\mathrm{cal}}\left(\delta \mathbf{p}\right)+\sum_{j=1}^{J}{c}_{j}{\Phi }_{j}\left({t}_{{n}_{k}}\right)+{\frac{D}{2}{\tilde{\mathbf{g}}}_{\mathrm{s}}}{\mathbf{h}}_{k,{n}_{k}},$$

(28)

$$\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}{T}_{k,{n}_{k}}^{\mathrm{obs}}\,=\,\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}{T}_{k,{n}_{k}}^{\mathrm{cal}}\left(\delta \mathbf{p}\right)+\sum_{m=0}^{4}{\gamma }_{m}{t}^{m}+{\tilde{\mathbf{g}}}_{\mathrm{s}}{\mathbf{u}}^{\mathrm{hor}}+{\frac{D}{2}{\tilde{\mathbf{g}}}_{\mathrm{s}}}{\mathbf{h}}_{k,{n}_{k}}.$$

(29)

Equation 28 models the temporal fluctuation of NTD \({C}_{0}\left(t\right)\) and the contribution of the shallow gradient \({\tilde{\mathbf{g}}}_{\mathrm{s}}{\mathbf{u}}^{\mathrm{hor}}\) as cubic B-spline functions following Eq. 23, whereas Eq. 29 models the temporal fluctuation of NTD \({C}_{0}\left(t\right)\) as the long-term fluctuation of NTD by a quartic polynomial function. Therefore, Eq. 28 aims to solve the array displacement similar to Eqs. 24 and 25, and it treats \(\delta \mathbf{p}\) and \(\mathbf{c}\) as unknown parameters. On the other hand, Eq. 29 aims to extract the contribution of the shallow gradient, and it treats \({\widetilde{\mathbf{g}}}_{\mathrm{s}}\) and \({\varvec{\upgamma}}\) as the unknown parameters.
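The division of roles between Eqs. 28 and 29 can be made concrete with a minimal Python sketch of the two per-shot residuals. The function and argument names are illustrative assumptions. The travel time term \({T}^{\mathrm{cal}}\left(\delta \mathbf{p}\right)/M\) is passed in as a precomputed number rather than evaluated here.

```python
def residual_eq28(lhs, tcal_norm, c, basis_row, depth, g_s, h):
    """Residual of Eq. 28 for one shot (unknowns: delta_p and c).

    lhs       : T_obs / M
    tcal_norm : T_cal(delta_p) / M, computed elsewhere
    c         : B-spline coefficients c_j
    basis_row : values Phi_j(t) at the shot time, same length as c
    depth     : gradient depth D
    g_s       : shallow gradient as an (EW, NS) pair
    h         : vector of Eq. 22 for this shot
    """
    spline_term = sum(cj * pj for cj, pj in zip(c, basis_row))
    deep_term = 0.5 * depth * (g_s[0] * h[0] + g_s[1] * h[1])
    return lhs - (tcal_norm + spline_term + deep_term)


def residual_eq29(lhs, tcal_norm, gamma, t, depth, g_s, u_hor, h):
    """Residual of Eq. 29 for one shot (unknowns: g_s and gamma).

    gamma : polynomial coefficients gamma_0 .. gamma_4
    t     : shot time
    u_hor : horizontal transducer position as an (EW, NS) pair
    """
    poly_term = sum(gm * t ** m for m, gm in enumerate(gamma))
    shallow_term = g_s[0] * u_hor[0] + g_s[1] * u_hor[1]
    deep_term = 0.5 * depth * (g_s[0] * h[0] + g_s[1] * h[1])
    return lhs - (tcal_norm + poly_term + shallow_term + deep_term)
```

Both residuals contain the same deep term \(\frac{D}{2}{\tilde{\mathbf{g}}}_{\mathrm{s}}{\mathbf{h}}\); they differ only in whether the shallow contribution is absorbed into the spline sum (Eq. 28) or separated from a smooth polynomial (Eq. 29).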

In the Bayesian approach, information on unknowns is expressed using a probability density function (PDF). According to Bayes’ theorem (Bayes 1763), when data vector \(\mathbf{d}\) is given, a posterior PDF for the unknown parameter vector \(\mathbf{x}\) can be formulated as

$$p\left(\mathbf{x}|\mathbf{d}\right)=\frac{p\left(\mathbf{d}|\mathbf{x}\right)p(\mathbf{x})}{p\left(\mathbf{d}\right)}\propto p\left(\mathbf{d}|\mathbf{x}\right)p(\mathbf{x})$$

(30)

where \(p\left(\mathbf{d}|\mathbf{x}\right)\) indicates the likelihood for a given \(\mathbf{x}\), \(p(\mathbf{x})\) is the prior PDF of \(\mathbf{x}\), and \(p(\mathbf{d})\) is the PDF of \(\mathbf{d}\), known as the evidence. Since \(p(\mathbf{d})\) is independent of \(\mathbf{x}\), the posterior PDF is proportional to \(p\left(\mathbf{d}|\mathbf{x}\right)p(\mathbf{x})\). In this study, we employed the Metropolis–Hastings algorithm (Metropolis 1953; Hastings 1970). This algorithm samples the posterior PDF through iterative calculations and is one of the most general algorithms for MCMC (Fukuda and Johnson 2008; Kubo et al. 2016). At each iteration step, we produced a candidate unknown parameter vector \({\mathbf{x}}^{\prime}\) by adding perturbations to the unknown parameter vector of the previous step \(\mathbf{x}\). We generated the perturbation for each unknown parameter from a uniform distribution with a constant step width set individually for that parameter. We then calculated the acceptance probability \(\alpha\) as follows:

$$\alpha \left({\mathbf{x}}^{\prime}|\mathbf{x}\right)=\mathrm{min}\left[1, \frac{p\left({\mathbf{x}}^{\prime}|\mathbf{d}\right)q\left(\mathbf{x}|{\mathbf{x}}^{\prime}\right)}{p\left(\mathbf{x}|\mathbf{d}\right)q\left({\mathbf{x}}^{\prime}|\mathbf{x}\right)}\right]=\mathrm{min}\left[1, \frac{p\left({\mathbf{x}}^{\prime}|\mathbf{d}\right)}{p\left(\mathbf{x}|\mathbf{d}\right)}\right],$$

(31)

where \(q\) denotes the proposal PDF. Because the proposal PDF is symmetric in this case, i.e., \(q\left(\mathbf{x}|{\mathbf{x}}^{\prime}\right)=q\left({\mathbf{x}}^{\prime}|\mathbf{x}\right)\), the acceptance probability reduces to the ratio of the posterior PDFs. We generated a random value \(u\) from a uniform distribution over \(\left[0, 1\right]\) and accepted the candidate when \(\alpha \left({\mathbf{x}}^{\prime}|\mathbf{x}\right)>u\).
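A single update of this kind can be sketched in Python as follows. This is a generic illustration, not the code used in this study. The function name `metropolis_step`, the use of half-widths for the uniform perturbation, and the choice to work with log-posteriors (to avoid numerical underflow) are assumptions.

```python
import math
import random


def metropolis_step(x, log_post, widths, rng):
    """One update with a symmetric uniform proposal.

    x        : current parameter values (list of floats)
    log_post : function returning ln p(x | d) up to an additive constant;
               it may return -inf where the prior is zero
    widths   : per-parameter half-widths of the uniform perturbation
    rng      : random.Random instance
    Returns (new_x, accepted).
    """
    candidate = [xi + rng.uniform(-w, w) for xi, w in zip(x, widths)]
    log_ratio = log_post(candidate) - log_post(x)
    # alpha = min(1, posterior ratio); the proposal terms cancel by symmetry.
    alpha = 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)
    # Accept when alpha > u, with u drawn uniformly from [0, 1).
    if alpha > rng.random():
        return candidate, True
    return x, False
```

A candidate with zero prior probability gives \(\alpha =0\) and is always rejected, provided the chain starts from a point with finite log-posterior.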

In this study, we assumed a uniform distribution over \(\left(-\infty ,\infty \right)\) for the prior PDF of each unknown parameter except the gradient depth \(D\). For the gradient depth, we assumed a uniform distribution ranging from 0 (the sea surface) to the water depth at the observation site. Thus, when the candidate gradient depth falls within this range, \(p\left(\mathbf{x}\right)=1\); otherwise, \(p\left(\mathbf{x}\right)=0\).

We calculated the likelihood \(p\left(\mathbf{d}|\mathbf{x}\right)\) by assuming that the data misfits followed a Gaussian distribution. Here, the data, residual, and unknown vectors are given as \({\mathbf{d}}_{l}\), \({\mathbf{r}}_{l}\), and \({\mathbf{x}}_{l}\), respectively. These vectors for Eq. 28 are provided as \(l=1\), whereas those for Eq. 29 are provided as \(l=2\). The residual vectors were obtained by subtracting the right-hand side from the left-hand side of Eqs. 28 and 29. The likelihood PDF for each equation is written as

$$p\left({\mathbf{d}}_{l}|{\mathbf{x}}_{l}\right)=p\left({\mathbf{r}}_{l}|{\mathbf{x}}_{l}\right)=\frac{1}{\sqrt{{\left(2\pi \right)}^{N}\left|{\mathbf{C}}_{l}\right|}}\mathrm{exp}\left(-\frac{1}{2}{\mathbf{r}}_{l}^{\mathrm{T}}{\mathbf{C}}_{l}^{-1}{\mathbf{r}}_{l}\right)$$

(32)

where \({\mathbf{C}}_{l}\) is a covariance matrix for each case and \(N\) is the total number of data given as \(N=\sum_{k=1}^{K}{N}_{k}\). We introduced a scaling parameter, \({\Lambda }_{l}\), for each covariance matrix (Tomita et al. 2021) as follows:

$${\mathbf{C}}_{l}=\left(\begin{array}{ccc}{10}^{{\Lambda }_{l}}{\sigma }_{0}^{2}& & 0\\ & \ddots & \\ 0& & {10}^{{\Lambda }_{l}}{\sigma }_{0}^{2}\end{array}\right)$$

(33)

where \({\sigma }_{0}\) is the initial observational error, which is given as \({\sigma }_{0}=1.0\times {10}^{-4} [s]\). The scaling parameter is given as a power of 10 to promote efficient sampling (e.g., Kubo et al. 2016; Tomita et al. 2021). Including these scaling parameters, the unknown vectors are defined as \({\mathbf{x}}_{1}={\left(\delta \mathbf{p}, \mathbf{c},D, {\Lambda }_{1}\right)}^{\mathrm{T}}\) and \({\mathbf{x}}_{2}={\left({\tilde{\mathbf{g}}}_{\mathrm{s}}, {\varvec{\upgamma}},{\Lambda }_{2}\right)}^{\mathrm{T}}.\)
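With the diagonal covariance of Eq. 33, \(\left|{\mathbf{C}}_{l}\right|={\left({10}^{{\Lambda }_{l}}{\sigma }_{0}^{2}\right)}^{N}\), so the logarithm of Eq. 32 reduces to a sum over squared residuals. The following Python sketch evaluates this log-likelihood and combines it with the uniform prior on the gradient depth. The function names and the argument `water_depth` are illustrative assumptions.

```python
import math

SIGMA0 = 1.0e-4  # initial observational error in seconds (value given in the text)


def log_likelihood(residuals, scale_exp, sigma0=SIGMA0):
    """ln p(d | x) from Eqs. 32 and 33.

    The covariance is var * I with var = 10**scale_exp * sigma0**2, so
    ln L = -0.5 * (N * ln(2 * pi * var) + sum(r**2) / var).
    """
    n = len(residuals)
    var = (10.0 ** scale_exp) * sigma0 ** 2
    rss = sum(r * r for r in residuals)
    return -0.5 * (n * math.log(2.0 * math.pi * var) + rss / var)


def log_prior_depth(depth, water_depth):
    """Uniform prior on D over [0, water_depth]; the log of zero is -inf."""
    return 0.0 if 0.0 <= depth <= water_depth else -math.inf


def log_posterior(residuals, scale_exp, depth, water_depth):
    """Unnormalized log-posterior; the other priors are flat and add nothing."""
    lp = log_prior_depth(depth, water_depth)
    if lp == -math.inf:
        return lp
    return lp + log_likelihood(residuals, scale_exp)
```

For fixed residuals, this log-likelihood is largest when \({10}^{{\Lambda }_{l}}{\sigma }_{0}^{2}\) equals the mean squared residual, which is how the scaling parameter lets the sampler adapt the assumed error level to the actual misfit.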

The MCMC procedure used in this study is summarized in Additional file 1: Fig. S1. We first estimated \(\delta \mathbf{p}\) and \(\mathbf{c}\) by the NLLS-based positioning assuming a horizontally stratified sound speed structure, with \(J\) determined using BIC; these estimates were used as initial values for the MCMC iteration (\(\delta {\mathbf{p}}^{\mathrm{ini}}\), \({\mathbf{c}}^{\mathrm{ini}}\)). Then, fixing the obtained \(\delta {\mathbf{p}}^{\mathrm{ini}}\), we calculated the travel times for all data and obtained the following observation equation:

$$\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}{T}_{k,{n}_{k}}^{\mathrm{obs}}-\frac{1}{M\left({\xi }_{k,{n}_{k}}\right)}{T}_{k,{n}_{k}}^{\mathrm{cal}}\left(\delta {\mathbf{p}}^{\mathrm{ini}}\right)=\sum_{m=0}^{4}{\gamma }_{m}{t}^{m}+{\tilde{\mathbf{g}}}_{\mathrm{s}}{\mathbf{u}}^{\mathrm{hor}}.$$

(34)

Because \({\tilde{\mathbf{g}}}_{\mathrm{s}}\) and \({\varvec{\upgamma}}\) are linear parameters, this equation can be easily solved using the linear least-squares method. The obtained solutions were used as the initial values for the MCMC iteration (\({\tilde{\mathbf{g}}}_{\mathrm{s}}^{\mathrm{ini}}\), \({{\varvec{\upgamma}}}^{\mathrm{ini}}\)). We set the initial values of the scaling parameters to zero, \({\Lambda }_{1}^{\mathrm{ini}}= {\Lambda }_{2}^{\mathrm{ini}}=0\). Using these initial values for \({\mathbf{x}}_{1}\) and \({\mathbf{x}}_{2}\), we ran the MCMC iteration. At odd-numbered iterations, we perturbed \({\mathbf{x}}_{1}\) and accepted or rejected the candidate based on the acceptance probability. At even-numbered iterations, we did the same for \({\mathbf{x}}_{2}\). We performed \(5\times {10}^{5}\) iterations and collected samples at even-numbered iterations after the burn-in period, which we set to \(1\times {10}^{5}\) iterations.
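The alternating update schedule can be sketched in Python as below. This is a schematic illustration under stated assumptions, not the code used in this study. The function names are hypothetical. Each log-posterior is assumed to take both parameter blocks as arguments, because Eqs. 28 and 29 share \(\delta \mathbf{p}\), \({\tilde{\mathbf{g}}}_{\mathrm{s}}\), and \(D\). Samples are assumed to be stored at even-numbered iterations whose index exceeds the burn-in length.

```python
import math
import random


def perturb(x, widths, rng):
    """Add an independent uniform perturbation to every parameter."""
    return [xi + rng.uniform(-w, w) for xi, w in zip(x, widths)]


def accept(log_new, log_old, rng):
    """Metropolis rule for a symmetric proposal, evaluated in log space."""
    if log_new >= log_old:
        return True
    return math.exp(log_new - log_old) > rng.random()


def alternating_mcmc(log_post1, log_post2, x1, x2, widths1, widths2,
                     n_iter, burn_in, seed=0):
    """Alternate updates of x1 (odd iterations) and x2 (even iterations).

    log_post1(x1, x2) : log-posterior built from Eq. 28 (assumed signature)
    log_post2(x1, x2) : log-posterior built from Eq. 29 (assumed signature)
    Returns the list of (x1, x2) pairs stored after burn-in.
    """
    rng = random.Random(seed)
    samples = []
    for it in range(1, n_iter + 1):
        if it % 2 == 1:
            cand = perturb(x1, widths1, rng)
            if accept(log_post1(cand, x2), log_post1(x1, x2), rng):
                x1 = cand
        else:
            cand = perturb(x2, widths2, rng)
            if accept(log_post2(x1, cand), log_post2(x1, x2), rng):
                x2 = cand
            if it > burn_in:
                samples.append((list(x1), list(x2)))
    return samples
```

Under this reading of the schedule, \(5\times {10}^{5}\) iterations with a burn-in of \(1\times {10}^{5}\) would leave \(\left(5\times {10}^{5}-1\times {10}^{5}\right)/2=2\times {10}^{5}\) stored samples.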