# Development and examination of new algorithms of traveltime detection in GPS/acoustic geodetic data for precise and automated analysis

- Ryosuke Azuma ^{1}
- Fumiaki Tomita ^{1}
- Takeshi Iinuma ^{2}
- Motoyuki Kido ^{3}
- Ryota Hino ^{1}

**Received: **15 April 2016

**Accepted: **2 August 2016

**Published: **11 August 2016

## Abstract

### Background

A GPS/acoustic (GPS/A) geodetic observation technique allows us to determine far-offshore plate motion in order to understand the mechanism of megathrust earthquakes. In this technique, the distance between a sea-surface platform and seafloor transponders is estimated from the two-way traveltimes (TWTs) of acoustic signals. TWTs are determined by maximizing the cross-correlation coefficient between the transmitted and returned signals. However, this approach yields significantly erroneous TWTs when the correlogram has an enlarged secondary envelope, which arises from the enlarged amplitude of multipath signals and depends on the relative spatial geometry between the ship and the transponder. Manually rereading thousands of correlograms to obtain correct TWTs takes an enormous amount of time and is prone to human error. To overcome these difficulties, an automated TWT determination procedure is needed that processes large volumes of GPS/A data efficiently, precisely, and without human error.

### Proposed methods

We developed automated methods for precisely analyzing GPS/A data. Method 1: The maximum peak in the observed correlogram is read, and a synthetic correlogram scaled to that peak is then subtracted from the observation. The same operation is then applied to the subtracted waveform, and this procedure is iterated until the correlation coefficient falls below a predefined threshold. The true traveltime is defined as the shortest traveltime obtained during the iterations. Method 2: The observed correlograms are divided into several groups based on their similarity through cluster analysis, and a master waveform is selected in each group. The traveltime residual between the maximum and true peaks in the master waveform is then evaluated manually, and this residual is used as the correction value for each slave waveform in the group. Furthermore, we employed a seismic-style data display (pasteup) to visually inspect the reliability of the results.

### Results

We confirmed that both new methods accurately correct the misreadings of the current method, which amount to 0.4–0.8 ms, corresponding roughly to a 30–60 cm difference in slant range.

### Conclusions

Thus, the proposed algorithms significantly improve the estimation of the transponder location. Further analyses are required to determine the arbitrary threshold values and to construct fully automated algorithms.


## Introduction

Offshore of the Miyagi prefecture, NE Japan, ocean-bottom geodetic observatories have been installed since 2002. GPS/acoustic observation is a combined technique using acoustic ranging and kinematic GPS positioning for precise seafloor geodetic measurement. The GPS/A measurement helped discover an anomalously large displacement of the shallowest portion of the interplate fault during the 2011 Tohoku-oki earthquake (Kido et al. 2011; Sato et al. 2011). In order to understand the spatial and temporal development of the postseismic deformation of such large interplate earthquakes, geodetic observation in and around the focal area is extremely important.

In this study, we introduce newly developed methods that automatically process acoustic signals with high precision, and we verify their validity.

## Problem of the waveform of the observed signals

In GPS/A measurement, the seafloor transponder records the signal transmitted from the ship in its internal memory and then returns the recorded data to the ship. Our GPS/A technique adopts a 10-kHz carrier wave, encoded by binary phase-shift keying every two cycles with a seventh-order M-sequence, which amounts to a total duration of 25.4 ms (127 chips × 0.2 ms). The TWT is determined from a correlogram computed by cross-correlating the transmitted and returned signals. Correlograms collected at the extended sites are often split into two envelopes, corresponding to the direct and later arrivals (Fig. 1e–j). This splitting tends to be clearly visible and identifiable at deep sites (e.g., G06 and G07); at shallow sites, however, the true peak is difficult to read because the later envelope overlaps the first one (e.g., at G14). One possible explanation for this depth-dependent feature is that higher-frequency components are selectively attenuated, especially at the phase changes of the carrier wave, by inelastic absorption in seawater over longer ranges and hence at deeper sites.
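To make the signal structure and the maximum-correlation pick concrete, the following minimal Python/NumPy sketch builds a 10-kHz carrier that is BPSK-coded every two cycles with a 127-bit M-sequence, sampled at 100 kHz, and reads the TWT as the maximum of the correlogram. The generator polynomial (x^7 + x + 1), the amplitudes, and the 0.6-ms delay of the stronger later arrival are illustrative assumptions rather than parameters of the actual system, and the function names are hypothetical.

```python
import numpy as np

FS = 100_000          # sampling rate [Hz]
F_CARRIER = 10_000    # carrier frequency [Hz]
CYCLES_PER_CHIP = 2   # the BPSK phase may flip every two carrier cycles


def m_sequence_order7():
    """127-bit M-sequence from a[k] = a[k-7] XOR a[k-6], i.e. the
    characteristic polynomial x^7 + x + 1 (an assumed choice)."""
    a = [1, 0, 0, 0, 0, 0, 0]
    while len(a) < 2**7 - 1:
        a.append(a[-7] ^ a[-6])
    return np.array(a)


def bpsk_signal():
    """Carrier whose phase is flipped by 180 degrees according to the M-sequence."""
    chips = 1 - 2 * m_sequence_order7()                    # 0/1 -> +1/-1
    samples_per_chip = FS // F_CARRIER * CYCLES_PER_CHIP   # 20 samples = 0.2 ms
    n = np.arange(chips.size * samples_per_chip)
    return np.repeat(chips, samples_per_chip) * np.sin(2 * np.pi * F_CARRIER * n / FS)


def correlogram(received, reference):
    """Cross-correlation normalized by the reference energy; index = lag in samples."""
    c = np.correlate(received, reference, mode="valid")
    return c / np.dot(reference, reference)


sig = bpsk_signal()
duration_ms = 1e3 * sig.size / FS   # 127 chips x 0.2 ms

# Hypothetical return: a direct arrival at sample 3000 plus a stronger copy
# 60 samples (0.6 ms) later, mimicking an enlarged secondary envelope.
received = np.zeros(8000)
received[3000:3000 + sig.size] += 0.5 * sig
received[3060:3060 + sig.size] += 0.8 * sig

f_obs = correlogram(received, sig)
mc_pick = int(np.argmax(f_obs))     # maximum-peak pick
print(duration_ms, mc_pick)
```

With a later copy larger than the direct arrival, the maximum peak falls on the later envelope (sample 3060 rather than 3000), which is the kind of misreading examined in this section.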

*T* = 0. During analysis, we generally regarded the maximum peak in the first envelope as the true peak, taking the sidelobes of the synthetic correlogram into account (Fig. 2c). This assumption may not always be correct; the important point, however, is to pick the same peak consistently in all rangings. Shot #568 (Fig. 2f) illustrates a case in which the true arrival peak was identified correctly. On the other hand, the influence of envelope splitting appears in shots #405, #486, and #2058 (Fig. 2d, e, g), where later peaks with amplitudes greater than that of the true peak pull the pick away from the true arrival. We found that, during certain periods, the maximum peak lay in either the first or the later envelope, and which envelope it fell in strongly depended on the relative spatial geometry between the ship and the transponder (Fig. 2a, b). For transponder G06-1, the correlation coefficient was at its maximum near the first envelope during shots #200–350 and #550–700 (Fig. 2b, e, f), when the ship was moving around the far side of the transponder, i.e., around G06-3 (Fig. 2a). However, it shifted to the later envelope during the other periods, when the ship was moving close to the transponder (Fig. 2a), resulting in misreadings (Fig. 2d, e, g). The time lag between the first and secondary envelopes also varies from ~0.8 ms at the far side to ~0.4 ms at the near side. Thus, the path difference of the secondary arrival (hereafter called the multipath) appears to be on the order of 30–60 cm depending on its incident angle, most probably because of reflection off the glass sphere itself rather than off the seafloor, as illustrated in Fig. 3. The reason why the multipath problem, including its depth dependency, is prominent only in the new seafloor transponders remains unclear. It may be related to the difference in the directivity of the acoustic element in the transducer (ca. ±60° (−10 dB) for the new one, considerably wider than the ca. ±45° of the previous one) or to the difference in the geometrical position between the transducer and the glass sphere. In any case, this misreading of ~0.8 ms during the walk-around observation and ~0.4 ms during the point observation corresponds to ~60 and ~30 cm in range, respectively, and may degrade the reliability of the calculated position of each transponder.
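The range values quoted above follow from halving the two-way path error. As a worked check, assuming a nominal sound speed in seawater of \(c \approx 1500\) m/s (an assumed round value, not a measured profile):

$$
\Delta r = \frac{c\,\Delta t}{2}, \qquad
\frac{1500\ \mathrm{m\,s^{-1}} \times 0.8\ \mathrm{ms}}{2} = 0.60\ \mathrm{m}, \qquad
\frac{1500\ \mathrm{m\,s^{-1}} \times 0.4\ \mathrm{ms}}{2} = 0.30\ \mathrm{m}.
$$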

To improve the quality of GPS/A data, improvements to the instruments and/or the data processing methods can be considered. As mentioned above, based on the time lag between the direct and multipath signals, the likely source of the multiple reflections is the surface of the pressure-resistant glass sphere housing the acoustic control unit of the transponder (Fig. 3). Such a multipath effect must be identified and mitigated in actual field experiments. However, even if an improved acoustic unit reduced the multipath effect, replacing or repairing the units at all sites would require considerable money and time, because each site comprises 3–6 transponders. Thus, we developed a new solution that avoids misreading large-amplitude multipath signals, and we provisionally applied it to the waveform analysis.

## Methods

We designed two different algorithms that avoid the misreading of the maximum peak that occurs in the MC method. Both methods detect the true TWT by reprocessing the correlograms obtained by the MC method.

### Peak subtraction (PS) method

\(C_{\min}\). The detailed process is as follows.

1. Calculate the autocorrelation function \(f_{\mathrm{syn}}(t)\) of the synthetic signal (Fig. 3a).
2. Calculate the observation correlation function \(f_{\mathrm{obs}}(t)\) by cross-correlating the synthetic and returned signals. Then, derive the correlation coefficient \(C_0\) of the maximum peak and its traveltime \(t_0\) (denoted by an arrow in Fig. 3b).
3. Normalize \(f_{\mathrm{syn}}(t)\) by \(C_0\) (i.e., scale it so that its peak equals \(C_0\)), align \(f_{\mathrm{syn}}(t)\) with \(f_{\mathrm{obs}}(t)\) at \(t_0\), subtract the normalized \(f_{\mathrm{syn}}(t)\) from \(f_{\mathrm{obs}}(t)\), and obtain \(f_1(t)\).
4. Determine the maximum peak \(C_1\) of \(f_1(t)\) and its traveltime \(t_1\) (denoted by an arrow in Fig. 3c).
5. Normalize \(f_{\mathrm{syn}}(t)\) by \(C_1\), align the peaks of \(f_{\mathrm{syn}}(t)\) and \(f_1(t)\) at \(t_1\), subtract \(f_{\mathrm{syn}}(t)\) from \(f_1(t)\), and obtain \(f_2(t)\) (Fig. 3d).

Steps 2 to 5 are iterated until \(C_n < C_{\min}\) (Fig. 3). The smallest \(t_n\) is recognized as the true traveltime \(t_p\). Any \(f_{\mathrm{obs}}(t)\) with \(C_0 < C_{\min}\) is excluded from the analysis.
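The PS iteration can be sketched in Python/NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic signal uses an assumed generator polynomial, the observed correlogram is simulated as a weaker direct arrival plus a stronger multipath 0.6 ms later, and the helper names (`place`, `peak_subtraction`) and the `max_iter` safeguard are hypothetical.

```python
import numpy as np


def bpsk_signal():
    """10-kHz carrier sampled at 100 kHz, BPSK-coded every two cycles with a
    127-bit M-sequence (recurrence a[k] = a[k-7] XOR a[k-6]; assumed polynomial)."""
    a = [1, 0, 0, 0, 0, 0, 0]
    while len(a) < 127:
        a.append(a[-7] ^ a[-6])
    chips = 1 - 2 * np.array(a)
    n = np.arange(chips.size * 20)                  # 20 samples per chip
    return np.repeat(chips, 20) * np.sin(2 * np.pi * n / 10)


def place(template, center, size):
    """Copy `template` (zero lag at its middle index) into a zero array of
    length `size` so that its zero lag falls on index `center`."""
    out = np.zeros(size)
    lo = center - template.size // 2
    s0, s1 = max(0, -lo), min(template.size, size - lo)
    if s1 > s0:
        out[lo + s0:lo + s1] = template[s0:s1]
    return out


def peak_subtraction(f_obs, f_syn, c_min, max_iter=50):
    """Pick the maximum peak C_n at t_n, subtract f_syn scaled to C_n and
    aligned at t_n, and repeat until C_n < c_min.
    Returns (earliest pick t_p or None if C_0 < c_min, all picks in order)."""
    residual = np.asarray(f_obs, dtype=float).copy()
    picks = []
    for _ in range(max_iter):
        k = int(np.argmax(residual))
        c_k = residual[k]
        if c_k < c_min:
            break
        picks.append(k)
        residual -= c_k * place(f_syn, k, residual.size)
    return (min(picks) if picks else None), picks


sig = bpsk_signal()
energy = np.dot(sig, sig)
f_syn = np.correlate(sig, sig, mode="full") / energy   # peak = 1 at the middle

# Hypothetical return: direct arrival (0.5) and a stronger multipath (0.8)
# 60 samples (0.6 ms) later.
received = np.zeros(8000)
received[3000:3000 + sig.size] += 0.5 * sig
received[3060:3060 + sig.size] += 0.8 * sig
f_obs = np.correlate(received, sig, mode="valid") / energy

t_p, picks = peak_subtraction(f_obs, f_syn, c_min=0.15)
print("MC pick:", int(np.argmax(f_obs)), "PS picks:", picks, "t_p:", t_p)
```

In this noise-free example the first iteration picks the multipath peak (the same as the MC pick), the second picks the direct arrival, and the residual then falls below \(C_{\min}\), so \(t_p\) is the earlier of the two picks. With real data, noise and finite sampling also leave residual peaks, which is why the choice of \(C_{\min}\) matters.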

### Cluster analysis (CA) method

*k*-means method (Hartigan and Wong 1979), in which the user determines the number of groups in advance.

1. Determine the observation correlation function of each returned signal.
2. Compute the cross-correlation coefficients between the observation correlation functions for all pairs.
3. Perform cluster analysis using the k-means method (Hartigan and Wong 1979) on the database obtained in the preceding step (Fig. 5a). The number of groups into which the database is divided is determined by trial and error; we employed 20 groups in this analysis, which is large enough to represent most waveform types. Note that the number of groups does not significantly affect the final result as long as it is sufficiently large.
4. In each group, choose as the master correlogram the one whose average cross-correlation coefficient with all the other correlograms in the group is highest.
5. Determine \(\Delta t\) between the true peak in the first envelope and the maximum peak in the master correlogram (Fig. 5b). \(\Delta t\) equals zero if the direct arrival has the largest correlation coefficient.
6. Obtain the true TWT of each slave correlogram by applying the correction \(\Delta t\) at the lag where the cross-correlation coefficient between the master and slave correlograms is largest (Fig. 5c).
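A compact sketch of steps 1–6 is given below. Several assumptions are made: correlograms are compared at zero lag after centering each on its MC pick; clustering uses a plain Lloyd k-means with farthest-point initialization instead of the Hartigan–Wong algorithm cited above; a Gaussian-windowed cosine stands in for the real correlogram wavelet; and the manual reading of \(\Delta t\) in step 5 is represented by a user-supplied callback (`read_offset`, hypothetical). All offsets are in samples.

```python
import numpy as np


def wavelet(j):
    """Stand-in correlogram wavelet (assumption): Gaussian-windowed cosine with
    a 10-sample carrier period (0.1 ms at 100 kHz)."""
    return np.exp(-(j / 20.0) ** 2) * np.cos(2 * np.pi * j / 10.0)


def kmeans(X, k, n_iter=100):
    """Plain Lloyd k-means with farthest-point initialization (a simpler
    substitute for Hartigan-Wong). Returns one label per row of X."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(dist))])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        new = np.array([X[labels == g].mean(axis=0) if np.any(labels == g)
                        else centers[g] for g in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def best_lag(master, slave, max_lag):
    """Lag l maximizing sum_n master[n] * slave[n + l] within +/- max_lag."""
    n = master.size
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            scores[lag] = np.dot(master[:n - lag], slave[lag:])
        else:
            scores[lag] = np.dot(master[-lag:], slave[:n + lag])
    return max(scores, key=scores.get)


def ca_correct(corrs, k, half_width, read_offset, max_lag=20):
    """MC picks, pairwise similarity, clustering, master selection, manual
    offset, and correction of every member. Assumes each pick lies at least
    half_width samples from both ends of its correlogram."""
    picks = [int(np.argmax(c)) for c in corrs]
    windows = np.array([c[p - half_width:p + half_width + 1]
                        for c, p in zip(corrs, picks)])
    sim = np.corrcoef(windows)                  # zero-lag coefficients, all pairs
    labels = kmeans(sim, k)
    twt = np.empty(len(corrs), dtype=int)
    for g in range(k):
        idx = np.flatnonzero(labels == g)
        if idx.size == 0:
            continue
        sub = sim[np.ix_(idx, idx)]
        mean_to_others = (sub.sum(axis=1) - 1.0) / max(idx.size - 1, 1)
        master = idx[int(np.argmax(mean_to_others))]
        delta = read_offset(windows[master])    # read manually in the paper
        for j in idx:
            lag = best_lag(windows[master], windows[j], max_lag)
            twt[j] = picks[j] + delta + lag
    return twt, labels


# Hypothetical shots: three with a dominant direct arrival and three with a
# dominant multipath 60 samples later; the true direct arrivals are at t0.
t = np.arange(400)
amps = [(0.8, 0.5), (0.85, 0.45), (0.75, 0.5), (0.5, 0.8), (0.45, 0.85), (0.5, 0.75)]
t0 = [150 + 3 * i for i in range(6)]
corrs = [a1 * wavelet(t - d) + a2 * wavelet(t - d - 60) for (a1, a2), d in zip(amps, t0)]

W = 100


def read_offset(window):
    """Hypothetical stand-in for the analyst: if a strong arrival sits 60
    samples before the pick, take it as the direct wave."""
    return -60 if window[W - 60] > 0.3 else 0


true_twt, labels = ca_correct(corrs, k=2, half_width=W, read_offset=read_offset)
```

Because every member of a group inherits the master's \(\Delta t\), a misread master shifts the whole group by the same amount; this group-wise behavior is what makes the CA result consistent within a group but sensitive to the manual reading.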

\(C_{\min}\), so that a larger \(C_{\min}\) selects a peak with a larger amplitude, whereas a smaller \(C_{\min}\) selects a more appropriate one (Fig. 4); the CA method, in contrast, determines a unique \(\Delta t\) for each group (Fig. 5). Several values of \(C_{\min}\) should be examined to find a suitable one, as shown in Fig. 6. Both the PS and CA methods work automatically except for the steps of determining the threshold in the PS method and selecting \(\Delta t\) in the CA method. The results are discussed in detail in the next section.

## Results and verification

Figure 6a–c compares the time lags between the peaks picked by the MC and PS methods. The TWTs obtained with a threshold of \(C_{\min} = 0.30\) are dispersed over more than three carrier periods and mostly correspond to peaks in the later envelope, even during the point observation, whereas almost all TWTs obtained with \(C_{\min} = 0.15\) and 0.20 lie in the first envelope and are dispersed over roughly one or sometimes two wavelengths. This indicates that a large threshold (0.30) makes the result unstable because peaks around the direct arrival whose coefficients are smaller than the threshold are overlooked, whereas a smaller threshold reliably picks peaks in the first envelope. For shots during the walk-around observation (Fig. 6b, c), a dispersion within one period is still recognized; it seems to arise from differences in the degree of correlation among shots. The distribution of lag times given by the CA method (Fig. 6d) is also dispersed during the walk-around observation and resembles that for \(C_{\min} = 0.15\) and 0.20 (Fig. 6b, c), except that correlograms of the same group (same color) line up on an individual peak. Thus, the dispersion of the peaks accepted by the PS method in Fig. 6b, c probably reflects slight differences in correlograms between neighboring groups. On the other hand, the lag times from the CA method during the point observation are closely aligned because almost all correlograms are classified into a single group (Fig. 6d), whereas those from the PS method remain dispersed (Fig. 6b, c). Because \(\Delta t\) is currently determined in the CA procedure by manually reading the direct wave in the master correlogram, it is natural that the lag times of the two new methods sometimes differ by one wavelength. This difference also appears in Fig. 7d, the pasteup after aligning the correlograms at the corrected TWT as *T* = 0, as a gap of one wavelength between neighboring groups around shots #150–350 and #450–700. This systematic offset results from the lack of a quantitative determination of \(\Delta t\), i.e., from misreading the direct-wave peak in the master correlogram. The TWT improvements of up to 0.8 ms during the walk-around observation and 0.4 ms on average during the point observations (Figs. 6, 7) are consistent with 60 and 30 cm in slant range, respectively. Therefore, we conclude that both new procedures avoid the misidentification of the multipath peak that occurred in the MC procedure, although gaps of roughly one or sometimes two wavelengths with neighboring shots or groups remain.
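A pasteup such as Fig. 7 can be assembled by shifting each correlogram so that its picked TWT lies at *T* = 0 and stacking the shots as rows. The following NumPy sketch is a hypothetical stand-in for this alignment step only; it is not the PASTEUP tool used in this study, and the function name and the NaN padding are assumptions.

```python
import numpy as np


def pasteup(correlograms, picks, half_width):
    """Return a (shots x lags) array in which row i is correlogram i shifted so
    that sample picks[i] sits in the centre column (T = 0). Samples outside a
    record are NaN so that they can be left blank when plotted."""
    out = np.full((len(correlograms), 2 * half_width + 1), np.nan)
    for i, (c, p) in enumerate(zip(correlograms, picks)):
        c = np.asarray(c, dtype=float)
        lo = p - half_width
        s0, s1 = max(lo, 0), min(p + half_width + 1, c.size)
        if s1 > s0:
            out[i, s0 - lo:s1 - lo] = c[s0:s1]
    return out


# Usage: two toy records whose picks sit at different samples end up aligned.
c1 = np.zeros(50)
c1[20] = 1.0
c2 = np.zeros(50)
c2[30] = 0.7
aligned = pasteup([c1, c2], [20, 30], half_width=5)
```

Passing the MC picks or the corrected TWTs as `picks` gives the before and after views; a misread group then shows up as rows whose central peak is offset by one carrier period from its neighbors.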

Incidentally, if we assume that the TWTs determined by the PS method are completely correct, peaks alternating around *T* = 0 with time differences of less than ~0.01 ms appear in both types of observation, i.e., the walk-around observation (shots #0–800) and the point observation (after shot #800) (Fig. 7c). We consider that this gap is possibly caused by an instrumental limitation, namely the 100-kHz sampling rate. The sample nearest to a peak in the digital data can be off by up to 0.005 ms (half a sampling interval) from the true peak top. As a result, the TWT of the maximum peak in \(f_{\mathrm{obs}}\), as well as that of each subtracted correlogram \(f_n\), would be shifted to the time of a nearby sample; in brief, the PS method can produce this gap during the iterations. Therefore, we conclude that the apparent gap in the peaks is an instrumental issue, and a recording system with a higher sampling rate for the returned signals would be required to improve the results of the PS method. On the other hand, no such instability in TWT estimation was recognized in the results of the CA method (Fig. 5d). This difference originates from the different ways the methods handle correlograms: the CA method applies a single correction value within a group, whereas the PS method processes each correlogram individually. Thus, the difference between the two methods deviates from one wavelength by plus or minus one sampling interval. From the viewpoint of picking the same peak in all rangings, the CA method is more robust than the PS method.
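In range terms, and assuming a nominal sound speed of \(c \approx 1500\) m/s (an assumption, not a measured value), the half-sample quantization at \(f_s = 100\) kHz and a one-period (0.1 ms at 10 kHz) shift of the picked peak correspond to:

$$
\delta t_{\max} = \frac{1}{2 f_s} = \frac{1}{2 \times 100\ \mathrm{kHz}} = 0.005\ \mathrm{ms}
\;\Rightarrow\; \frac{c\,\delta t_{\max}}{2} \approx 3.75\ \mathrm{mm},
\qquad
\frac{c \times 0.1\ \mathrm{ms}}{2} = 7.5\ \mathrm{cm}.
$$

The sampling effect is therefore about an order of magnitude smaller than a one-wavelength misreading.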

In summary, the new methods only rarely produced slight gaps in the peak of the observation correlogram (Fig. 7c, d); they determined TWTs near the same peak and stably and effectively corrected the misreadings of the MC method. Compared with the MC method, the new methods improved TWTs by 0.4–0.8 ms, corresponding to 30–60 cm in slant range (Figs. 7, 8, 9, 10). We therefore conclude that the developed methods greatly improve the analysis precision. In addition, we suggest that running the two methods in parallel would allow the reliability of the results to be cross-checked.

## Discussion for future work

Finally, we discuss the current issues with the new methods. Several problems must be solved before the proposed methods can be automated. The PS method requires a threshold (\(C_{\min}\)) on the correlation coefficient to search for the direct peak in the correlogram, and the user must set this threshold before the analysis. If the threshold is too large, a peak near the direct signal with a coefficient lower than the threshold might be overlooked (Fig. 6a). Conversely, if the threshold is too small, a peak preceding the direct arrival may be detected. To determine the true peak accurately without overlooking peaks with lower coefficients, the optimum threshold should be estimated quantitatively. We consider that the decrease in correlation is probably caused by a decrease in the signal-to-noise ratio depending on the environment, e.g., distance attenuation. Nevertheless, \(C_{\min}\) values of 0.15–0.20 picked a peak within one or two wavelengths of the CA output (Fig. 6b–d); that is, they determined the TWT of the direct arrival with a precision of one period of the correlogram even though the slant range varies by up to ~1.5 times the nearest range (note that the maximum incident angle exceeds 45°). We therefore conclude that the suggested \(C_{\min}\) values are comparably stable in both walk-around and point observations, regardless of the site depth. At least from the G06 data examined in this study, it is difficult to find (or rather to extract) any environmental dependence. Further investigation of waveforms recorded under different environments is necessary to select the optimal \(C_{\min}\) quantitatively. Meanwhile, for the CA method, it may be difficult to determine the suitable number of groups, and the time correction \(\Delta t\) might be unsuitable because it is defined manually. To overcome these limitations, the most suitable number of clusters should be investigated and \(\Delta t\) must be determined objectively. To fully automate the analyses, it is necessary to compensate for these weak points, i.e., the determination of the threshold in the PS method and of \(\Delta t\) in the CA method, through detailed analyses of the waveforms and of the result stability, respectively. One possible approach is to detect the \(\Delta t\) of the master correlogram by using the PS method.

Even though the threshold problem remains, the processing of GPS/A data can thus become almost fully automated. Full automation will be useful for processing enormous amounts of waveform data. Recently, the GPS/A technique has shifted from offline campaign observations by research cruises to online data transfer via sea-surface stations and satellites. Therefore, fully automatic processing could enable real-time monitoring of seafloor displacement, which could contribute to earthquake and tsunami early warning systems and provide invaluable information on seafloor geophysical phenomena.

## Conclusions

Problems in TWT detection exist for received signals collected at the GPS/A seafloor sites deployed in 2012. The most significant problem with the conventional MC method has been the misreading of the largest-amplitude multipath peak as the true peak in the observed correlogram. We verified this by creating pasteups of the correlograms and found that the amplitude and peak splitting of correlograms vary depending not only on the water depth of the sites but also on the incident angle of the transmitted signal, i.e., the relative spatial geometry between the ship and the transponder. To avoid the harmful influence of the large multipath signals and to improve the precision of transponder positioning, we designed two methods that reanalyze the observation correlograms obtained by the MC method. Pasteups of the reprocessed correlograms showed that both new methods accurately identified the peak around the direct arrival. A comparison of the TWTs estimated by the new methods suggests that their differences converged to within ~0.07 ms, less than one wavelength of the correlogram. Therefore, running the two methods in parallel will help verify the reliability of the identification of direct arrivals. We believe that the new techniques proposed in this study are effective for high-precision TWT detection in acoustic data processing. Furthermore, we recommend using pasteup views to visually verify the validity of analysis results. Further improvement of the remaining subjective parameters of these methods will contribute to the full automation of GPS/A data processing and, in the future, to real-time monitoring of seafloor displacement that provides precise information on seafloor phenomena for earthquake and tsunami early warning systems.

## Declarations

### Authors’ contributions

RA suggested the method for the visual inspection of correlograms, assessed the result of each method, and drafted the manuscript. FT suggested and developed the CA method and analyzed the data. TI suggested and developed the PS method. MK arranged the seafloor geodetic network and collected GPS/A data. RH contributed to discussions on the scientific content and suggested revisions to the manuscript. All authors read and approved the final manuscript.

### Acknowledgements

The GPS/A surveys and benchmarks were financially supported by MEXT, Japan. This work was also partly supported by the Council for Science, Technology, and Innovation, the Cross-Ministerial Strategic Innovation Promotion Program, and the “Enhancement of social resiliency against natural disasters” program (funding agency: JST). We used the waveform analysis tool “PASTEUP” (personal communication with Dr. G. Fujie) for viewing and editing correlograms. We would like to thank Editage (www.editage.jp) for English language editing. All figures were prepared by using Generic Mapping Tools (GMT 4.5.3) (Wessel and Smith 1998).

### Competing interests

The authors declare that they have no competing interests.

**Open Access**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


## References

- Hartigan JA, Wong MA (1979) Algorithm AS 136: a k-means clustering algorithm. Appl Stat 28:100–108
- Kido M, Osada Y, Fujimoto H, Hino R, Ito Y (2011) Trench-normal variation in observed seafloor displacements associated with the 2011 Tohoku-Oki earthquake. Geophys Res Lett 38:L24303. doi:10.1029/2011GL050057
- Kido M, Fujimoto H, Hino R, Ohta Y, Osada Y, Iinuma T, Azuma R, Wada I, Miura S, Suzuki S, Tomita F, Imano M (2015) Achievement of the project for advanced GPS/acoustic survey in the last four years. In: Hashimoto M (ed) International symposium on geodesy for earthquake and natural hazards (GENAH). Int Assoc Geod Symp, vol 145. Springer, Heidelberg. doi:10.1007/1345_2015_127
- Sato M, Ishikawa T, Ujihara N, Yoshida S, Fujita M, Mochizuki M, Asada A (2011) Displacement above the hypocenter of the 2011 Tohoku-Oki earthquake. Science 332(6036):1395. doi:10.1126/science.1207401
- Wessel P, Smith WHF (1998) New, improved version of the Generic Mapping Tools released. Eos Trans AGU 79(47):579