### Waveform simulation

We used the QSSPPEGS code developed by Zhang et al. (2020), which was modified from the QSSP code (Wang et al. 2017), to synthesize the pre-P gravity signal waveforms. QSSPPEGS numerically solves the fully coupled elastogravity equations for a spherically symmetric Earth model:

$$\begin{aligned}\rho \frac{{\partial }^{2}}{\partial {t}^{2}}{\varvec{u}}&=\nabla \cdot{\varvec{\sigma}}+\rho \nabla \left(\psi -g{u}_{r}\right)+\rho g\left(\nabla \cdot {\varvec{u}}\right){{\varvec{e}}}_{r}+{\varvec{f}},\\ {\nabla }^{2}\psi &=4\pi G\nabla \cdot \left(\rho {\varvec{u}}\right)\end{aligned} \tag{1}$$

and calculates the pre-P ground acceleration \(\ddot{{\varvec{u}}}\) and gravity change \(\delta {\varvec{g}}\left(=\nabla \psi \right)\). Here, \(\rho =\rho \left(r\right)\) is the unperturbed density; \({\varvec{u}}={\varvec{u}}\left(r, \theta , \phi , t\right)\) is the particle displacement vector; \({\varvec{\sigma}}={\varvec{\sigma}}\left(r, \theta , \phi , t\right)\) is the incremental stress tensor in the Lagrangian description; \(\psi =\psi \left(r, \theta , \phi , t\right)\) is the incremental gravity potential in the Eulerian description; \(g=g\left(r\right)\) is the undisturbed static Earth gravity (downward positive); \({\varvec{f}}={\varvec{f}}\left(r, \theta , \phi , t\right)\) represents the seismic source; \(G\) is the gravitational constant; \(r, \theta , \phi\) are the radial distance, polar angle, and azimuthal angle, respectively, with the origin at the center of the spherically symmetric Earth; and \({{\varvec{e}}}_{r}\) is the radial unit vector. The pre-P gravity signal measured by ground-based sensors was synthesized as \({\varvec{s}}=\ddot{{\varvec{u}}}-\delta {\varvec{g}}\). We ignored the tilt terms \(-g\frac{\partial {u}_{z}}{\partial x}\) and \(-g\frac{\partial {u}_{z}}{\partial y}\) for the horizontal components \({s}_{x}\) and \({s}_{y}\), and the free-air term \({u}_{z}\frac{dg}{dz}\) for the vertical component \({s}_{z}\), because their contributions to the signal amplitudes are insignificant (Juhel et al. 2019; Zhang et al. 2020). Here, the \(x\), \(y\), and \(z\) axes are positive eastward, northward, and upward, respectively.

As detailed in the "Detection of the horizontal components of the pre-P gravity signal" section, a point source was adopted for the waveform simulation of the 2011 Tohoku-Oki earthquake. It was located at the hypocenter (latitude = 38.19°N, longitude = 142.68°E, depth = 21 km) determined by Chu et al. (2011). The event origin time \({t}_{\mathrm{eq}}\), seismic moment \({M}_{0}\), rupture duration \(T\), and source mechanism (strike, dip, rake) were set to 05:46:23 UTC, \(5.31\times {10}^{22}\) N m, 140 s, and (203°, 10°, 88°), respectively (Global Centroid Moment Tensor: GCMT; Ekström et al. 2012). The moment rate function \(\dot{M}\left(t\right)\) is described as a squared half-period sinusoidal function:

$$\dot{M}\left(t\right)={M}_{0}\frac{2}{T}{\mathrm{sin}}^{2}\left(\pi \frac{t}{T}\right) \quad \left(0\le t\le T\right). \tag{2}$$
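As a quick check on Eq. (2), the following minimal Python sketch evaluates the moment rate function for the GCMT values quoted above and integrates it numerically. The function name `moment_rate` and the sampling grid are illustrative choices, not part of QSSPPEGS.

```python
import numpy as np

def moment_rate(t, m0, duration):
    """Squared half-period sinusoid of Eq. (2); zero outside 0 <= t <= T."""
    t = np.asarray(t, dtype=float)
    inside = (t >= 0.0) & (t <= duration)
    rate = m0 * (2.0 / duration) * np.sin(np.pi * t / duration) ** 2
    return np.where(inside, rate, 0.0)

M0 = 5.31e22  # seismic moment in N m (GCMT value quoted in the text)
T = 140.0     # rupture duration in s (value quoted in the text)

# Illustrative sampling grid (0.01 s step); the grid is an assumption.
t = np.linspace(0.0, T, 14001)
rate = moment_rate(t, M0, T)

# Trapezoidal integration of the moment rate over the rupture duration.
dt = t[1] - t[0]
released = float(np.sum(0.5 * (rate[1:] + rate[:-1])) * dt)

# The peak moment rate occurs at t = T / 2 and equals 2 * M0 / T.
peak = float(rate.max())
```

The factor \(2/T\) normalizes the function so that its time integral equals \({M}_{0}\); `released` should therefore agree with `M0` to within numerical precision.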

A modified Earth structure model based on AK135 by Wei et al. (2012) was used, and the Green's functions were calculated up to a frequency of 0.25 Hz. The synthesized signal waveforms were truncated at the P-wave arrival time \({t}_{\mathrm{P}}\). Here, \({t}_{\mathrm{P}}\) is the earlier of the visually picked arrival time \({t}_{\mathrm{P}}^{\mathrm{obs}}\) and \({t}_{\mathrm{P}}^{\mathrm{theo}}\), which is set 2 s before the theoretical arrival time calculated using the TauP toolkit (Crotwell et al. 1999). In the "Waveform inversion" section, the dip angle and \({M}_{0}\) were chosen as the model parameters, as these could not be well constrained from vertical records (Zhang et al. 2020). The rupture duration \(T\) was assumed to obey the scaling law (Kanamori and Anderson 1975).
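The truncation time and the duration scaling can be sketched in Python as follows. The text states only that \(T\) obeys the scaling law of Kanamori and Anderson (1975); the cube-root form \(T\propto {M}_{0}^{1/3}\) and its anchoring at the GCMT values (140 s at \(5.31\times {10}^{22}\) N m) are assumptions made here for illustration, and both function names are hypothetical.

```python
def pick_tp(tp_obs, tp_theo_raw, margin=2.0):
    """Truncation time t_P: the earlier of the visual pick and the
    theoretical arrival time minus a safety margin (2 s in the text)."""
    return min(tp_obs, tp_theo_raw - margin)

def scaled_duration(m0, m0_ref=5.31e22, t_ref=140.0):
    """ASSUMPTION: self-similar scaling, T proportional to M0**(1/3),
    anchored at the GCMT moment and duration quoted in the text."""
    return t_ref * (m0 / m0_ref) ** (1.0 / 3.0)

# Hypothetical arrival times in seconds after the origin time.
tp_early_pick = pick_tp(tp_obs=98.0, tp_theo_raw=101.0)   # visual pick is earlier
tp_late_pick = pick_tp(tp_obs=100.0, tp_theo_raw=101.0)   # theoretical - 2 s is earlier

# Under the assumed scaling, an eightfold moment doubles the duration.
t_ref_duration = scaled_duration(5.31e22)
t_double = scaled_duration(8 * 5.31e22)
```

Taking the earlier of the two times keeps P-wave energy out of the truncated window even when one of the two estimates is late.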

### Point source correction

The pre-P gravity signals calculated for a point source tend to be larger than those calculated for a finite fault model with the same total seismic moment (Fig. S6 of Zhang et al. 2020). Additional file 1: Fig. S2 shows that this overestimation is particularly prominent in the horizontal components, which affects the inversion results. To obtain more realistic signal amplitudes, in the "Waveform inversion" section we multiplied the synthesized horizontal-component waveforms by an empirical constant factor (roughly estimated to be 0.75; Additional file 1: Fig. S2).
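A minimal sketch of this correction is given below, assuming the synthetic waveforms are stored as an array with rows ordered \(({s}_{x},{s}_{y},{s}_{z})\). The array layout and the function name are illustrative assumptions; only the factor 0.75 comes from the text.

```python
import numpy as np

HORIZONTAL_FACTOR = 0.75  # empirical factor quoted in the text

def apply_point_source_correction(s_xyz, factor=HORIZONTAL_FACTOR):
    """Scale the horizontal rows (x, y) of a (3, n_samples) array and
    leave the vertical row (z) unchanged. Returns a new array."""
    corrected = np.array(s_xyz, dtype=float)  # copy so the input is untouched
    corrected[:2] *= factor
    return corrected

# Hypothetical three-component synthetic with four samples per component.
s_xyz = np.array([[1.0, 2.0, 3.0, 4.0],
                  [-1.0, -2.0, -3.0, -4.0],
                  [0.5, 1.0, 1.5, 2.0]])
s_corrected = apply_point_source_correction(s_xyz)
```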

### Waveform stacking

The amplitude of the pre-P gravity signal is comparable to or smaller than the background noise level (Montagner et al. 2016; Kimura et al. 2019a; Vallée and Juhel 2019), making it challenging to identify the signal in a single trace. Even after appropriate band-pass filtering, individual traces are indistinguishable from noise (Figs. 2, 3 of Kimura et al. 2019a). To reduce the noise and enhance the signal, we used the waveform stacking approach (Kimura et al. 2019a; Vallée and Juhel 2019). Because the amplitudes of the pre-P gravity signals are in many cases expected to increase monotonically until the P-wave arrives (Fig. 1 of Zhang et al. 2020), the waveforms are aligned at \({t}_{\mathrm{P}}\) and stacked as:

$$\overline{a}\left(t\right)=\frac{1}{N}\sum_{i=1}^{N}\mathrm{sgn}\left({s}^{i}\left({t}_{\mathrm{P}}^{i}\right)\right){a}^{i}\left({t}_{\mathrm{P}}^{i}+t\right) \quad \left(t\le 0\right), \tag{3}$$

where \({s}^{i}\left(t\right)\), \({a}^{i}\left(t\right)\), and \({t}_{\mathrm{P}}^{i}\) are the synthetic acceleration, observed acceleration, and P-wave arrival time at the \(i\)th sensor, respectively; \(\overline{a}\left(t\right)\) is the stacked trace; \(N\) is the total number of stacked sensors; and \(\mathrm{sgn}\left(\cdot \right)\) is the sign function, which reverses the polarity of traces whose synthetic signal is negative at \({t}_{\mathrm{P}}^{i}\). The synthetic waveforms can also be stacked by replacing \({a}^{i}\left(t\right)\) with \({s}^{i}\left(t\right)\).
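The stacking of Eq. (3) can be sketched in Python as follows, assuming that all traces share the same sampling interval and that \({t}_{\mathrm{P}}^{i}\) is given as a sample index for each sensor. The function name, the argument layout, and the toy traces are illustrative assumptions.

```python
import numpy as np

def stack_pre_p(observed, synthetic, tp_index, n_samples):
    """Align traces at t_P and stack them with the polarity of the
    synthetic at t_P, following Eq. (3).

    observed, synthetic : lists of 1-D arrays, one pair per sensor
    tp_index            : sample index of t_P for each sensor
    n_samples           : samples kept up to and including t_P (t <= 0)
    """
    stacked = np.zeros(n_samples)
    for a, s, k in zip(observed, synthetic, tp_index):
        if k - n_samples + 1 < 0:
            raise ValueError("trace too short before t_P for the requested window")
        sign = np.sign(s[k])  # polarity of the synthetic at t_P
        stacked += sign * a[k - n_samples + 1 : k + 1]
    return stacked / len(observed)

# Toy example: two sensors with opposite polarity and different t_P.
ramp = np.linspace(0.0, 1.0, 50)
s1 = np.concatenate([ramp, np.zeros(10)])         # positive signal, t_P at index 49
s2 = np.concatenate([np.zeros(5), -2.0 * ramp])   # negative signal, t_P at index 54

# Stacking the synthetics themselves (a^i replaced by s^i, as noted in the text).
stacked = stack_pre_p([s1, s2], [s1, s2], tp_index=[49, 54], n_samples=20)
```

The last element of `stacked` corresponds to \(t=0\) (the P-wave arrival), and earlier elements to \(t<0\). In the toy example the sign flip makes both traces add constructively, so the stack equals 1.5 times the final 20 samples of `ramp`.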

### Inversion method

For the waveform inversion, we used the misfit function defined as

$$R\left(p\right)=\frac{1}{\alpha }\sum_{i=1}^{N}\frac{1}{{\left({\sigma }^{i}\right)}^{2}}{\int }_{{t}_{1}^{i}}^{{t}_{2}^{i}}{\left|{a}^{i}\left(t\right)-{s}^{i}\left(t;p\right)\right|}^{2}dt, \tag{4}$$

where \(p\) is the variable parameter to be estimated, \({\sigma }^{i}\) is the standard deviation of \({a}^{i}\left(t\right)\) before \({t}_{\mathrm{eq}}\) (computed over the time window \(\left[{t}_{\mathrm{eq}}-10\ \mathrm{min}, {t}_{\mathrm{eq}}\right]\)), and \({s}^{i}\left(t;p\right)\) is \({s}^{i}\left(t\right)\) calculated using the parameter values \(p\). Because the amplitudes of the pre-P gravity signals are largest just before the P-wave arrival, we used the last quarter of the time window between \({t}_{\mathrm{eq}}\) and \({t}_{\mathrm{P}}^{i}\), i.e., \({t}_{2}^{i}={t}_{\mathrm{P}}^{i}\) and \({t}_{1}^{i}={t}_{\mathrm{P}}^{i}-\frac{1}{4}\left({t}_{\mathrm{P}}^{i}-{t}_{\mathrm{eq}}\right)\). We defined the normalization constant as \(\alpha =\sum_{i=1}^{N}{\int }_{{t}_{1}^{i}}^{{t}_{2}^{i}}dt\) such that \(R\left(p\right)\) is expected to be unity if the noise level is time-invariant and the signal is completely fitted with the parameter values \(p\).
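The misfit of Eq. (4) can be sketched in Python as follows, assuming that all sensors share a common, evenly sampled time axis and approximating the integrals by rectangle-rule sums. The function name, the synthetic signal shape, and the Gaussian noise are hypothetical and serve only to illustrate the normalization.

```python
import numpy as np

def misfit(observed, synthetic, t, t_eq, t_p, noise_window=600.0):
    """Noise-weighted misfit R of Eq. (4) on a common time axis t (in s)."""
    dt = t[1] - t[0]
    weighted_sum = 0.0
    alpha = 0.0
    for a, s, tp in zip(observed, synthetic, t_p):
        # sigma^i: standard deviation of the record in [t_eq - 10 min, t_eq)
        sigma = a[(t >= t_eq - noise_window) & (t < t_eq)].std()
        # Fitting window: last quarter of the interval between t_eq and t_P
        t1 = tp - 0.25 * (tp - t_eq)
        win = (t >= t1) & (t <= tp)
        weighted_sum += np.sum((a[win] - s[win]) ** 2) * dt / sigma**2
        alpha += np.count_nonzero(win) * dt  # same quadrature rule as above
    return weighted_sum / alpha

# Hypothetical data: 20 sensors, unit-variance stationary Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(-600.0, 120.0, 7201)  # 0.1 s sampling, t_eq = 0
t_eq = 0.0
t_p = [60.0 + 3.0 * i for i in range(20)]

# Toy signal growing toward t_P (arbitrary shape and amplitude).
synthetic = [np.where(t > 0.0, -2.0 * (t / tp) ** 4, 0.0) for tp in t_p]
observed = [s + rng.standard_normal(t.size) for s in synthetic]

r_fit = misfit(observed, synthetic, t, t_eq, t_p)                  # true model
r_null = misfit(observed, [np.zeros_like(t)] * 20, t, t_eq, t_p)   # no signal
```

With stationary noise and a synthetic that matches the signal, the residual in each window is pure noise with variance \({\left({\sigma }^{i}\right)}^{2}\), so `r_fit` should be close to unity, while the zero-signal model `r_null` gives a larger value.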