
Statistical study on the regional characteristics of seismic activity in and around Japan: frequency-magnitude distribution and tidal correlation


We propose a statistical analysis method to identify common features of seismic activity that are indistinguishable from those of most other seismicity, and to find anomalous activity that differs from these common features. Using the hypocenter catalog of earthquakes that occurred in and around Japan during the past 20 years, we apply this method to the parameters of the frequency-magnitude distribution and to a parameter that expresses the correlation of seismicity with tides, using them as indices, with a focus on objectively understanding the regional characteristics of seismicity. As a result, we extracted a "typical" probability density distribution of each index value that is common to most analysis regions, as well as "anomalous" regions with index-value distributions that differ significantly from the typical distributions. In terms of the frequency-magnitude distribution, most estimated values of indices in the anomalous activity areas can be explained as variations corresponding to the effects of fluids, interplate coupling, and stress fields that control faulting styles, all of which have been pointed out in previous studies. By extracting typical index values for the frequency-magnitude distribution, we identified common features of the frequency-magnitude distribution that depend on the earthquake occurrence interval. Although seismicity showed no clear correlation with tides, the index value for tidal correlation changes to reflect the proportion of earthquakes occurring in a series of periods shorter than the tidal period; it is therefore useful as an index to capture the characteristics of such earthquake occurrence intervals. We also show that the typical probability density distribution of these index values can be represented by existing models or their extensions. By using the proposed models as a reference, it is possible to quantify the degree of anomaly using the same concept as that of the method presented here; hence, such a method should be applicable to monitoring seismic activity.



An earthquake occurs when the shear stress applied to a fault exceeds the fault strength. Thus, daily seismic activity is considered to reflect the stress field and the state of pre-existing faults (e.g., Scholz 2019), which are difficult to observe directly. For this reason, the detection of changes in the characteristics of seismic activity in observational data can be an objective of seismic monitoring, even if the physical interpretation of such data remains difficult.

On the other hand, earthquakes have some universal characteristics: (1) the frequency-magnitude distribution of seismic activity is expressed by the Gutenberg-Richter law (hereafter, the GR law; Gutenberg and Richter 1944); (2) aftershock activity is expressed by a power-law relationship represented by the Omori-Utsu law (Utsu et al. 1995); and (3) except for aftershocks, the occurrence of earthquakes is often considered to be random in time (i.e., a Poisson process; e.g., Wyss and Toya 2000). One way to detect changes in seismicity is to use parameters that represent these general characteristics as indices and monitor their temporal and/or spatial variation.

However, in practice, such changes in index values do not always correspond directly to changes in the characteristics of seismic activity because seismic activity analyses are essentially point estimations made using a finite number of samples, and the estimated index value contains a mixture of variabilities, depending on the population distribution and sample size. In particular, when looking at changes in the characteristics of seismic activity, a finite number of seismic events are spatially and temporally partitioned for analysis. This results in a trade-off between resolution and sample size, and it increases variance and makes it difficult to estimate true changes of the index values from differences between individual values. In contrast, when the spatiotemporal resolution is coarser and a wider range of events are analyzed together, it is possible to reduce the estimation error under the assumption that the index value is constant over the range. However, even in this case, it is necessary to evaluate whether or not the assumption of a constant index value is appropriate within that range, and it remains difficult to estimate true changes of the index value.

Rather than looking at the variability of individual estimation results, an effective means of extracting significant changes from the results of point estimation with a finite number of samples is to use statistical analysis to evaluate the significance of differences in population characteristics. A representative example is to test the null hypothesis that a target sample was obtained from the same population as some reference sample. In this study, we attempt to extract characteristics common to many seismic activities as well as those that differ from them using such an approach. Rather than arbitrarily choosing the reference samples, we consider seismic activity with typical characteristics that cannot be distinguished from those of most other samples as the reference. We believe that this idea is justified by the fact that many seismic activities have similar characteristics.

Another way of extracting variations in the characteristics of seismic activity is to use a sophisticated model, such as the ETAS model (Ogata 1988). The use of such models requires assumptions about the spatiotemporal variations of model parameters to estimate which variations are most likely to explain the observed data (e.g., Ogata and Zhuang 2006; Kumazawa et al. 2017). In the present study, however, we tried to make as few assumptions as possible and directly examine changes in the index values estimated from the observed data. Such a basic attempt will improve our knowledge and models of seismic activity.

In this paper, we focus on parameters related to the frequency-magnitude distribution of seismicity, which are often discussed in relation to the stress field, as well as parameters representing the correlation of seismic activity with the tides. We report the results of our analysis of the characteristics of seismicity in and around Japan from the viewpoint of regional characteristics, including the characteristics of typical seismicity and those that differ from typical activity. To capture changes in the characteristics of seismicity from various perspectives, the same dataset was used to analyze each index, and, as a result, their joint probability distributions were obtained.

It is possible to quantify the degree of anomaly based on the joint probability distribution of multiple indices, even in cases where the indices are correlated. The idea of quantifying the degree of anomaly from multiple observables is similar to the concept of Aki (1981), who tried to increase the probability gain by synthesizing the probabilities that changes in various observables are precursors to large earthquakes. However, we do not focus on such special cases of precursor phenomena; rather, our goal is to grasp the possible values of each index for typical seismicity and to quantify the degree of deviation from those values.

Data and methods for statistical test on index values

Hypocenter data and analysis window

We used the JMA unified earthquake catalog as the hypocenter data, except for data that had been flagged as low-frequency earthquakes. To statistically analyze seismic activity, it is desirable to use as many hypocenters of the same quality as possible. For this purpose, different lower threshold magnitudes (\({M}_{\mathrm{th}}\)) were set for the land area (area A, Fig. 1a) and the entire area of Japan and its surroundings (area B, Fig. 1b) because the detectability of earthquakes differs between land and sea areas. We used \({M}_{\mathrm{th}}=1.95\) for area A and \({M}_{\mathrm{th}}=3.45\) for area B for hypocenter data between January 2000 and August 2020 based on the deployment history of the station network, the occurrence of large earthquakes that can affect the detectability of earthquakes, and the distribution of completeness magnitude estimated in previous studies (e.g., Nanjo et al. 2010). Here, \({M}_{\mathrm{th}}\) was set taking into account the discretization half-width of magnitude (0.05). The depth range was uniformly set to be shallower than 30 km for area A and shallower than 100 km for area B, the same values used in CSEP-Japan (Nanjo et al. 2011).

Fig. 1

The epicentral distribution and centers of the rectangular areas used in the analyses. Red dots show epicenters and blue crosses show the center of each grid for \(l=0.2^\circ\) (\(0.1^\circ\) intervals in accordance with CSEP-Japan). For \(l=0.4^\circ\), the center of the rectangular area (\(0.2^\circ\) intervals) was the point where both the latitude and longitude were halved. a Land areas (area A); b the entire area in and around Japan (area B)

One important question is how large the spatiotemporal range should be to characterize seismic activity. Here, we determine the scale of the spatial extent \(l\) (degrees) and uniformly set a rectangular region \(l\times l\); we then estimate index values one by one from \(N\) consecutive hypocenters whose magnitude \(M\) is larger than \({M}_{\mathrm{th}}\) in the region. As described later, by using a constant \(N\), the variation of the index value that depends on \(N\) can be kept constant. If the results of analyses with several sets of \(l\) and \(N\) are not significantly different, the spatiotemporal range can be considered not to affect the characteristics represented by the indices. If \(l\) is too small, there will be too few earthquakes to analyze; if \(l\) is too large, regional changes in activity will not be visible. Here, we set \(l=0.2^\circ\) or \(0.4^\circ\), taking into consideration the occurrence rate of earthquakes above \({M}_{\mathrm{th}}\) and the distance affected by occasional, relatively large earthquakes (about M6–7). We set \(N=50\) or 100 so that the variation in the index value does not become too large and the number of index values does not become too small.

\(l\) and \(N\) are used as the spatial window and the event number window, respectively, and the index values are estimated by shifting these windows by half. This corresponds to spatiotemporal smoothing. The spatial area was set so that the center of the rectangular area was the latitude and longitude of the node used by CSEP-Japan. The event number window was set from the latest data back to the past. Figure 1 shows the spatial distribution of the epicenters and the coordinates of the centers of the rectangular windows with \(l=0.2^\circ\) (i.e., \(0.1^\circ\) intervals). For reference, an example seismicity analysis for a single spatial region is also shown in Additional file 1: Fig. S1. The same analyses were performed for all spatial windows shown in Fig. 1. A schematic of the spatiotemporal analysis windows and the overlap of data between them, which needs to be treated with care in the statistical test described in the “Statistical testing to distinguish anomalous index values” section, is shown in Additional file 1: Fig. S2.
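The half-shifted event number windows can be sketched as follows. This is a minimal illustration, assuming the events of one spatial region are sorted from oldest to newest; the function name `event_windows` and the choice to drop leftover events at the old end of the catalog are assumptions made here, not details taken from the analysis.

```python
def event_windows(n_events, n_window):
    """Index ranges [start, end) of windows holding n_window events each.

    Windows are laid out from the latest event back to the past and
    shifted by half a window, so consecutive windows share half their events.
    Events are assumed to be sorted in time (index 0 = oldest).
    """
    if n_window < 2 or n_window % 2:
        raise ValueError("n_window must be a positive even number")
    step = n_window // 2
    windows = []
    end = n_events
    while end >= n_window:
        windows.append((end - n_window, end))
        end -= step
    return windows


windows = event_windows(230, 100)
# Every second window shares no events with the others in the subset,
# which is the kind of non-overlapping selection a statistical test needs.
independent = windows[::2]
print(windows, independent)
```

Taking every second window is one simple way to obtain index values whose underlying hypocenters do not overlap in the event-number direction.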

Parameters estimated to index the characteristics of seismicity

In this section, we describe the three index values (\(b\), \(\eta\), and \(D\)) that we estimated as parameters representing the characteristics of seismicity. For reference, mathematical symbols used in this paper related to magnitude and number of events are summarized in Table 1.


We estimated \(b\), one of the parameters of the GR law, as an index of the characteristics of the frequency-magnitude distribution (FMD). In the GR law, the number of earthquakes of a certain magnitude \(M\) among the hypocenters extracted in a given spatiotemporal range is expressed as

$$\begin{array}{c}{\log}_{10}\,n\left(M\right)=a-bM,\end{array}\tag{1}$$

where \(a\) is a parameter that depends on the total number of extracted hypocenters and \(b\) represents the slope of the log frequency against M. If Eq. (1) holds for a certain number of earthquakes larger than \({M}_{\mathrm{th}},\) by using the maximum likelihood method (Aki 1965), we can estimate \(b\) from the observed earthquake magnitude \({M}_{i}\left(\ge {M}_{\mathrm{th}}\right)\) (\(1\le i\le N\)) as

$$\begin{array}{c}b=\frac{N\,{\log}_{10}e}{\sum_{i=1}^{N}\left(M_{i}-M_{\mathrm{th}}\right)} .\end{array}\tag{2}$$

The index value estimated using Eq. (2) is proportional to the inverse of the arithmetic mean of the magnitudes of the analyzed hypocenters measured relative to \({M}_{\mathrm{th}}\). As described below, the results of the analysis show that the GR law does not strictly hold in many cases, but as a parameter that reflects the characteristics of the earthquakes (i.e., the average magnitude), \(b\) is still a meaningful index. In estimating \(b\) using Eq. (2), it should be noted that \({M}_{\mathrm{th}}\) must be set so that earthquakes of a larger magnitude than \({M}_{\mathrm{th}}\) are not missed. Here, we first estimated \(b\) and the magnitude of completeness \({M}_{\mathrm{c}}\) for each set of \(N\) hypocenters, and we performed the statistical analysis only when \({M}_{\mathrm{c}}<{M}_{\mathrm{th}}\). The method used to estimate \({M}_{\mathrm{c}}\) is described in “Estimation of \({{\varvec{M}}}_{\mathbf{c}}\)” section.

If \(b\) does not vary during the analysis period (i.e., the event number window including \(N\) earthquakes), the standard error on \(b\) from the above maximum likelihood estimation is \({\sigma }_{b}=b/\sqrt{N}\) (Aki 1965). Thus, if the FMD follows the GR law with a true \(b\)-value, \({b}_{\mathrm{true}}\), the theoretical probability density distribution (PDD) of the estimated \(b\)-value is centered on \({b}_{\mathrm{true}}\) with a variance that depends on \(N\) (Additional file 1: Fig. S3a). Therefore, when \(b\)-values are estimated for each set of \(N\) events in a large number of hypocenters, the deviation of the estimated \(b\)-value distribution from the theoretical distribution corresponds to the variation of the true \(b\)-value, or to the deviation of the FMD from the GR law. The spatiotemporal variation of \(b\) has been shown in previous studies. In an analysis of seismic activity in central California, for example, the frequency distribution of observed \(b\)-values was well explained by assuming slow temporal changes in \(b\) (Shi and Bolt 1982). Regional and depth-dependent differences in analyzed \(b\)-values, which corresponded well with the differential stresses inferred from tectonic conditions, have also been observed (e.g., Scholz 2015). The analysis presented in this paper assumes that there are some spatiotemporal variations in \(b\)-values and separates them into the typical variation that is common to most analysis areas and other variations.
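The maximum likelihood estimate of \(b\) and its standard error \(b/\sqrt{N}\) can be illustrated with a short sketch. The function name `b_value`, the random seed, and the synthetic catalog are assumptions for illustration; the synthetic magnitudes are continuous, so the sketch ignores the magnitude discretization that the choice of \({M}_{\mathrm{th}}\) accounts for in the actual analysis.

```python
import math
import random


def b_value(mags, m_th):
    """Maximum likelihood b-value (Aki 1965) and its standard error b / sqrt(N)."""
    excess = [m - m_th for m in mags if m >= m_th]
    n = len(excess)
    total = sum(excess)
    if n == 0 or total <= 0.0:
        raise ValueError("need events with magnitudes above m_th")
    b = n * math.log10(math.e) / total
    return b, b / math.sqrt(n)


# Synthetic check: under the GR law, M - M_th is exponentially distributed
# with rate b * ln(10). The seed and b_true = 0.9 are arbitrary choices.
rng = random.Random(1)
b_true, m_th = 0.9, 1.95
mags = [m_th + rng.expovariate(b_true * math.log(10)) for _ in range(5000)]
b_hat, sigma_b = b_value(mags, m_th)
print(f"b = {b_hat:.3f} +/- {sigma_b:.3f}")
```

With \(N=50\) or 100 instead of 5000 events, the same code shows how strongly the scatter of individual estimates grows, which is why the windows use a constant \(N\).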


As another index of FMD characteristics, we adopted \(\eta\), which represents the deviation from the GR law (Utsu 1978). The index value is estimated as

$$\begin{array}{c}\eta =\frac{N\sum_{i=1}^{N}{\left(M_{i}-M_{\mathrm{th}}\right)}^{2}}{{\left\{\sum_{i=1}^{N}\left(M_{i}-M_{\mathrm{th}}\right)\right\}}^{2}} .\end{array}\tag{3}$$

That is, it is estimated as the ratio of the arithmetic mean of the squares of the magnitudes measured relative to \({M}_{\mathrm{th}}\) to the square of their arithmetic mean. This value corresponds to the shape of the FMD. Theoretically, if \(N\) is large enough and the GR law holds (i.e., if the FMD is a straight line), then \(\eta =2\). For downwardly convex and upwardly convex FMDs, \(\eta >2\) and \(\eta <2\), respectively.
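The limiting value \(\eta =2\) can be checked directly. As a sketch, assume continuous magnitudes, so that under the GR law the magnitude excess \(x=M-{M}_{\mathrm{th}}\) follows an exponential distribution with rate \(\beta =b\,\mathrm{ln}\,10\) (the symbols \(x\) and \(\beta\) are introduced here only for this illustration). Then

$$\begin{array}{c}E\left[x\right]={\int }_{0}^{\infty }x\,\beta {e}^{-\beta x}\,dx=\frac{1}{\beta },\qquad E\left[{x}^{2}\right]={\int }_{0}^{\infty }{x}^{2}\,\beta {e}^{-\beta x}\,dx=\frac{2}{{\beta }^{2}},\end{array}$$

and, for large \(N\), the sample means converge to these expectations, giving

$$\begin{array}{c}\eta \to \frac{E\left[{x}^{2}\right]}{{\left(E\left[x\right]\right)}^{2}}=\frac{2/{\beta }^{2}}{1/{\beta }^{2}}=2,\end{array}$$

regardless of the value of \(b\).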

Assuming that the GR law holds, the PDD of \(\eta\) for a finite \(N\) is dependent only on \(N\) and not on \(b\). As \(N\) becomes smaller, the peak of the distribution shifts to smaller values, and the variance increases (Additional file 1: Fig. S3b). For this reason, \(\eta\) is a useful index to express FMD characteristics almost independently of \(b\). When the analysis is performed for a large number of hypocenters with the same finite value of \(N\), the deviation of the PDD of \(\eta\) from the theoretical distribution expected from the GR law corresponds to the deviation of the FMD from the GR law.
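The independence of \(\eta\) from \(b\) follows from the fact that rescaling all magnitude excesses by a common factor leaves the ratio unchanged. The sketch below illustrates this under the assumption of continuous, GR-distributed magnitudes; the function name `eta_value`, the seed, and the three trial \(b\)-values are hypothetical choices.

```python
import math
import random


def eta_value(mags, m_th):
    """eta = N * sum(x^2) / (sum(x))^2 with x = M - M_th (Utsu 1978)."""
    excess = [m - m_th for m in mags if m >= m_th]
    s1 = sum(excess)
    if s1 <= 0.0:
        raise ValueError("need events with magnitudes above m_th")
    s2 = sum(x * x for x in excess)
    return len(excess) * s2 / (s1 * s1)


rng = random.Random(2)
unit = [rng.expovariate(1.0) for _ in range(20000)]  # unit-rate exponential draws
m_th = 1.95
etas = {}
for b in (0.6, 0.9, 1.2):
    # Under the GR law the excess is exponential with mean 1 / (b ln 10),
    # so the same unit draws are simply rescaled for each b.
    scale = 1.0 / (b * math.log(10))
    etas[b] = eta_value([m_th + scale * x for x in unit], m_th)
print(etas)
```

The three printed values agree to within rounding error and lie close to 2, because the sample is large; for \(N=50\) or 100 the estimate scatters more and its distribution shifts toward smaller values.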

A previous study that analyzed individual M7–9-class earthquakes off the Pacific coast of East Japan showed that \(\eta\) was small before the mainshock and tended to increase after it (Hirose and Maeda 2017). However, little is known about the typical value of this index, which is necessary for detecting such fluctuations. The analysis presented herein shows the typical PDD of \(\eta\) during typical seismic activity and how often a significantly different distribution from the typical one appears when the index value is analyzed uniformly in and around Japan.


\(D\) represents the correlation between seismicity and Earth's tides, and is used as an index of the sensitivity of the response of seismicity to small external stress disturbances. This index is based on Schuster's test (Schuster 1897), which is used in the analysis of tidal correlation (e.g., Tsuruoka et al. 1995; Tanaka et al. 2002), and is expressed as

$$\begin{array}{c}D={\left\{{\left(\sum_{i=1}^{N}\cos{\theta }_{i}\right)}^{2}+{\left(\sum_{i=1}^{N}\sin{\theta }_{i}\right)}^{2}\right\}}^{1/2},\end{array}\tag{4}$$

where \({\theta }_{i}\) is the phase angle of the tidal variation at the depth and onset of the i-th out of \(N\) earthquakes to be analyzed. The phase angle is obtained from the tidal response of the volumetric strain estimated using the method of Hirose et al. (2019a). We focus on fault-independent volumetric strain in order to include small earthquakes for which the focal mechanism has not yet been solved. In the eastern part of the Izu Peninsula, a good correspondence between volumetric strain changes associated with magmatic intrusions and seismic swarm activity has been reported (e.g., Kumazawa et al. 2016).

When the occurrence of an earthquake is uncorrelated with the tides (i.e., represented as a Poisson process), the probability density function of \(D\), \(f\left(D\right)\), is approximated by the following Rayleigh distribution (Schuster 1897):

$$\begin{array}{c}f\left(D\right)=\frac{2D}{N}\exp\left(-\frac{{D}^{2}}{N}\right) .\end{array}\tag{5}$$

If the seismic activity is correlated with the tide, the PDD of the \(D\)-value estimated with a constant \(N\) will shift to a larger value than that shown in the above Rayleigh distribution. In Schuster's test, the probability p that the estimated \(D\)-value is obtained by chance from Eq. (5) is used to determine whether there is a significant tidal correlation. As with other indices, we first check the typical distribution of \(D\) and then show how often a distribution significantly differs from the typical one.
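As an illustration of Schuster's test, the sketch below computes \(D\) from a list of phase angles and the probability \(p=\mathrm{exp}\left(-{D}^{2}/N\right)\), which is the upper-tail integral of the Rayleigh distribution above. The function name `schuster_test` and the two synthetic phase sets are assumptions for illustration; the tidal phase angles themselves would come from a separate tidal strain calculation that is not reproduced here.

```python
import math


def schuster_test(phases):
    """Return (D, p) for tidal phase angles given in radians.

    D is the length of the summed unit phase vectors; p = exp(-D^2 / N)
    is the probability of obtaining a value >= D when phases are random.
    """
    n = len(phases)
    if n == 0:
        raise ValueError("need at least one phase angle")
    c = sum(math.cos(t) for t in phases)
    s = sum(math.sin(t) for t in phases)
    d = math.hypot(c, s)
    return d, math.exp(-d * d / n)


n = 50
uniform = [2.0 * math.pi * i / n for i in range(n)]  # evenly spread phases
clustered = [0.1 * math.sin(i) for i in range(n)]    # phases concentrated near zero
d_u, p_u = schuster_test(uniform)
d_c, p_c = schuster_test(clustered)
print(d_u, p_u)
print(d_c, p_c)
```

Evenly spread phases give \(D\) close to zero and \(p\) close to one, whereas phases concentrated around one value give \(D\) close to \(N\) and a vanishingly small \(p\).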

Of course, some seismic activities cannot be regarded as a Poisson process, such as aftershocks of an earthquake that has already occurred. In particular, if the aftershocks occur frequently during a period shorter than the tidal cycle (the most dominant period is about 12 h), the earthquakes are apparently concentrated during some tidal phases, and the \(D\)-value described by Eq. (4) takes a large value regardless of the tidal correlation. Although not all aftershocks have a significant effect on \(D\), there is no doubt that they have at least some effect on it. For this reason, hypocenter catalogs have often been declustered in previous studies (e.g., Tanaka et al. 2002). However, because it is not the aftershocks themselves that affect the analysis, but the activities that occur together within a short period of time relative to the tidal cycle, declustering for the purpose of separating the aftershock activities is not always appropriate in the analysis of \(D\)-values. Instead of declustering, we simply calculated the minimum value of the period required for the occurrence of \(N/4\) earthquakes out of the \(N\) to be analyzed (\(\mathrm{min}{T}_{N/4}\)) and excluded all datasets with \(\mathrm{min}{T}_{N/4}\) < 21,600 s (6 h). This keeps the analysis as close as possible to the dataset used for the other index values and handles the hypocenter catalogs with as little processing as possible. We did so because, if the activity during the analysis period can be regarded as a stationary Poisson process, the bias in the \(D\)-value distribution caused by frequent earthquakes can be seen when \(\mathrm{min}{T}_{N/4}<\mathrm{21,600} \;\mathrm{s}\) (Additional file 1: Fig. S4), and the effect can be largely removed by excluding this case. Figure 2 shows the relationship between \(D\) and \(\mathrm{min}{T}_{N/4}\) estimated in each analysis window: in both area A (Fig. 2a) and area B (Fig. 2b), \(D\) clearly depends on \(\mathrm{min}{T}_{N/4}\) and increases remarkably for activities that occur frequently within a short period of less than approximately 21,600 s (6 h), whereas the distribution of \(D\) is constant for \(\mathrm{min}{T}_{N/4}\ge \mathrm{ 21,600 \;s}\). In actual seismic activity that includes aftershocks that follow the Omori-Utsu law, there may be aftershock effects that cannot be excluded by this process. However, instead of making an effort to completely exclude or account for these effects in our analysis, we have included them in the statistical analysis and discuss the effects of aftershocks in the results.
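The exclusion criterion based on \(\mathrm{min}{T}_{N/4}\) can be sketched as below. The function name `min_t_quarter` is hypothetical, and the sketch assumes that the "period required for \(N/4\) earthquakes" is the time span from the first to the last of \(N/4\) consecutive events; the text does not specify whether the span covers \(N/4\) events or \(N/4\) intervals.

```python
def min_t_quarter(times_s):
    """Shortest time span (in seconds) that contains N/4 consecutive events."""
    t = sorted(times_s)
    n = len(t)
    k = n // 4
    if k < 2:
        raise ValueError("need at least 8 events")
    return min(t[i + k - 1] - t[i] for i in range(n - k + 1))


T_MIN = 21600.0  # 6 h threshold in seconds, as stated in the text

# Two synthetic windows of N = 100 origin times (seconds):
steady = [3600.0 * i for i in range(100)]  # one event per hour
burst = [3600.0 * i for i in range(75)] + [300000.0 + 60.0 * j for j in range(25)]

for name, times in (("steady", steady), ("burst", burst)):
    span = min_t_quarter(times)
    print(name, span, "keep" if span >= T_MIN else "exclude")
```

The steady window passes the criterion, whereas the window containing 25 events at one-minute spacing is excluded, even though no explicit identification of aftershocks is made.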

Fig. 2

Relationship between the \(D\)-value normalized by \(N\) and \(\mathrm{min}{T}_{N/4}\). a Estimation results for area A; b estimation results for area B. The vertical dashed line indicates 21,600 s (6 h)

Estimation of \({{\varvec{M}}}_{\mathbf{c}}\)

To perform an analysis using data with the same quality, it is necessary to exclude data obtained while the events are less detectable because of the effects of large earthquakes; thus, it is essential to estimate the magnitude of completeness, \({M}_{\mathrm{c}}\). As described in detail by Woessner and Wiemer (2005), various methods to estimate \({M}_{\mathrm{c}}\) have been proposed. Here, we used the maximum curvature (MAXC) of the FMD as \({M}_{\mathrm{c}}\) because it is a relatively simple method that does not assume the GR law to be strictly valid and the \(\eta\)-value that represents the deviation from the GR law is also used as an index. Instead, the MAXC method only assumes that more smaller earthquakes occur than larger ones (Woessner and Wiemer 2005).

Although this method is simple and easy to understand, there is a practical problem with using a finite number of hypocenters in the spatiotemporal range for the analysis of each index value: how to set the lower limit \({M}_{\mathrm{z}}\) of the magnitude range used to estimate MAXC. If \({M}_{\mathrm{z}}\) is too small, \({M}_{\mathrm{c}}\) tends to be underestimated when the detectability of earthquakes changes in the middle of the analysis period, as shown in Additional file 1: Fig. S5. This is a common problem for methods that use even small earthquakes, such as the Entire Magnitude Range (EMR) method (Woessner and Wiemer 2005). On the other hand, if \({M}_{\mathrm{z}}\) is set equal to \({M}_{\mathrm{th}}\), the number of earthquakes slightly larger than \({M}_{\mathrm{th}}\) may coincidentally become the largest group because of a finite \(N\), even if there is no decrease in detectability, and \({M}_{\mathrm{c}}\) can therefore be overestimated. As shown in the cumulative frequency distribution of \({M}_{\mathrm{c}}\) obtained by the MAXC method under the assumption of the GR law (Fig. 3), the degree of overestimation of \({M}_{\mathrm{c}}\) depends on the values of \(N\) and \(b\) when \({M}_{\mathrm{z}}={M}_{\mathrm{th}}\). For this reason, we set \({M}_{\mathrm{z}}(<{M}_{\mathrm{th}})\) so that 90% of the \(b\)-value estimates are not lost to an overestimated \({M}_{\mathrm{c}}\), assuming that the GR law is valid in the vicinity of \({M}_{\mathrm{th}}\) with a constant \(b\)-value equal to the lower limit of the range of possible \(b\)-values, which can be obtained from the \(b\)-value analysis. Specifically, based on our estimation of \(b\) described below, the lower limit of \(b\) is estimated to be 0.6 for area A and 0.4 for area B. Thus, from Fig. 3, \({M}_{\mathrm{z}}={M}_{\mathrm{th}}-0.5\) and \({M}_{\mathrm{z}}={M}_{\mathrm{th}}-0.3\) were used for the analyses with \(N=50\) and \(N=100\), respectively, in area A. Similarly, \({M}_{\mathrm{z}}={M}_{\mathrm{th}}-0.8\) and \({M}_{\mathrm{z}}={M}_{\mathrm{th}}-0.5\) were used for the analyses with \(N=50\) and \(N=100\), respectively, in area B. Using appropriate values of \({M}_{\mathrm{z}}\), chosen in accordance with the analysis data, can reduce the tendency to underestimate \({M}_{\mathrm{c}}\) when the detectability of earthquakes changes in the middle of the analysis period. It also avoids the overestimation of \({M}_{\mathrm{c}}\) caused by the use of a finite number of hypocenters, which is unique to the MAXC method. In addition, the earthquakes above \({M}_{\mathrm{z}}\) extracted within the spatiotemporal range of the analysis were resampled 1,000 times using the bootstrap method to estimate MAXC, and the arithmetic mean of MAXC estimated for all resamples was used as \({M}_{\mathrm{c}}\) (Woessner and Wiemer 2005; Mignan and Woessner 2012).
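The bootstrap MAXC estimate can be sketched as follows. The function names `maxc` and `mc_bootstrap`, the 0.1 magnitude bin width, the tie-breaking rule, and the toy magnitude list are assumptions made for illustration; only the overall procedure (mode of the non-cumulative FMD above \({M}_{\mathrm{z}}\), averaged over bootstrap resamples) follows the description above.

```python
import random
from collections import Counter


def maxc(mags, bin_width=0.1):
    """Magnitude bin with the highest non-cumulative frequency."""
    counts = Counter(round(m / bin_width) for m in mags)
    top = max(counts.values())
    # Ties are resolved toward the smaller magnitude (an assumption).
    return min(k for k, c in counts.items() if c == top) * bin_width


def mc_bootstrap(mags, m_z, n_boot=1000, seed=0):
    """Mean of MAXC over bootstrap resamples of the events with M >= m_z."""
    rng = random.Random(seed)
    sample = [m for m in mags if m >= m_z]
    if not sample:
        raise ValueError("no events at or above m_z")
    estimates = [maxc(rng.choices(sample, k=len(sample))) for _ in range(n_boot)]
    return sum(estimates) / len(estimates)


# Toy catalog: the M2.0 bin is the most populated one.
mags = [1.9] * 10 + [2.0] * 50 + [2.1] * 30 + [2.2] * 20
mc = mc_bootstrap(mags, m_z=1.85, n_boot=200)
print(maxc(mags), mc)
```

Because a few resamples can place the mode in a neighboring bin, the bootstrap mean is generally not a multiple of the bin width, which is why \({M}_{\mathrm{c}}\) can be compared with \({M}_{\mathrm{th}}\) on a finer scale than the catalog magnitudes.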

Fig. 3

Cumulative frequency distribution of \({M}_{\mathrm{c}}\) expected under the GR law when the lower limit \({M}_{\mathrm{z}}\) is fixed. For each value of \(b\) and \(N\), \(M\)-series data (the number of data points is \(N\)) are generated 10,000 times using random numbers; the MAXC method with bootstrap resampling (1,000 times) is used to estimate \({M}_{\mathrm{c}}\)

In the present analysis, \(b\) and \(\eta\) can be directly compared with the \({M}_{\mathrm{c}}\) estimated in the same analysis window. When \({M}_{\mathrm{c}}\ge {M}_{\mathrm{th}}\) as a result of reduced detectability, the number of earthquakes on the low-magnitude side decreases unnaturally, and both \(b\) and \(\eta\) are expected to decrease depending on \({M}_{\mathrm{c}}\). Figure 4 plots \(b\) and \(\eta\) against the estimated \({M}_{\mathrm{c}}\) in each analysis window for \(l=0.4^\circ , N=50\), i.e., the case with the largest number of estimated index values. The median \(b\)-value slightly decreases with \({M}_{\mathrm{c}}\) when \({M}_{\mathrm{c}}<{M}_{\mathrm{z}}+0.2\) (near the left sides of Fig. 4a, b), which does not indicate a decrease in detectability; rather, it reflects the feature of the MAXC method in which \({M}_{\mathrm{c}}\) tends to decrease as \(b\) increases (Fig. 3). The median \(b\)-value is almost constant near \({M}_{\mathrm{c}}\approx {M}_{\mathrm{th}}\); therefore, there is no observed effect of detectability reduction. In the case of \({M}_{\mathrm{c}}>{M}_{\mathrm{th}}\) (i.e., the right side of the vertical dotted line in Fig. 4a, b), the entire distribution of \(b\)-values clearly decreases with increasing \({M}_{\mathrm{c}}\), which is considered to correspond to a decrease in earthquake detectability. As for \(\eta\), the decrease of the entire distribution with increasing \({M}_{\mathrm{c}}\) is more pronounced for \({M}_{\mathrm{c}}>{M}_{\mathrm{th}}\), which is also considered to correspond to a decrease in earthquake detectability (Fig. 4c, d).

Fig. 4

Plot of (upper) \(b\) and (lower) \(\eta\) against \({M}_{\mathrm{c}}\) for \(l=0.4^\circ\) and \(N=50\). a, c Area A; b, d area B. The median of each index value entering a bin of width 0.05 along the horizontal axis is also shown; the bars show the 90th and 10th percentiles. The vertical dotted lines correspond to \({M}_{\mathrm{c}}={M}_{\mathrm{th}}\)

For reference, we also used the same plots to compare our results for \({M}_{\mathrm{c}}\) as estimated using the EMR method (Additional file 1: Fig. S6). In this case, there was no overall change in the distribution of either \(b\)- or \(\eta\)-values around \({M}_{\mathrm{c}}\approx {M}_{\mathrm{th}}\), suggesting that \({M}_{\mathrm{c}}\) is often slightly overestimated (by about 0.2 overall) from the viewpoint of appropriately obtaining these index values. Therefore, in this analysis, because it is important to use as many data points as possible, the MAXC method is considered to give better results, so we adopted this method for estimating \({M}_{\mathrm{c}}\). In the case of \(l=0.4^\circ\) and \(N=50\), the number of hypocenter sets (i.e., the number of estimated values of each index) with \({M}_{\mathrm{c}}<{M}_{\mathrm{th}}\) was 11,194 for the MAXC method and 10,197 for the EMR method, indicating that the MAXC method uses about 10% more data. In contrast, the median \(b\)- and \(\eta\)-values estimated from these datasets are the same to three decimal places (0.909 and 1.869, respectively), indicating that the effect of change in earthquake detectability does not differ between these index datasets.

Statistical testing to distinguish anomalous index values

To identify regions with anomalously different index-value PDDs, we compare the PDDs of index values obtained in each \(l\times l\) region with the combined PDD from all the other regions by performing a statistical test to determine whether the PDDs are significantly different. That is, the probability that two distributions are obtained by chance from the same PDD of the index value is estimated as a \(p\)-value, which is used to quantitatively evaluate the degree of anomaly. Since any overlap in the hypocenter data used to estimate the index values renders the testing inappropriate, we extracted index values from the results of the analysis so that the original data would not overlap. The total number of non-overlapping patterns is eight for each \(l\) and \(N\) pair because the latitude, longitude, and number of hypocenters were each shifted by half (Additional file 1: Fig. S2).

We adopted two types of non-parametric methods, the Kolmogorov–Smirnov (KS) test (e.g., Hodges 1958) and the Brunner-Munzel (BM) test (Brunner and Munzel 2000; Neubert and Brunner 2007), to test the significance of the difference in PDDs. The PDDs of the parameters used in the present analysis as indices for the characteristics of seismicity (\(b\), \(\eta\), \(D\)) differ from each other. To the best of our knowledge, no comprehensive study of the PDDs of these values has been conducted. In addition, since the same approach is expected to be applied to other indices with different PDDs, a non-parametric method that is not limited in its scope of application should be effective. Because the PDD of each index and the way in which anomalies appear are unknown, we refer to the minimum \(p\)-value estimated by both methods to evaluate the degree of anomaly, that is, \(p=\mathrm{min}\left({p}_{\mathrm{KS}}, {p}_{\mathrm{BM}}\right)\), where \({p}_{\mathrm{KS}}\) and \({p}_{\mathrm{BM}}\) are the \(p\)-values estimated by the KS test and the BM test, respectively, as described in the Appendices. These methods are effective when two or more index values are obtained in a single rectangular region for analysis. When only one index value is obtained in a region, we use the \(p\)-value corresponding to the two-tailed test calculated from the rank of the index value among all the index values.
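To make the comparison of two index-value samples concrete, the sketch below implements a two-sample KS test with the standard asymptotic Kolmogorov series for the \(p\)-value. It is only an illustration: the function name `ks_two_sample` is hypothetical, the exact formulation in the Appendices may differ (for example, in small-sample corrections), and the BM test is not reproduced here. Where SciPy is available, `scipy.stats.ks_2samp` and `scipy.stats.brunnermunzel` provide both tests.

```python
import math
from bisect import bisect_right


def ks_two_sample(x, y):
    """Two-sample KS statistic and asymptotic two-sided p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # Largest gap between the two empirical distribution functions,
    # evaluated at every observed value.
    d = 0.0
    for v in xs + ys:
        d = max(d, abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return d, 1.0  # the series converges poorly here; p is ~1
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(1.0, max(0.0, 2.0 * series))


# One region's index values versus the pooled values of all other regions
# (toy numbers, not results from the catalog).
region = [0.62, 0.65, 0.66, 0.70, 0.71, 0.73]
others = [0.80 + 0.01 * i for i in range(40)]
d_stat, p_ks = ks_two_sample(region, others)
print(d_stat, p_ks)
```

In the toy example every value in `region` lies below every value in `others`, so the statistic reaches its maximum of 1 and the \(p\)-value is small; a region whose values are drawn from the same distribution as the rest would instead give a \(p\)-value that is not small.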


Distributions of index values in and around Japan

The PDDs of the estimated \(b\)-, \(\eta\)-, and \(D\)-values for areas A and B are shown in Fig. 5. Figure 5a–d plots the results for the case \({M}_{\mathrm{c}}<{M}_{\mathrm{th}}\), and Fig. 5e, f plots the results for the case \(\mathrm{min}{T}_{N/4}\ge \mathrm{21,600} \;\mathrm{s}\). All index values used to calculate these PDDs and some related estimates, such as \({M}_{\mathrm{c}}\) and \(\mathrm{min}{T}_{N/4}\), are provided in Additional file 2, Additional file 3, Additional file 4, Additional file 5, Additional file 6, Additional file 7, Additional file 8, and Additional file 9.

Fig. 5

Probability density distributions of each index value. Numbers in square brackets in the legend indicate the number of estimated index values. a, b \(b\); c, d \(\eta\); e, f \(D\). Left, area A; right, area B. The black lines in a–d are the distributions expected from the GR law where the true value of \(b\) is constant at 0.9. The black lines in e and f are the distributions expected when no tidal correlation exists

The PDDs of the estimated values are commonly almost independent of the size of \(l\). In other words, the blue dashed and red lines (l = 0.2° and 0.4°, respectively, N = 50) and the cyan dashed and orange lines (l = 0.2° and 0.4°, respectively, N = 100) in each panel in Fig. 5 are almost identical. This implies that the statistical properties of seismic activity characterized by these indices are almost independent of spatial scale at a spatial scale of around \(0.2^\circ\)–\(0.4^\circ\).

Looking at the PDDs of the estimated \(b\)-values (Fig. 5a, b), the lower limit of possible \(b\)-values is as low as 0.6 for area A and 0.4 for area B. As mentioned in “Estimation of \({{\varvec{M}}}_{\mathbf{c}}\)” section, the lower bound \({M}_{\mathrm{z}}\) used to estimate \({M}_{\mathrm{c}}\) was set with reference to these values. Figure 5a–d also shows the expected PDDs of \(b\) based on the GR law with a constant \({b}_{\mathrm{true}}\) (\(N\) = 50 and 100) for reference. The PDD of the estimated b-values in area A is close to that expected for the GR law with a constant b-value, indicating that the true range of variability of this index value is not large (Fig. 5a). This also corresponds with the result showing that the variance of \(b\) is smaller for \(N=100\) than for \(N=50\), which suggests that most of the observed variability of \(b\) is due to the finite value of \(N\). In contrast, the PDD of \(b\) in area B is wider than expected from the GR law with a constant \({b}_{\mathrm{true}}\), suggesting that the true variation of \(b\) is as wide as it appears in Fig. 5b, independent of \(N\).
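The dependence of the reference distribution on \(N\) can be reproduced with a short simulation. This is a sketch under stated assumptions: it uses the standard maximum-likelihood estimator \(b=\log_{10}e/(\overline{M}-{M}_{\mathrm{c}})\) with continuous magnitudes, which may differ from the estimator and discretization used in the analysis, and the function name `simulate_b_estimates` is hypothetical.

```python
import math
import random
import statistics


def simulate_b_estimates(b_true, n_events, n_trials, seed=0):
    """Maximum-likelihood b estimates for synthetic GR catalogs.

    Magnitudes above the cutoff are continuous and exponentially
    distributed with rate b_true * ln(10); discretization is ignored.
    """
    rng = random.Random(seed)
    beta = b_true * math.log(10.0)
    estimates = []
    for _ in range(n_trials):
        # Excess magnitudes M - M_c drawn from the GR (exponential) law.
        excess = [rng.expovariate(beta) for _ in range(n_events)]
        estimates.append(math.log10(math.e) / statistics.fmean(excess))
    return estimates


for n in (50, 100):
    b_hat = simulate_b_estimates(b_true=0.9, n_events=n, n_trials=2000)
    print(n, round(statistics.fmean(b_hat), 3), round(statistics.stdev(b_hat), 3))
```

The spread of the simulated estimates shrinks as \(N\) grows even though \({b}_{\mathrm{true}}\) is fixed, which is the behavior attributed above to the finite value of \(N\).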

The estimated PDDs of \(\eta\) are lower than expected from the GR law, indicating that there is an upward convex trend in the FMD in both areas A and B (Fig. 5c, d). The effect of \(N\) on the PDD of \(\eta\) is clear in area A but less obvious in area B, suggesting that the true range of variation of \(\eta\) is larger in area B, as was also the case with the \(b\)-value.

The results of the \(D\)-value estimation (Fig. 5e, f) are shown together with the theoretical distributions for the case where tides and seismic activity are uncorrelated (black lines). The obtained PDDs are similar to the theoretical distribution, including their dependence on \(N\), but many of the values are slightly larger than the theoretical distribution. Comparing the results for area A with those for area B, there is a greater number of larger values in the results for area A. These slightly larger \(D\)-values may be due to both the effect of tidal correlations and the effect of frequent earthquakes that are not excluded by \(\mathrm{min}{T}_{N/4}<\mathrm{21,600}\; \mathrm{s}\). We discuss this point later in conjunction with other results.

The above estimates of \(b\)-, \(\eta\)-, and \(D\)-values were all made for the same dataset and were obtained as joint probability distributions. The relationship between the b- and η-values will also be discussed in “Anomalous η-values” section.

Statistical test to distinguish anomalous index values

The PDDs of index values obtained in each target rectangular region, and which exceeded \({M}_{\mathrm{c}}\) for \(b\) and \(\eta\) and \(\mathrm{min}{T}_{N/4}\) for \(D\), were tested against the index-value PDDs combined from all other regions. Figure 6 shows the cumulative density distribution of \(p\)-values obtained by the method described in “Statistical testing to distinguish anomalous index values” section. The results for \(b\) (Fig. 6a, b) show that the frequency of low \(p\)-values is clearly higher than expected under the null hypothesis (the one-to-one diagonal dashed line). This is presumably because their PDDs differ due to the spatiotemporal variation of the true \(b\)-value. The results for \(\eta\) (Fig. 6c, d) are similar, but the difference is weaker. In the results for \(D\) (Fig. 6e, f), the cumulative density distribution of \(p\)-values is almost the same as that expected under the null hypothesis (e.g., a value expected at a rate of \(p=1\%\) has an occurrence rate of at most \(p=2\%\)), and it is difficult to find many spatial regions where the PDD of \(D\) is significantly different from the others.
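The comparison with the null hypothesis in Fig. 6 amounts to checking how often \(p\)-values fall below a given level. A minimal sketch follows; the function name `low_p_excess` is hypothetical.

```python
def low_p_excess(p_values, alpha):
    """Observed fraction of p-values <= alpha, and its ratio to alpha.

    Under the null hypothesis p-values are uniform on [0, 1], so the
    expected fraction equals alpha and the ratio is about 1.
    """
    if not p_values:
        raise ValueError("p_values must not be empty")
    observed = sum(1 for p in p_values if p <= alpha) / len(p_values)
    return observed, observed / alpha
```

A ratio well above 1 at small `alpha` corresponds to a cumulative curve lying above the diagonal, as for \(b\); a ratio of at most about 2 at `alpha = 0.01` corresponds to the behavior described for \(D\).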

Fig. 6

Cumulative distributions of the \(p\)-values obtained from the statistical tests. The overall results, including the results of single index values, are shown in black, and the individual results of the KS and BM tests are shown in purple and orange, respectively. a, b \(b\); c, d \(\eta\); e, f \(D\). Left, area A; right, area B

The differences in the results between the BM and KS tests are negligible, with the only difference being that \({p}_{\mathrm{BM}}\) produces slightly more low values in the results for \(b\) in area A (i.e., the orange lines plot above the purple lines in Fig. 6a). This difference can be attributed to the difference in the shape of the PDD and the way its values change. Although there is little difference between the results of the two methods, by using the minimum \(p\)-values obtained by both methods, it is possible to extract regions where the PDDs of index values may differ from those in other regions with relatively high sensitivity. In addition to the two methods used herein, any more sensitive methods testing for significant differences among PDDs can be used in the same framework.

Figure 7 shows the spatial distribution of \(p\)-values compiled using the signed frequency \({f}_{\mathrm{lp}}=\frac{{\sum }_{i}^{{n}_{\mathrm{all}}}{s}_{i}}{{n}_{\mathrm{all}}}\), where \({n}_{\mathrm{all}}\) is the number of results with the same region center coordinates, i.e., up to eight, depending on whether the center coordinates for \(l=0.2^\circ\) and \(l=0.4^\circ\) coincide (Additional file 1: Fig. S2) and whether there are enough hypocenters in the region for each \(l\) and \(N\). The variable \({s}_{i}\) takes one of the following values depending on the \(p\)-value and average value of each index value: \(-1\) if \(p<0.05\) and the mean of the index value is lower than the mean of the other regions, 1 if \(p<0.05\) and the mean of the index value is higher than or equal to the mean of the other regions, and 0 otherwise. The white points (\({f}_{\mathrm{lp}}\) ≈ 0) in Fig. 7 are the regions with index values whose PDDs are indistinguishable from those of the others.
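The definition of \({f}_{\mathrm{lp}}\) can be written out as a short sketch. The function names and the tuple layout of `results` are assumptions for illustration.

```python
def signed_flag(p_value, region_mean, others_mean, threshold=0.05):
    """s_i: -1 (significantly low), +1 (significantly high or equal), else 0."""
    if p_value >= threshold:
        return 0
    return -1 if region_mean < others_mean else 1


def signed_frequency(results, threshold=0.05):
    """f_lp: mean of s_i over all results sharing one region center.

    `results` is a list of (p_value, region_mean, others_mean) tuples,
    one per available (l, N) combination (up to eight per center).
    """
    flags = [signed_flag(p, m, ref, threshold) for p, m, ref in results]
    return sum(flags) / len(flags)
```

For example, four results with flags \(-1\), \(-1\), 0, and 1 give \({f}_{\mathrm{lp}}=-0.25\), so a value close to \(\pm 1\) requires the same sign of anomaly to be detected for nearly all combinations of \(l\) and \(N\).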

Fig. 7

Spatial distribution of \({f}_{\mathrm{lp}}\) for each index value. Red and blue symbols indicate the value of \({f}_{\mathrm{lp}}\) (see text). a, b \(b\); c, d \(\eta\); e, f \(D\). Left, area A; right, area B. The epicenters of the large earthquakes (M ≥ 6.1 in area A, M ≥ 7.0 in area B) are also plotted in a and b. Green contours in b, d, and f indicate the surface-projected coseismic slip of 10 m of the Tohoku-oki earthquake (Suito et al. 2012). Activity in regions of low \(p\)-values (red or blue) indicated by arrows is discussed in the text

In the \(b\)-value results (Fig. 7a, b), regions with low \(p\)-values are spatially clustered, and many have large absolute values of \({f}_{\mathrm{lp}}\). The \(\eta\)-value results (Fig. 7c, d) show a similar trend, although slightly less clustered. The physical phenomena that correspond to these locations are discussed in the next section. What is important here is that the extracted regions with low \(p\)-values are clustered in the same place, regardless of \(l\) and \(N\), although there are some differences in sensitivity. In other words, the extracted regions are determined by the significance level, which is set as the \(p\)-value threshold (0.05 in this case), almost independent of the arbitrarily chosen values of \(l\) and \(N\). They can thus be said to reflect the essential characteristics of the index-value distributions.

In contrast, the \(D\)-value results (Fig. 7e, f) show that low \(p\)-values are sparsely distributed throughout the analysis area, and few regions have large absolute values of \({f}_{\mathrm{lp}}\). This is consistent with the interpretation that most of the low \(p\)-values for \(D\) appeared by chance, as shown in Fig. 6e, f. The result that low \(p\)-values are detected at a frequency corresponding to the number of trials is similar to that of Tanaka et al. (2004), who showed 13 regions with \(p\)-values (Schuster's test) lower than 10% out of a hundred 1° × 1° regions. This result is also consistent with the results of Wang and Shearer (2015); they employed a declustering method that takes into account the tidal period to examine the \(p\)-values of various spatiotemporal bins and similarly found no clear tidal correlation. Although we did not find significant tidal correlations, this result supports the significance of the low \(p\)-value regions that appeared in the results for \(b\) and \(\eta\) by serving as an example of the absence of significant distributional differences.


Deviation of the FMD from the GR law

The PDDs of \(\eta\) obtained in this study show that the FMD is generally convex upward, and the GR law is not always strictly valid. This result has important implications, both for the interpretation of our results and seismologically. However, the discussion of this point depends on whether \({M}_{\mathrm{c}}\) was estimated accurately. After a large earthquake, \({M}_{\mathrm{c}}\) increases due to the decrease in detectability, and then decreases as aftershocks decay. If the change of \({M}_{\mathrm{c}}\) for the initial aftershocks is not accurately estimated, it may lead to underestimation of \(\eta\). Here, we used a method for estimating \({M}_{\mathrm{c}}\) that relies as little as possible on the assumption of the GR law, and we confirmed the validity of the estimation results (Fig. 4). However, considering the importance of the results, we used another method to confirm this tendency.

Here, we estimate \(b\)-positive (\({b}^{+}\)), which is insensitive to transient changes in \({M}_{\mathrm{c}}\) and was recently proposed (van der Elst 2021) to check for deviations from the GR law. \({b}^{+}\) is estimated from the difference in magnitude between each earthquake and that which preceded it, \(m\), that satisfies \(m\ge {m}_{\mathrm{th}}>0\), where \({m}_{\mathrm{th}}\) is an arbitrary constant representing a threshold, as follows:

$$\begin{array}{c}{b}^{+}=\frac{{N}_{\mathrm{th}}\log_{10}e}{\sum _{i=1}^{N_{\,\mathrm{th}}}\,\left({m}_{i}-{m}_{\mathrm{th}}+\delta \right)} ,\end{array}$$

where \({N}_{\mathrm{th}}\) is the number of earthquakes satisfying \(m\ge {m}_{\mathrm{th}}\) and \(\delta\) is the width of the magnitude discretization (here, \(\delta =0.05\)). If the GR law holds, \({b}^{+}\) is equivalent to b (van der Elst 2021). The advantage of \({b}^{+}\) is that it can be estimated almost independently of the change in \({M}_{\mathrm{c}}\) because it is sufficient to detect earthquakes of at least a certain magnitude larger than that observed immediately before, even if detectability is reduced after a large earthquake.
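The estimator above can be sketched directly from its definition. The function name `b_positive` is hypothetical, and the input is assumed to be a chronologically ordered list of magnitudes.

```python
import math


def b_positive(magnitudes, m_th=0.2, delta=0.05):
    """b+ from positive magnitude differences between successive events.

    `magnitudes` must be in chronological order. Only differences
    m = M[i] - M[i-1] with m >= m_th enter the estimate.
    """
    diffs = [b - a for a, b in zip(magnitudes, magnitudes[1:])]
    kept = [m for m in diffs if m >= m_th]
    if not kept:
        raise ValueError("no magnitude difference reaches m_th")
    return len(kept) * math.log10(math.e) / sum(m - m_th + delta for m in kept)
```

For the sequence 1.0, 1.5, 1.2, 2.0, 1.9, 2.3, the differences 0.5, 0.8, and 0.4 satisfy \(m\ge 0.2\), so \({N}_{\mathrm{th}}=3\) and the denominator is 1.25, giving \({b}^{+}\approx 1.04\).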

Assuming that any earthquakes larger than the preceding one were not missed and that the GR law holds, the estimated \({b}^{+}\) for different lower limits of earthquake magnitude, \({M}_{\mathrm{min}}\), should be approximately the same when averaged, even though \({N}_{\mathrm{th}}\) decreases and thus the variance of \({b}^{+}\) increases as \({M}_{\mathrm{min}}\) increases. To apply this point to the data used in our analysis, we also included earthquakes in the catalog with magnitudes smaller than \({M}_{\mathrm{th}}\) in all the spatiotemporal regions analyzed for the four sets of \(l\) and \(N\). Using \({M}_{\mathrm{ref}}={M}_{\mathrm{th}}-1.0\) (0.95 for area A and 2.45 for area B) as references, we estimated the differences between the mean estimated values of \({b}^{+}\) for different \({M}_{\mathrm{min}}\) with \({m}_{\mathrm{th}}=0.2\). As shown in Fig. 8, in the actual analysis, \({b}^{+}\) tends to become larger as \({M}_{\mathrm{min}}\) increases, and this tendency is larger in area B. This result is consistent with the PDDs of estimated \(\eta\)-values, which strongly suggests that the FMD has an upward convex trend.

Fig. 8

The difference between the mean of \({b}^{+}\) estimated using an earthquake with \(M\ge {M}_{\mathrm{min}}\) and the mean of \({b}^{+}\) when \({M}_{\mathrm{min}}={M}_{\mathrm{ref}}\). The results expected from the GR law for constant values of \(b\) are also shown

Figure 8 also shows the expected results that were calculated numerically under the assumption of the GR law as a reference. Numerically, we generated \(M\) above \({M}_{\mathrm{ref}}\) until the number of earthquakes with \(M\ge {M}_{\mathrm{th}}\) reached \(N\) and then estimated \({b}^{+}\) for each \({M}_{\mathrm{min}}\). Because earthquakes larger than the preceding one are used to estimate \({b}^{+}\), the number of available data points becomes smaller as \({M}_{\mathrm{min}}\) becomes larger. When \({M}_{\mathrm{min}}={M}_{\mathrm{th}}\), the expected number of events used to estimate \({b}^{+}\) is less than half of \(N\). This reduction in the number of events used to estimate \({b}^{+}\) is reflected by the downturn in the numerically calculated \({b}^{+}\) values on the right side of Fig. 8. Note that the small-sample bias of the estimated \({b}^{+}\) differs from that of the estimated \(b\), which tends to be larger than the true value for smaller event numbers (e.g., Ogata and Yamashina 1986), because \({b}^{+}\) is estimated using only earthquakes that are larger in magnitude than the previous earthquakes. The actual analysis results (colored points in Fig. 8) seem to be only slightly influenced by the reduction in the number of events used to estimate \({b}^{+}\).
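The numerical reference can be sketched as follows. Several choices here are assumptions rather than details from the text: magnitudes are continuous (so \(\delta =0\) is used), \({M}_{\mathrm{min}}\) is applied by discarding events below it before taking successive differences, and all function names are hypothetical.

```python
import math
import random


def b_positive(mags, m_th, delta=0.0):
    """b+ from successive magnitude differences m >= m_th (delta=0: continuous M)."""
    kept = [b - a for a, b in zip(mags, mags[1:]) if b - a >= m_th]
    if not kept:
        return None
    return len(kept) * math.log10(math.e) / sum(m - m_th + delta for m in kept)


def synthetic_catalog(b_true, m_ref, m_th_cat, n_target, rng):
    """Draw GR magnitudes above m_ref until n_target events have M >= m_th_cat."""
    beta = b_true * math.log(10.0)
    mags, count = [], 0
    while count < n_target:
        m = m_ref + rng.expovariate(beta)
        mags.append(m)
        if m >= m_th_cat:
            count += 1
    return mags


def mean_b_positive(m_min, b_true=0.9, m_ref=0.95, m_th_cat=1.95,
                    n_target=50, n_trials=500, m_th=0.2, seed=0):
    """Average b+ over synthetic catalogs, using only events with M >= m_min."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_trials):
        mags = synthetic_catalog(b_true, m_ref, m_th_cat, n_target, rng)
        est = b_positive([m for m in mags if m >= m_min], m_th)
        if est is not None:
            values.append(est)
    return sum(values) / len(values)
```

Evaluating `mean_b_positive` for a range of `m_min` between `m_ref` and `m_th_cat` and subtracting the value at `m_min = m_ref` gives a curve of the kind plotted as the GR reference in Fig. 8, under the assumptions listed above.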

Both the \(\eta\)-value analysis and the above \({b}^{+}\) result indicate an upward convex trend of the FMD, unlike that expected from the GR law, which raises an important issue. In the context of comparing pre- and post-earthquake activities and estimating stress-field changes through such activities, \(b\) is often estimated immediately after a large earthquake. If, for this purpose, we take a large value for \({M}_{\mathrm{th}}\) to avoid the effect of reduced detectability, the estimated \(b\)-value may be too high due to the upward convex shape of the FMD, even if the FMD has not truly changed. In such a case, instead of simply assuming the ideal GR law and comparing the estimation results using different \({M}_{\mathrm{th}}\), the shape of the FMD needs to be fully considered. The present results suggest that in many cases it is preferable to analyze the data with a constant \({M}_{\mathrm{th}}\).

Comparison between anomalous seismicity and other physical phenomena

In this study, we used statistical analyses to identify spatial regions where the PDDs of index values for seismicity are significantly different from those in other regions. In other words, the seismicity in these regions may be considered to include unusual or "anomalous" seismicity, whereas regions where the PDDs are indistinguishable from those in other regions may be considered to experience "typical" seismicity. Hereafter, we use "anomalous" and "typical" in this sense. Specifically, when the \(p\)-value obtained by the analysis is below the threshold value of 0.05, activity is considered anomalous. The regions where such anomalous seismicity was mainly observed in the \(b\)-values (i.e., regions where the low \(p\)-values are spatially clustered) are indicated in Fig. 7a, b by arrows and letters (lower-case letters correspond to high \({f}_{\mathrm{lp}}\) and upper-case letters to low \({f}_{\mathrm{lp}}\)). Here, we first discuss the temporal variation of these activities, highlighting the case of \(l=0.4^\circ\) and \(N=50\) (Fig. 9). We then discuss anomalous index values shown in Fig. 7a–d by comparing them with different observation results and evaluating the universality of the typical seismicity obtained here.

Fig. 9

Temporal variations and probability density distributions of \(b\) in regions of low \(p\)-values for \(N=50\) and \(l=0.4^\circ\). Horizontal bars indicate the analysis periods (from the first to the \(N\) th earthquake). a Low and b high \(b\)-value regions that may be related to large earthquakes in area A. c Regions of steadily low \(b\)-values in area A. d Regions of steadily high \(b\)-values in area A. e Low and f high \(b\)-value regions in area B. In a and b, the time variation is shown with respect to the elapsed time from the large earthquake that seems to have caused the largest effect on the seismic activity shown in the legends. The left and middle panels show variations before and after the large earthquakes, respectively. Numbers in square brackets in the legend in the right panels indicate the number of estimated index values. The regions correspond to those shown in Fig. 7. The arrows in c show the times when slow slip was detected near region L (off Boso Peninsula). PDDs in right panels in d and e exclude the activities in regions g and R (the Izu islands), respectively

Anomalous \({\varvec{b}}\)-values of shallow inland earthquakes (area A)

Figure 7a shows that many of the anomalous \(b\)-values are located near the epicenters of large earthquakes in area A (M6.1 or greater; C–H and a–d in Fig. 7a). Figure 9a, b shows the change of \(b\)-values with elapsed time since the occurrence of large earthquakes that were accompanied by the most aftershocks in these regions. The figures also show the PDDs of the anomalous \(b\)-values compared to those of typical \(b\)-values. Figure 9a, b separately shows data for regions of anomalously low and high \(b\)-values, respectively (hereafter referred to as the low and high \(b\)-value regions).

In Fig. 9a, although the change in \(b\)-value varies from event to event, collectively, the median and 10th and 90th percentile values (in black), which should reflect the change in PDD of \(b\), suggest that many low \(b\)-values are estimated for active aftershocks immediately after the mainshock. After that, there is a weak tendency for \(b\)-values to gradually increase, and at \({10}^{7}\)–\({10}^{8}\) s (months to years) after the mainshock, the distribution becomes equivalent to that of typical \(b\)-values. The decrease in \(b\) after \({10}^{8}\) s corresponds to the occurrence of other large earthquakes and their aftershocks, again consistent with the above result that low \(b\)-values correspond to active aftershocks. Note that the very low \(b\)-values obtained when the analysis period (shown by the horizontal bar) includes an elapsed time of 0 or slightly precedes an event (spanning the left and middle boxes in Fig. 9a, b), especially when the right end of the horizontal bar is less than ~\({10}^{3}\) s, are cases where the estimation of \({M}_{\mathrm{c}}\) is difficult, as shown in Additional file 1: Fig. S5. In the present analysis, such cases cannot be completely excluded, and there are cases where \({M}_{\mathrm{c}}\) and thus the \(b\)-value are underestimated. However, the number of such cases is small and has little effect on the overall statistical analysis.

The activity in the high \(b\)-value regions associated with large earthquakes has a slightly higher overall distribution (Fig. 9b). Although it is not clear because of the small amount of data, the slightly higher average \(b\)-value may correspond to the longer interval between earthquakes in general, as seen from the fact that there are far fewer estimated b-values in Fig. 9b (476) than in Fig. 9a (3243). In this case, too, the \(b\)-value tends to increase with elapsed time. In “Common temporal variation of typical b-values” section, we show that this trend does not change even after minimizing the effect of detectability reduction resulting from the short time intervals of active aftershocks.

Some of the anomalous \(b\)-value regions in area A do not seem to be the result of large earthquakes in their vicinity. In these regions, the anomalous \(b\)-values have been observed stably for a long time. Figure 9c, d shows the temporal variation of \(b\)-values and PDDs for these stably low and high \(b\)-value regions, respectively. Among the examples of low \(b\)-values, seismicity in regions I, J, and K increased after the \(M\) 9.0 Tohoku-oki earthquake in March 2011, as can be seen from the many points after the earthquake, and the \(b\)-values remain low in all analysis periods. The stably low \(b\)-values in region L suggest the influence of slow slip because much of the analyzed data in region L are the hypocenters of the seismic activities that became active during the Boso slow slip events (e.g., Ozawa et al. 2019) in this region (the blue vertical arrows in Fig. 9c). The large proportion of such activated seismicity is seen from the fact that much of the analysis period in region L includes these slow slip events. The stably high \(b\)-value regions (Fig. 9d) include the Izu Islands (region g), where many earthquakes associated with volcanic activity, such as the eruption of Miyakejima in 2000, were observed during the analysis period (e.g., Toda et al. 2002); the Wakayama swarm area (h), where high \(b\)-values associated with high-temperature fluids have been reported (Yoshida et al. 2011); and the vicinity of Sakurajima (j), where volcanic activity is intense. In addition, the Yamagata-Fukushima border swarm (e), which became active after the Tohoku-oki earthquake and has been linked to hydrothermal fluids (e.g., Yoshida et al. 2019), was also extracted as a high \(b\)-value area, suggesting that seismicity in many of the high \(b\)-value regions with different FMD characteristics is related to the influence of fluids. In Fig. 9d, the PDD excludes the activity in region g because it is significantly more active than the other regions, but the distribution still has a peak at a much higher value than the PDD of the typical \(b\)-values.

Anomalous \({\varvec{b}}\)-values of all earthquakes in and around Japan (area B)

The temporal variation of the anomalous \(b\)-values in area B shows that seismicity in many of the low \(b\)-value regions increased after the Tohoku-oki earthquake (Fig. 9e). In regions where \(b\)-values were estimated before the Tohoku-oki earthquake, we can see that the \(b\)-values remained low after the Tohoku-oki earthquake. In these cases, region R in area B overlaps with region g in area A. However, region g shows a high \(b\)-value (with \({M}_{\mathrm{th}}=1.95)\), whereas region R shows a low \(b\)-value (with \({M}_{\mathrm{th}}=3.45)\). These results are consistent with the high \(\eta\)-values in region g (Fig. 7c), suggesting that there is some characteristic scale corresponding to a magnitude range larger than 1.95 that deviates from the GR law. In Fig. 9e, the PDDs also exclude seismic activity in region R. In contrast, the high \(b\)-value regions mostly include the area where activity increased after the Tohoku-oki earthquake (Fig. 9f). Again, the very low \(b\)-values obtained when the analysis period extends backward from March 2011 are probably due to underestimated \(b\)-values in cases where \({M}_{\mathrm{c}}\) is difficult to estimate, as shown in Additional file 1: Fig. S5. In the analysis period before the Tohoku-oki earthquake, the b-values in region m (near land) were similar to or slightly lower than the typical distribution, whereas those in region p (far offshore) were higher.

Compared with the area of large coseismic slip during the Tohoku-oki earthquake (Suito et al. 2012) shown in Fig. 7b with a green solid line, the high \(b\)-value regions (red in Fig. 7b) are distributed along the landward and seaward extension of the large-slip area, and the low \(b\)-value regions (blue in Fig. 7b) are distributed within or along strike from the extension of the large-slip area, suggesting a correspondence. Previous studies have pointed out a relationship between \(b\)-values and differential stresses, both from laboratory experiments and wide-area observations (Scholz 1968, 2015), and this relationship has been reported to manifest in the relationship between \(b\)-values and the slip deficit rate on a plate interface (Nanjo and Yoshida 2018) or in the relationship between \(b\)-values and styles of faulting (Schorlemmer et al. 2005). It is possible that the present analysis also captures changes in the \(b\)-value that reflect these effects.

To evaluate this possibility, we chose the average slip rate on the plate interface based on a similar earthquake catalog (Igarashi 2020) and the fault rake angle, which corresponds to style of faulting, as observational data that can be directly compared with our analysis. Igarashi (2020) constructed a catalog of similar earthquakes and small repeating earthquakes from 1981 to 2019 in central Japan and from 2001 to 2019 throughout the Japanese Islands and showed that there is little difference in the spatial distribution of average slip rates estimated from these two catalogs. Here, we use a similar earthquake catalog to estimate the average slip rate in the same rectangular regions used in the seismicity analysis. The method for estimating the slip rate is similar to that used by Igarashi (2010, 2020). The average slip rate in the rectangular region is estimated by taking the mean of the average slip rate of all similar earthquake groups in the region. The period for estimating the slip rate is from the last event before 2000 (or the first after 2000 if there were none before) to the last event in the same group. In the estimation, the empirical relationship between magnitude and the amount of coseismic slip by Nadeau and Johnson (1998) is used. We note that the analysis period includes the Tohoku-oki earthquake and its afterslip and therefore includes much non-stationary slip.
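The slip-rate estimate described above can be sketched as follows. The numerical coefficients are the commonly quoted forms of the moment-magnitude relation and of the Nadeau and Johnson (1998) scaling; they are not stated in the text and are included here as assumptions. The function names, the choice to accumulate slip from the second event of each group onward, and the input layout are also assumptions for illustration.

```python
def coseismic_slip_cm(magnitude):
    """Slip of one repeating earthquake from an empirical scaling relation.

    Assumed (commonly quoted) forms, not given in the text:
      log10(M0 [dyne cm]) = 1.5 * M + 16.1          (moment-magnitude relation)
      log10(d [cm])       = -2.36 + 0.17 * log10(M0) (Nadeau and Johnson 1998)
    """
    log_m0 = 1.5 * magnitude + 16.1
    return 10.0 ** (-2.36 + 0.17 * log_m0)


def group_slip_rate(events):
    """Average slip rate (cm/yr) of one similar-earthquake group.

    `events` is a chronologically sorted list of (time_in_years, magnitude).
    The slip of every event after the first is accumulated over the time
    span from the first to the last event.
    """
    if len(events) < 2:
        raise ValueError("need at least two events in a group")
    span = events[-1][0] - events[0][0]
    slip = sum(coseismic_slip_cm(m) for _, m in events[1:])
    return slip / span


def region_slip_rate(groups):
    """Mean of the group-averaged slip rates within one rectangular region."""
    rates = [group_slip_rate(g) for g in groups]
    return sum(rates) / len(rates)
```

Under these assumed relations, an \(M\) 3.0 repeater corresponds to roughly 14 cm of slip per event, so a group recurring every 5 years would imply a slip rate of a few centimeters per year.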

Figure 10a shows the comparison of the estimated \(b\)-values to the logarithm of the estimated average slip rate. All results for each pair of \(l\) and \(N\) are shown together. Darker symbol colors correspond to results with a high ratio of the number of similar earthquakes \({n}_{\mathrm{similarEQ}}\) to \(N\), that is, results for data including many interplate events. In these plots, the \(b\)-value is positively correlated with the average slip rate. Similarly, Nanjo and Yoshida (2018) showed a negative correlation between the \(b\)-value and the slip deficit rate in the Nankai Trough. In the present analysis, the \(b\)-value appears to be linearly correlated with the logarithm of the average slip rate, unlike the linear correlation between \(b\)-values and slip deficit rates in Nanjo and Yoshida (2018), probably due to the inclusion of the afterslip of the Tohoku-oki earthquake. Despite such differences, these results are qualitatively consistent in terms of the relationship between \(b\)-values and interplate coupling. The present analysis, where most target activity is along the Japan Trench, suggests that the \(b\)-values tend to be higher or lower in regions with weak or strong interplate coupling, respectively, and regions of significantly higher and lower \(b\)-values are detected statistically as anomalous \(b\)-values. The \(b\)-values of seismicity without similar earthquakes, most of which are considered to be intraplate earthquakes, show no clear correlation with the average slip rate.

Fig. 10

Relationship between a \(b\) and b \(\eta\) and the average slip velocity estimated on the same spatial grid. All results for each \(N\) and \(l\) are shown together. Red, blue, and green symbols indicate high, low, and typical \(b\)- or \(\eta\)-values, respectively, and the ratio of similar earthquakes to the number of analyzed hypocenters (\(N\)) ≥ 0.5 is highlighted

Fault rake angle, which corresponds to the styles of faulting, is another factor expected to correlate with \(b\) (Schorlemmer et al. 2005). To examine the relationship, we extracted the rake angles between \(-90^\circ\) and \(90^\circ\) of all of the earthquake mechanisms of the F-net catalog (Kubo et al. 2002) whose epicenters are in the spatiotemporal domain used for the estimation of index values. Rake angles of about \(-90^\circ\) indicate normal faulting, \(0^\circ\) strike-slip faulting, and \(90^\circ\) reverse faulting. Figure 11a shows the PDDs of rake angles extracted in the high, low, and typical \(b\)-value regions north of \(34.5^\circ\) N. The latitudinal range was set to avoid the area around the Izu Islands (region R in Fig. 7b), where many earthquakes associated with volcanic activity occur, and to include activities with a clear regional difference between high and low \(b\)-value regions. The results for all pairs of \(l\) and \(N\) plotted in the figure show no significant difference. Figure 11a shows that, in the low \(b\)-value regions (blue lines), the proportion of normal faults is small and most earthquakes are reverse faults, whereas, in the high \(b\)-value region (red lines), the proportion of reverse faults is small and the number of normal faults is relatively large. This result is consistent with that of Schorlemmer et al. (2005), who found that reverse-fault earthquakes have low \(b\)-values and normal-fault earthquakes have high \(b\)-values. As mentioned previously, the \(b\)-values of seismicity include many interplate earthquakes, most of which are reverse-fault earthquakes and are related to not only the style of faulting but also the slip rate. The results of other intraplate earthquakes are consistent with the conventional idea that they correspond to the stress field that causes differences in the style of faulting.

Fig. 11

Probability density distribution of rake angles of earthquakes for a \(b\) and b \(\eta\). Red, blue, and green lines correspond to high, low, and typical values, respectively

Anomalous \({\varvec{\eta}}\)-values

The anomalous \(\eta\)-value regions appear to be near the anomalous \(b\)-value regions (Fig. 7a–d). Figure 12 shows the proportions of high, low, and typical \(\eta\)-value regions in the high, low, and typical \(b\)-value regions, respectively, for \(N=50\) and \(l=0.4^\circ\). No clear relationship is apparent between the characteristics of the \(b\)- and \(\eta\)-value distributions in each region. Although there are some regions where both of the \(b\)-value and \(\eta\)-value PDDs are anomalous, such as in region g (Fig. 7a), which has a high \(b\)-value and a high \(\eta\)-value, such anomalies probably reflect regional characteristics.

Fig. 12

Proportions of high, low, and typical \(\eta\)-value regions in the high, low, and typical \(b\)-value regions for \(N=50\) and \(l=0.4^\circ\). a Area A; b area B

Figure 13 shows the relationship between each \(b\)-value and \(\eta\)-value. The anomalous \(b\)- and \(\eta\)-value regions (purple dots) may show some correlation, such as with high \(b\)-values and high \(\eta\)-values (Fig. 13a), which probably reflects the regional characteristics of region g. However, if we look at the plot of typical \(b\)- and \(\eta\)-value regions (green dots), which excludes the possibility of the influence of such regional characteristics, these index values seem to be uncorrelated. Moreover, the correlation coefficient between typical \(b\)-values and typical \(\eta\)-values is 0.02 for area A and 0.11 for area B, indicating no significant correlation.

Fig. 13

\(b\) versus \(\eta\) for \(N=50\) and \(l=0.4^\circ\) in a area A and b area B. Green and purple dots indicate the cases where the PDDs of both indices are typical and where the PDDs of one of the two indices are anomalous, respectively

It is often pointed out that mixing distinct types of seismicity with different characteristic \(b\)-values may cause deviations from the GR law (e.g., Wiemer and Wyss 2000). However, it is difficult to know the true \(b\)-value of each type of seismicity in advance, so one way to monitor seismicity without making arbitrary assumptions is to observe it through multiple index values, such as the \(\eta\)- and \(b\)-values. In particular, because the \(b\)- and \(\eta\)-values are basically uncorrelated, monitoring them simultaneously should improve the accuracy of anomaly detection.

Although we found no clear cause for a change in the \(\eta\)-value, it may be meaningful to look at the relationship between the \(\eta\)-value and interplate coupling or type of faulting. Figure 10b shows the relationship between the \(\eta\)-value and the average interplate slip rate. It is difficult to find a systematic correspondence in this figure, but the \(\eta\)-values are particularly low when the average slip rate is very high (> 400 mm/yr, much higher than the plate convergence velocity), most likely because of the afterslip of the Tohoku-oki earthquake. In these locations, the b-values are large (Fig. 10a), indicating that the number of large earthquakes is very small relative to the number of small earthquakes. Similarly, Vorobieva et al. (2016) compared the shape of the FMD and the creep rate along the San Andreas fault and showed that the FMD tends to become more convex upward as the creep rate increases. One interpretation is that, in the vicinity of areas sliding at sufficiently high velocities compared to the relative motion between the plates, small earthquakes are more likely to occur because of localized stress concentration without widespread stress accumulation, whereas relatively large earthquakes are less likely to occur.

In Fig. 11b, which shows the frequency distribution of rake angles in the high, low, and typical \(\eta\)-value regions, the low \(\eta\)-value regions (blue lines) have a relatively large proportion of reverse-fault-type earthquakes. Nevertheless, there are also many reverse-fault-type earthquakes in the high \(\eta\)-value regions (red lines), and no simple relationship was found. Normal-fault earthquakes seem to be relatively common in the high \(\eta\)-value regions.

Probability density distribution of the index values of normal seismicity

Common temporal variation of typical b-values

The present analysis allowed us to extract activities in regions where there was no significant difference in the PDDs of index values compared to those of most other areas. These activities may include temporal changes in the index values that are common throughout the analysis area. Here, as the only example where a systematic dependence of the \(b\)-value was found, we highlight the relationship between \(b\) and the earthquake occurrence interval in the typical \(b\)-value regions.

Because \(b\) is an average property of the \(N\) analyzed hypocenters, we used \(\mathrm{min}{T}_{N/4}\), which was used to analyze \(D\)-values, as an index of the characteristics of the earthquake occurrence interval of those \(N\) earthquakes for comparison with \(b\). Figure 14 shows the relationship between \(b\) and \(\mathrm{min}{T}_{N/4}\). The results for area A (Fig. 14a) show that the entire \(b\)-value distribution increases with \(\mathrm{min}{T}_{N/4}\). The slope decreases as \(\mathrm{min}{T}_{N/4}\) increases, becoming almost constant above \(\sim {10}^{7}\) s and slightly decreasing near \(\sim {10}^{8}\) s. A small \(\mathrm{min}{T}_{N/4}\) reflects active aftershocks, and \(\mathrm{min}{T}_{N/4}\) increases with aftershock decay. Therefore, the relationship between \(b\) and \(\mathrm{min}{T}_{N/4}\) is consistent with the temporal variations of \(b\) in the anomalous \(b\)-value regions near the epicenters of large earthquakes in area A, where \(b\) is low for initial active aftershocks and increases as the aftershocks decay (Fig. 9a, b). Many of these anomalous \(b\)-value regions near epicenters of large earthquakes tend to be in the low \(b\)-value regions, where many aftershocks occur over short time intervals (Fig. 9a), whereas regions that have relatively fewer aftershocks with shorter time intervals are extracted as high \(b\)-value regions (Fig. 9b). The results for area B (Fig. 14b) also show a similar positive correlation between \(b\) and \(\mathrm{min}{T}_{N/4}\) in the range of \({10}^{4}\)–\({10}^{7}\) s. Relatively high median \(b\)-values for \(\mathrm{min}{T}_{N/4}\) smaller than \({10}^{4}\) s seem to be due to a lack of low \(b\)-values. The relationship between \(b\) and \(\mathrm{min}{T}_{N/4}\) becomes an inverse correlation at \(\mathrm{min}{T}_{N/4}>{10}^{7}\) s. In the subduction zone included only in area B, large earthquakes occur over shorter time intervals than those on inland active faults in area A. The \(\mathrm{min}{T}_{N/4}\) value at which the slope changes in Fig. 14a, b might be related to the period required for the transition from aftershock decay to the accumulation of stresses before the next large earthquake. The physical interpretation of the relationship between \(b\) and \(\mathrm{min}{T}_{N/4}\) obtained here and the difference in the results for areas A and B are very interesting but beyond the scope of this paper, so further detailed discussion of these issues is left for future work.

Fig. 14

Relationship between \(b\) and \(\mathrm{min}{T}_{N/4}\) in a region with a typical \(b\)-value PDD. Squares represent the median values in the range of 0.5 to 2 times the horizontal axis values, and the vertical bars represent the 90th and 10th percentiles. a Area A; b area B

Regarding the relationship between \(b\) and \(\mathrm{min}{T}_{N/4}\), there is still a concern that small earthquakes may be less likely to be detected over shorter time intervals between successive events. In fact, some of the \(b\)-values are extremely small, especially for small \(\mathrm{min}{T}_{N/4}\) in area A. Although the number is small, there is a possibility that the analysis includes data with \({M}_{\mathrm{c}}>{M}_{\mathrm{th}}\). For this reason, we also checked the relationship between \(\mathrm{min}{T}_{N/4}\) and \({b}^{+}\) (\({M}_{\mathrm{min}}={M}_{\mathrm{th}}-0.5\), \({m}_{\mathrm{th}}=0.2\)) in the same way (Additional file 1: Fig. S7). Similar to Fig. 14, Additional file 1: Fig. S7 also shows a weak positive correlation between \({b}^{+}\) and \(\mathrm{min}{T}_{N/4}\) except in the case of very small \(\mathrm{min}{T}_{N/4}\) (\(<300\) s) for area A (Additional file 1: Fig. S7a) and large \(\mathrm{min}{T}_{N/4}\) (\(>{10}^{7}\) s) for area B (Additional file 1: Fig. S7b). Note that \({b}^{+}\) would be larger for lower detectability of earthquakes when the FMD is convex upward, as is probably seen in the case of very small \(\mathrm{min}{T}_{N/4}\) for area A (Additional file 1: Fig. S7a).

Typical probability density distributions of b, η, and D

The typical PDDs of the index values obtained from seismicity during the past 20 years in and around Japan are expected to be applicable as a representation of the characteristics of typical seismicity. For example, when monitoring future seismicity and detecting anomalies, it would be straightforward to use the typical PDDs of index values as a reference to quantify the degree of anomaly.

In the present study, to perform analyses that are as comprehensive as possible with a finite amount of data, the statistical tests were performed for eight non-overlapping patterns of gridding for four different pairs of \(l\) and \(N\). As a result, eight typical PDDs, estimated from the independent data and thus available for statistical tests, were obtained for each pair of \(l\) and \(N\) (Figs. 15, 16, 17). These PDDs depend on \(N\), but not on \(l\), and have variations according to the number of index values obtained by the analyses (maximum for \(N=50, l=0.4^\circ\) in area A, minimum for \(N=100, l=0.2^\circ\) in area B). Therefore, the FMD and timing of earthquake occurrence related to the tidal response seen through these indices have certain characteristics regardless of the location or spatial scale, and these results may be explained in a unified manner as simply having index values that vary according to the number of analyses, \(N\). If such a feature can be modeled, it will be useful, especially when applied to anomaly detection.

Fig. 15

Typical probability density distributions of \(b\), showing all eight patterns of non-overlapping data acquisition for each pair of \(N\) and \(l\). Open and closed symbols with the same shape and color indicate results for the same spatial gridding but with the event-number window shifted by half. The black solid lines and gray bars show the PDD and its variability estimated by the model described in the text. The dashed black lines are the PDDs expected from the GR law for a constant \(b\)-value of 0.9. a \(N=50\) in area A; b \(N=100\) in area A; c \(N=50\) in area B; d \(N=100\) in area B

Fig. 16

Same as Fig. 15, but for \(\eta\). a \(N=50\) in area A; b \(N=100\) in area A; c \(N=50\) in area B; d \(N=100\) in area B

Fig. 17

Same as Fig. 15, but for \(D\). The dashed black lines are the PDDs expected when no tidal correlation exists. a \(N=50\) in area A; b \(N=100\) in area A; c \(N=50\) in area B; d \(N=100\) in area B

Therefore, in the following section, we present simple models for the FMD and tidal correlation of seismicity that explain the typical PDDs of \(b\), \(\eta\), and \(D\).

Simple models explaining the observed typical PDDs

Frequency-magnitude distribution

The PDD of \(b\), which is the value obtained from Eq. (2) and corresponds to the slope of the FMD around \(M={M}_{\mathrm{th}}\), is not very different from that expected from the GR law with constant \(b\)-values (black dashed lines, Fig. 15). This means that the range of variation of the true \(b\)-value is not very wide in most of the analyzed seismicity (it rarely varied by more than \(\pm 0.1\) in area A or \(\pm 0.2\) in area B). In contrast, the PDD of the typical \(\eta\)-value (Fig. 16) is lower than the distribution expected from the GR law. This PDD shift does not indicate a deviation from the GR law only for large magnitude events, as often modeled (e.g., Hirose et al. 2019b); rather, it reflects the shape of the upward convex FMD whose slope gradually changes with increasing magnitude.

We use the equation proposed by Lomnitz-Adler and Lomnitz (1979) (hereafter referred to as the L-L formula) as a model to express the convex shape of the FMD for a small number of earthquakes. In the L-L formula, the number of earthquakes above a certain magnitude \(M\) is described as

$$\begin{array}{c}\log N\left(M\right)=A-c \exp\left(HM\right) ,\end{array}$$

where \(A,c,\) and \(H\) are positive parameters. The physical background of the derivation of this equation is not considered here; the reason for its adoption is simply that it is suitable for describing the observations. Because the slope of the FMD varies according to both \(M\) and \(H\) in this equation, it is convenient to transform it using the slope \(b^{\prime }\left( {M_{{{\text{th}}}} } \right)\) at \(M={M}_{\mathrm{th}}\) for comparison with the observed \(b\)-value. That is

$$\begin{array}{c}\log N\left(M\right)=A-\frac{{b}{^{\prime}}\left({M}_{\mathrm{th}}\right)}{H} \exp\left({H}\cdot \left(M-{M}_{\mathrm{th}}\right)\right),\end{array}$$
$$\begin{array}{c}{b}{^{\prime}}\left({M}_{\mathrm{th}}\right)=-\frac{d\log N}{dM}\left({M=M}_{\mathrm{th}}\right)=cH \exp\left(H{M}_{\mathrm{th}}\right) .\end{array}$$
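
The step from the L-L formula to its reparameterized form can be written out explicitly. The following is a short derivation sketch, assuming that \(b^{\prime}\) follows the usual \(b\)-value convention of being positive for an FMD that decreases with magnitude. Differentiating the L-L formula and evaluating the magnitude of the slope at \(M={M}_{\mathrm{th}}\) gives

$$\frac{d\log N}{dM}=-cH\exp\left(HM\right)\quad\Rightarrow\quad b^{\prime}\left({M}_{\mathrm{th}}\right)=cH\exp\left(H{M}_{\mathrm{th}}\right)\quad\Rightarrow\quad c=\frac{b^{\prime}\left({M}_{\mathrm{th}}\right)}{H}\exp\left(-H{M}_{\mathrm{th}}\right),$$

so that

$$c\exp\left(HM\right)=\frac{b^{\prime}\left({M}_{\mathrm{th}}\right)}{H}\exp\left(H\left(M-{M}_{\mathrm{th}}\right)\right).$$

Substituting this expression into the L-L formula yields the form written in terms of \(b^{\prime}\left({M}_{\mathrm{th}}\right)\) and \(H\). For small \(H\left(M-{M}_{\mathrm{th}}\right)\), \(\exp\left(H\left(M-{M}_{\mathrm{th}}\right)\right)\approx 1+H\left(M-{M}_{\mathrm{th}}\right)\), so \(\log N\) becomes approximately linear in \(M\) with slope \(-b^{\prime}\left({M}_{\mathrm{th}}\right)\), which is the GR form.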

In Eqs. (8) and (9), \(b{^{\prime}}\left({M}_{\mathrm{th}}\right)\) and \(H\) correspond to the observed \(b\)- and \(\eta\)-values, respectively. From the results of the present analysis, it is considered that \(b\)-values fluctuate within a narrow range. Hence, \(b{^{\prime}}\left({M}_{\mathrm{th}}\right)\) is assumed to be normally distributed. From the present observations of the \(\eta\)-values and from the general observation that the GR law is almost valid, it seems that \(H\) fluctuates slightly and has a small positive value. Hence, a lognormal distribution is assumed here. In this case, the probability density functions of \({b}{^{\prime}}\) and \(H\) can be expressed as

$$\begin{array}{c}f\left({b}{^{\prime}}\right)=\frac{1}{\sqrt{2\pi {\sigma }_{{b}{^{\prime}}}^{2}}}\exp\left(-\frac{{\left({b}{^{\prime}}-{\mu }_{{b}{^{\prime}}}\right)}^{2}}{2{\sigma }_{{b}{^{\prime}}}^{2}}\right),\end{array}$$
$$\begin{array}{c}f\left(H\right)=\frac{1}{\sqrt{2\pi }{\sigma }_{H}{H}}\exp \left(-\frac{{\left(\ln H - {\mu }_{H}\right)}^{2}}{2{\sigma }_{H}^{2}}\right) ,\end{array}$$

where \({\mu }_{{b}{^{\prime}}}\) and \({\sigma }_{{b}{^{\prime}}}\) are the mean and standard deviation of \({b}{^{\prime}}\), respectively, and \({\mu }_{H}\) and \({\sigma }_{H}\) are the mean and standard deviation of \(\ln H\), respectively.
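
As an illustration of how magnitudes could be drawn from this model, the following is a minimal Python sketch, not the authors' implementation. It assumes that \(\log\) in the L-L formula is the base-10 logarithm, uses inverse-transform sampling of the cumulative distribution above \({M}_{\mathrm{th}}\), and redraws \(b^{\prime}\) in the unlikely case of a non-positive value. The function names `sample_ll_magnitudes` and `sample_window` are hypothetical.

```python
import numpy as np


def sample_ll_magnitudes(n, b_prime, h, m_th, rng):
    """Draw n magnitudes >= m_th from the L-L FMD by inverse-transform sampling.

    Assumes log10 N(M) = A - (b_prime / h) * exp(h * (M - m_th)), so the
    fraction of events above M is 10 ** (-(b_prime / h) * (exp(h * (M - m_th)) - 1)).
    """
    u = 1.0 - rng.random(n)       # uniform on (0, 1], avoids log10(0)
    x = -np.log10(u)              # non-negative variate in log10 units
    # Solve (b_prime / h) * (exp(h * (M - m_th)) - 1) = x for M.
    return m_th + np.log1p(h * x / b_prime) / h


def sample_window(n, mu_b, sigma_b, mu_h, sigma_h, m_th, rng):
    """Draw one (b', H) pair, then n magnitudes for one analysis window."""
    b_prime = rng.normal(mu_b, sigma_b)        # b' is normally distributed
    while b_prime <= 0.0:                      # assumption: reject non-positive slopes
        b_prime = rng.normal(mu_b, sigma_b)
    h = rng.lognormal(mu_h, sigma_h)           # ln H is normally distributed
    return sample_ll_magnitudes(n, b_prime, h, m_th, rng)
```

For example, `sample_window(50, 0.875, 0.09, -2.7, 0.2, 1.95, np.random.default_rng(0))` draws one window of 50 magnitudes using the area A parameter values reported in the text. Because \(\ln(1+y)\le y\), each sampled magnitude is no larger than the one the GR law with the same \(b^{\prime}\) would give for the same random number, which is the upward convexity that the model is meant to express.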

Here, we searched for the values of \({\mu }_{{b}{^{\prime}}}\), \({\sigma }_{{b}{^{\prime}}}\), \({\mu }_{H}\), and \({\sigma }_{H}\) by generating a sequence of \(N\) magnitudes and estimating the \(b\)- and \(\eta\)-values 30,000 times for each combination of search parameters, so that the PDDs of the \(b\)- and \(\eta\)-values fit the observed ones well. The goodness of fit was determined by the weighted least squares method, which considers the variance of the observed values in each bin as the observation error and the inverse of the variance as the weight. That is, when the estimated value of the \(k\)-th bin is \({g}_{k}\) and the observed value is \({y}_{kn}\), where \(k\;(=1,2,\dots ,K)\) corresponds to the bins of PDDs shown in Figs. 15a, b and 16a, b for area A and Figs. 15c, d and 16c, d for area B, excluding bins with a probability density of 0, and \(n\;(=1,2,\dots ,8)\) indexes the observed values in each bin, the combination of parameters that minimizes the following equation is obtained:

$$\begin{array}{c}{S}_{w}=\frac{1}{K}{\sum }_{k}\frac{1}{8}{\sum }_{n}\frac{{\left({g}_{k}-{y}_{kn}\right)}^{2}}{{\sigma }_{k}} ,\end{array}$$

where \({\sigma }_{k}\) is the variance of \({y}_{kn}\) in each bin.
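
The misfit \({S}_{w}\) can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions rather than the authors' code: the per-bin variance is taken as the population variance (`ddof=0`), which the text does not specify, and bins whose observed values have zero variance (including bins where the probability density is 0 in every pattern) are dropped to avoid division by zero. The function name `weighted_misfit` is hypothetical.

```python
import numpy as np


def weighted_misfit(g, y):
    """Weighted least-squares misfit S_w.

    g : array of shape (K,), model PDD value in each bin.
    y : array of shape (K, n_obs), observed PDD values in each bin
        (n_obs = 8 gridding patterns in the text).
    """
    g = np.asarray(g, dtype=float)
    y = np.asarray(y, dtype=float)
    var = y.var(axis=1)                 # sigma_k: variance of y_kn in bin k (ddof=0, an assumption)
    keep = var > 0.0                    # assumption: skip bins that cannot be weighted
    resid2 = (g[keep, None] - y[keep]) ** 2
    # Average over the observations in each bin, weight by 1 / variance,
    # then average over the retained bins.
    return float(np.mean(resid2.mean(axis=1) / var[keep]))
```

A parameter search would evaluate this function for the simulated PDD of each candidate parameter combination and keep the combination with the smallest value.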

The results of a grid search around the obtained parameter values are shown in Fig. 18. Although the acceptable parameter values span a small range, the values can be almost uniquely obtained as \({\mu }_{{b}{^{\prime}}}=0.875\), \({\sigma }_{{b}{^{\prime}}}=0.09\), \({\mu }_{H}=-2.7\), and \({\sigma }_{H}=0.2\) for area A, where \({M}_{\mathrm{th}}=1.95\), and \({\mu }_{{b}{^{\prime}}}=0.75\), \({\sigma }_{{b}{^{\prime}}}=0.105\), \({\mu }_{H}=-1.35\), and \({\sigma }_{H}=0.75\) for area B, where \({M}_{\mathrm{th}}=3.45\). In Figs. 15 and 16, the PDDs of 30,000 \(b\)- and \(\eta\)-values estimated in the simulation using these parameters are shown as black lines. The gray bars show the range containing 90% of 10,000 PDDs, each estimated from almost the same number of \(b\)- and \(\eta\)-values as observed. These results show that using the estimated parameter values with Eqs. (9–11) reproduces all observed PDDs of \(b\) and \(\eta\) well and simultaneously.

Fig. 18

Grid search results for parameters related to the distribution of parameters in the L-L formula (Eq. 7) representing the FMD. The “colder” the color, the better the fit. a Grid search results of \({\mu }_{{b}{^{\prime}}}\), and \({\sigma }_{{b}{^{\prime}}}\) in area A; \({\mu }_{H}=-2.7\) and \({\sigma }_{H}=0.2\) are fixed. b Grid search results of \({\mu }_{ H}\) and \({\sigma }_{ H}\) in area A; \({\mu }_{{b}{^{\prime}}}=0.875\) and \({\sigma }_{{b}{^{\prime}}}=0.09\) are fixed. c Grid search results of \({\mu }_{{b}{^{\prime}}}\) and \({\sigma }_{{b}{^{\prime}}}\) in area B; \({\mu }_{ H}=-1.3\) and \({\sigma }_{H}=0.75\) are fixed. d Grid search results for \({\mu }_{H}\) and \({\sigma }_{H}\) in area B; \({\mu }_{{b}{^{\prime}}}=0.75\) and \({\sigma }_{{b}{^{\prime}}}=0.105\) are fixed

The above model reproduces the characteristics of the FMD expressed using two indices, \(b\) and \(\eta\), but does not guarantee that the FMD itself is well represented by the L-L formula (Eq. 7). However, the cumulative frequency distribution of all magnitudes generated by the proposed model is consistent with the overall observed FMD (Fig. 19), and the L-L formula seems to be a good choice for representing the entire distribution in a unified formula. It should also be emphasized that, as shown in Fig. 19, even if individual activities that consist of \(N\) events have a convex shape, the linear frequency distribution expressed by the GR law is almost reproduced when all events are grouped together.

Fig. 19

Comparison of the observed FMDs with those expected from the L-L formula (Eq. 7) using the best-fit parameters. \(M\) data corresponding to the typical \(b\)-value and typical \(\eta\)-value PDDs are used for each \(N\) and \(l\) pair. a Area A; b area B

Tidal correlation

As mentioned above, when seismic activity is uncorrelated with the tidal response, the PDD of \(D\) is expressed as in Eq. (5). The actual PDD of \(D\) that we obtained is generally slightly larger than that expected from Eq. (5), even if we exclude seismicity with anomalous PDDs (Fig. 17). Here, we show that these slightly larger \(D\)-values can be explained by considering the sequential nature of earthquakes.

In the present analysis, the condition of \(\mathrm{min}{T}_{N/4}<\mathrm{21,600}\; \mathrm{s}\) is used to exclude in advance seismic activity clustered within a short period relative to the tidal cycle (i.e., high \(D\)-value activity resulting from aftershocks). However, the above conditions do not account for the effect of several earthquakes occurring in succession. As shown by the Omori-Utsu law, the probability of aftershocks is highest immediately after a previous earthquake, and it is often the case that several earthquakes are observed in succession.

In order to account for the effect of such a succession of earthquakes, we assume that \({N}{^{\prime}}=rN\) (\(0<r\le 1\)) of the observed \(N\) earthquakes are uncorrelated with the tide and occur at intervals sufficiently longer than the tidal period, whereas the remaining \(N-{N}{^{\prime}}\) earthquakes are successive to the previous one with a sufficiently shorter interval than the tidal period. In this case, as in Eq. (4), the \(D\)-value of \({N}{^{\prime}}\) earthquakes is described as

$$\begin{array}{c}{D}{^{\prime}}=\left\{{\left({\sum }_{j=1}^{{N}{^{\prime}}}\mathrm{cos}{\theta }_{j}\right)}^{2}+{\left({\sum }_{j=1}^{{N}{^{\prime}}}\mathrm{sin}{\theta }_{j}\right)}^{2}\right\}^{1/2} ,\end{array}$$

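where \({\theta }_{j} \left(j=1, 2,\dots ,N{^{\prime}}\right)\) is the tidal phase angle of the \(j\)-th event. The PDD of \({D}{^{\prime}}\) is approximated as

$$f\left({D}{^{\prime}}\right)\approx \frac{2{D}{^{\prime}}}{{N}{^{\prime}}}\mathrm{exp}\left(-\frac{{{D}{^{\prime}}}^{2}}{{N}{^{\prime}}}\right).$$
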
Assuming that the remaining \(N-{N}{^{\prime}}\) earthquakes are successive and their phase angles are approximately equal to those of their preceding earthquakes, the relationship between \(E\left[D\right]\), the expected value of \(D\) estimated by Eq. (5), and \(E\left[D{^{\prime}}\right]\) estimated by Eq. (14) can be approximated by \(E\left[D\right]={\int }_{0}^{\infty }Df(D)dD\approx E\left[D{^{\prime}}\right]/r={\int }_{0}^{\infty }D{^{\prime}}f(D{^{\prime}})dD{^{\prime}}/r\). The following PDD of \(D\) satisfies this relationship:

$$f\left(D\right)\approx \frac{2rD}{N}\mathrm{exp}\left(-\frac{r{D}^{2}}{N}\right).$$

In other words, when considering subsequent occurrences, the PDD of \(D\) is represented by a Rayleigh distribution that becomes wider than expected from Eq. (5) as \(r\) becomes smaller.
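
The widening of the distribution can be checked numerically. The following Python sketch is an illustration under simplifying assumptions, not the procedure used in the paper: the \({N}{^{\prime}}=rN\) independent events have uniformly random tidal phases, and the remaining \(N-{N}{^{\prime}}\) events copy the phases of the independent events as evenly as possible. The function names `simulate_d` and `rayleigh_mean` are hypothetical.

```python
import numpy as np


def simulate_d(n, r, n_trials, rng):
    """Simulate D for n events, of which a fraction r have independent phases."""
    n_indep = max(1, int(round(r * n)))                  # N' = rN independent events
    theta = rng.uniform(0.0, 2.0 * np.pi, size=(n_trials, n_indep))
    # Successive events reuse the phase of an earlier independent event;
    # the copies are spread evenly over the independent events (an assumption).
    idx = np.arange(n) % n_indep
    theta_all = theta[:, idx]
    c = np.cos(theta_all).sum(axis=1)
    s = np.sin(theta_all).sum(axis=1)
    return np.hypot(c, s)


def rayleigh_mean(n, r):
    """Mean of the Rayleigh PDD f(D) = (2 r D / N) exp(-r D**2 / N)."""
    return 0.5 * np.sqrt(np.pi * n / r)
```

With \(N=100\) and \(r=0.5\), every independent phase is used exactly twice, so the simulated \(D\) equals \(2{D}{^{\prime}}\) and its mean should lie close to `rayleigh_mean(100, 0.5)`. When \(rN\) does not divide \(N\) evenly, the copies are unevenly weighted and the simulated distribution is somewhat wider than the Rayleigh approximation, so the approximation should be read as a first-order description.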

Here, we searched for the value of \(r\) by repeating the generation of \(N\) \(D\)-values expected from Eq. (15) for each \(r\) 30,000 times so that the PDD of \(D\) fits well with the observed one. That is, \(r\) is estimated by minimizing \({S}_{w}\) in Eq. (12), where \({g}_{k}\) is taken from the PDD of \(D\)-values obtained in the simulations, \({y}_{kn}\) is taken from the observed PDDs (colored symbols in Fig. 17a, b for area A and Fig. 17c, d for area B), and \(k\) is the number of bins for which the probability density of the observed \(D\)-value is non-zero.

The results of a grid search around the obtained parameter values are shown in Fig. 20; we estimate that \(r=0.67\) for area A and \(r=0.71\) for area B. Within the range 0.6–0.7 for area A and 0.65–0.8 for area B, there is no significant difference from the observations. Figure 17 shows the PDDs of 30,000 \(D\)-values estimated in the simulation using \(r=0.67\) or \(r=0.71\) as black lines. The gray bars show the range containing 90% of 10,000 PDDs, each estimated from almost the same number of \(D\)-values as observed. The results show that, by using the estimated value of \(r\) with Eq. (15), all observed PDDs of \(D\) can be explained well. The model PDD is, however, rather wide for data with \(N=50\), and strictly speaking, it may be better to change the model depending on the value of \(N\). Nevertheless, considering the error range of the observation results, the same model appears to be a sufficient approximation for \(N=50\)–\(100\).

Fig. 20

Grid search results for the parameter \(r\). Vertical axis shows \({S}_{w}\) as defined in Eq. 12

The estimated \(r\) values suggest that earthquakes that are followed by other earthquakes within an interval shorter than the tidal cycle occur about 30–40% of the time in area A and 20–30% of the time in area B for earthquakes that satisfy \(\mathrm{min}{T}_{N/4}\ge \mathrm{21,600}\; \mathrm{s}\). Figure 21 shows the average ratio of successive events that occurred within 3 h of the previous event (i.e., less than 1/4 of the main tidal cycle of about 12 h) to \(N\) in the typical, high, and low \(D\)-value regions. The values in the typical \(D\)-value regions, about 0.3 for area A and about 0.2 for area B, are consistent with the corresponding values obtained from the model. In area A, the proportion of subsequent earthquakes is lower in the case of \(N\) = 50 than in the case of \(N\) = 100, which may be why we obtained a slightly wider PDD in the case of \(N\) = 50 when the two cases were modeled together. Furthermore, the proportion of sequential earthquakes is higher in the high \(D\)-value regions and lower in the low \(D\)-value regions. Therefore, the PDDs of \(D\)-values are understood to basically depend on the proportion of sequential earthquakes, and that slight changes in the PDD resulting from the frequency of sequential earthquakes appear as anomalous \(D\)-values.
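
The ratio of successive events can be computed directly from the origin times in an analysis window. The sketch below is a minimal Python illustration; the function name `successive_ratio`, the use of seconds as the time unit, and the strict inequality for "within 3 h" are assumptions rather than details given in the text.

```python
import numpy as np


def successive_ratio(times_s, max_gap_s=3 * 3600):
    """Fraction of the events in a window that follow the previous event within max_gap_s seconds."""
    t = np.sort(np.asarray(times_s, dtype=float))
    if t.size == 0:
        return 0.0
    gaps = np.diff(t)                                   # intervals between consecutive events
    return np.count_nonzero(gaps < max_gap_s) / t.size  # normalized by N, as in the text
```

For origin times of 0, 1,000, 20,000, and 100,000 s, only the second event follows its predecessor within 3 h, so the ratio is 1/4. Under the model above, this ratio plays the role of \(1-r\).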

Fig. 21

Ratio of earthquakes that occurred within 3 h after another earthquake in the typical, high, and low \(D\)-value regions to N. a Area A; b area B

In this analysis, we could not find any tidal correlation, but if a real tidal correlation exists, it should be possible to detect it as an anomaly in the \(D\)-value by using the typical PDD of \(D\) obtained here as a reference. In this case, however, it is necessary to pay attention to whether there is an extreme increase or decrease in the number of subsequent earthquakes in an interval shorter than the tidal cycle. The \(D\)-value may also be useful as an index for understanding the increase or decrease of subsequent earthquakes during such short intervals.

Finally, we show the relationship between \(D\) and \(b\) or \(\eta\) (Fig. 22). Because the \(b\)-values also show a weak dependence on the time interval of earthquake occurrence (Fig. 14), one might expect a very weak negative correlation between \(b\) and \(D\). However, there is no clear correlation in any of the plots in Fig. 22. This is probably due to the fact that we excluded in advance seismic activities with clearly short earthquake intervals (\(\mathrm{min}{T}_{N/4}<\mathrm{21,600}\; \mathrm{s}\)) in our analysis of the \(D\)-values. Therefore, these three index values can be treated as independent, at least for typical activity as defined here.

Fig. 22

Relationships between \(D\) and a, b \(b\) and c, d \(\eta\) for \(N=50\) and \(l=0.4^\circ\). (Left) Area A; (right) area B. Black and gray dots indicate the cases where the PDDs of both indices are typical and where the PDDs of one of the two indices are anomalous, respectively


We analyzed seismicity in and around Japan during the past 20 years through index values representing the characteristics of the FMD and the tidal correlation of seismicity to extract regions of typical seismicity where the PDDs of the index values are indistinguishable from those in most of the other analysis regions, as well as regions in which the seismicity is considered to be anomalous.

The FMD showed significant regional anomalies. In addition to the aftershocks of large earthquakes, which are clearly recognized as anomalous seismicity, seismicity with anomalous \(b\)-values consistent with the \(b\)-value fluctuations corresponding to fluid involvement, interplate coupling, and faulting styles were recognized, all of which are consistent with observations of previous studies. These results suggest that anomalous index values whose PDDs differ from the typical one were successfully extracted. In addition, we found a weak correlation between \(b\)-value and temporal earthquake density over the entire analysis area. The FMD of typical seismicity has an upward convex trend in general, and this trend is more pronounced for seismicity in the wider analysis area that included offshore areas. These characteristics of the FMDs are fairly well explained by the L-L formula. It remains as a future task to investigate these features, which are found as common characteristics only when a large number of seismic activities are comprehensively analyzed, to clarify the physical background and improve the model.

There is no significant correlation between the tidal response of volumetric strain and typical seismicity, and the PDD of correlation parameters is well explained by accounting for seismicity occurring over short time intervals relative to the tidal cycle.

By using the typical PDD of index values obtained from past data and the model based on them as a reference, it is possible to quantify the degree of anomaly of newly obtained seismicity data by using the statistical method described here. Such a method is expected to be applicable for monitoring seismic activity. This statistical method is applicable not only to FMDs and tidal correlations but also to any indices that have characteristic distributions. More multifaceted analyses will help us better understand complex seismicities.

Table 1 Mathematical symbols related to magnitude and number of events

Availability of data and materials

The hypocenter catalog used in this study is available from the JMA. The F-net moment tensor catalog used in this study is available from the NIED (National Research Institute for Earth Science and Disaster Resilience).


BM test:

Brunner-Munzel test

GR law:

Gutenberg-Richter law

EMR:

Entire magnitude range

FMD:

Frequency-magnitude distribution

KS test:

Kolmogorov–Smirnov test

L-L formula:

Lomnitz-Adler and Lomnitz formula

MAXC:

Maximum curvature

\({M}_{\mathrm{c}}\):

Completeness magnitude

\({M}_{\mathrm{th}}\):

Magnitude threshold

PDD:

Probability density distribution




We would like to thank the project members of Research on Monitoring and Forecasting of Earthquakes and Tsunamis at MRI for their helpful comments. This manuscript was greatly improved by careful reviews of two anonymous reviewers. We thank JMA for providing the hypocenter catalog. We also thank NIED for providing the moment tensor catalog. This study was supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan, under The Second Earthquake and Volcano Hazards Observation and Research Program (Earthquake and Volcano Hazard Reduction Research).


This work was supported by MRI (Meteorological Research Institute, Japan Meteorological Agency).

Author information

Authors and Affiliations



KN contributed to the conception and design of the study, development of the approach, statistical analysis, modeling and simulation, and drafted the manuscript. KT contributed to the hypocenter data acquisition and analysis of the FMD. FH contributed to the design of the study and analysis of the tidal correlation of seismicity. AN contributed to the interpretation of the results. All the authors read and approved the final manuscript.

Corresponding author

Correspondence to Kohei Nagata.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests associated with this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Fig. S1

Example analysis in a single spatial region. a Location of the spatial domain shown in the example. b The rectangular target region of the analysis, showing the cases of \(l=0.2^\circ\) (cyan) and \(l=0.4^\circ\) (magenta). c Time series of \(M\) and cumulative number of earthquakes of \(M\ge 2.0\) in the target region. d FMD. Symbols and solid lines indicate the non-cumulative and cumulative (integrated over larger magnitudes) distributions, respectively. e–g Time series of estimated \(b\)-, \(\eta\)-, and \(D\)-values, respectively. In c–g, symbol colors correspond to the rectangular regions shown in b.

Fig. S2 Schematic of the analysis window. a Spatial window. When the spatial windows are shifted by half, there are four different patterns of non-overlapping windows for the same \(l\), as shown with red, blue, green, and orange solid lines for \(l=0.2^\circ\). In the case of \(l=0.4^\circ\), there are also four different patterns of non-overlapping windows, although only one pattern is plotted here (red broken line). The center coordinates in the case of \(l=0.4^\circ\) (gray circles) coincide with every other center coordinate in the case of \(l=0.2^\circ\) (black crosses). b Temporal window. The width of each window is the time it took for \(N\) earthquakes to occur (i.e., there is no analysis window in spatial windows where no more than 50 earthquakes occurred). The red and blue plots show the cases of \(N=50\) and \(N=100\), respectively. There are two different patterns of non-overlapping windows (shown with circles and triangles) for the same \(N\). In each spatiotemporal window, the index values (\(b\), \(\eta\), \(D\)), the parameters used as criteria for the use of index values (\({M}_{\mathrm{c}}\), \(\mathrm{min}{T}_{N/4}\)), and the parameter used in discussions (\({b}^{+}\)) are calculated.

Fig. S3 Probability density distributions of index values for the FMD expected from the GR law. One million \(M\)-series data were generated for each of the four different true \(b\)-values using random numbers. Each index value was calculated by extracting \(N\) values in order. a PDDs of observed \(b\)-values normalized by the true \(b\)-value. This distribution is independent of the true \(b\)-values and depends only on \(N\). b PDDs of \(\eta\)-values. This index value is independent of \(b\) and depends only on \(N\) without normalization.

Fig. S4 The expected relationship between \(D\), normalized by \(N\), and \(\mathrm{min}{T}_{N/4}\) in the case of a stationary Poisson distribution. The period of the tidal response is assumed to be constant at 12 h, and the mean occurrence interval and initial phase are randomly chosen to generate the occurrence time series following a stationary Poisson distribution. \(D\) and \(\mathrm{min}{T}_{N/4}\) were estimated 100,000 times each for \(N=50\) and \(N=100\). The vertical dashed line marks 21,600 s (6 h).

Fig. S5 An example of an FMD for which it is difficult to estimate \({M}_{\mathrm{c}}\). The inset shows the corresponding \(M-t\) plot. A large earthquake and its early aftershocks are included at the end of the analysis period, resulting in an unusually small \(b\)-value (0.34 with \({M}_{\mathrm{th}}=1.95\)) due to the effect of reduced detectability. When the analysis period is set uniformly, as in this study, it is necessary to devise a method to discriminate such a partial decrease in detectability during the analysis period.

Fig. S6 Same as Fig. 4, but using the EMR method to estimate \({M}_{\mathrm{c}}\).

Fig. S7 Same as Fig. 14, but for the relationship between \({b}^{+}\) and \(\mathrm{min}{T}_{N/4}\). The relatively high \({b}^{+}\)-value for very small values of \(\mathrm{min}{T}_{N/4}\) (\(<300\) s) in a is probably a result of the decrease in detectability (i.e., large \({M}_{\mathrm{c}}\)) with the upward convex FMD.

Additional file 2.

Table of estimated index values in area A for \(l=0.2^\circ\) and \(N=50\). The table provides the center coordinate of the spatial window (lat, lon), the period of the temporal window (ts–te), the estimated index values \(b\) (b), \(\eta\) (eta), and \(D\) (D), and other estimates: \({b}^{+}\) (bp), \(\mathrm{min}{T}_{N/4}\) in seconds (min_quarterNperiod), \({M}_{\mathrm{c}}\) estimated with the MAXC and EMR methods (Maxc, Mc_emr), the number of successive events that occurred within 3 h of the previous event (n_dt_lt_3h), the largest \(M\) in the analysis window (maxm), the mean focal depth in km (mean_dep), and the median focal depth in km (median_dep). Note that the event-number window was set from the latest data backward in time, and no analysis was carried out in windows where \(N\) earthquakes with magnitude larger than \({M}_{\mathrm{th}}\) have not occurred since 2000. A few \(D\)-values are missing because of a slight difference in the windows in which each index value was estimated; this does not affect any results or discussions in this paper.

Additional file 3.

Table of estimated index values in area A for \(l=0.2^\circ\) and \(N=100\). Same as Additional file 2, but for \(l=0.2^\circ\) and \(N=100\).

Additional file 4.

Table of estimated index values in area A for \(l=0.4^\circ\) and \(N=50\). Same as Additional file 2, but for \(l=0.4^\circ\) and \(N=50\).

Additional file 5.

Table of estimated index values in area A for \(l=0.4^\circ\) and \(N=100\). Same as Additional file 2, but for \(l=0.4^\circ\) and \(N=100\).

Additional file 6.

Table of estimated index values in area B for \(l=0.2^\circ\) and \(N=50\). Same as Additional file 2, but for area B.

Additional file 7.

Table of estimated index values in area B for \(l=0.2^\circ\) and \(N=100\). Same as Additional file 2, but for area B, \(l=0.2^\circ\) and \(N=100\).

Additional file 8.

Table of estimated index values in area B for \(l=0.4^\circ\) and \(N=50\). Same as Additional file 2, but for area B, \(l=0.4^\circ\) and \(N=50\).

Additional file 9.

Table of estimated index values in area B for \(l=0.4^\circ\) and \(N=100\). Same as Additional file 2, but for area B, \(l=0.4^\circ\) and \(N=100\).



Kolmogorov–Smirnov test

The null hypothesis in the two-sample KS test (e.g., Hodges 1958) is that the two sets of samples are drawn from populations with the same probability distribution; the test statistic is \({D}_{{n}_{1},{n}_{2}}={\mathrm{sup}}_{x}\left|{F}_{{n}_{1}}\left(x\right)-{G}_{{n}_{2}}\left(x\right)\right|\). Here, \({F}_{{n}_{1}}\left(x\right)\) and \({G}_{{n}_{2}}\left(x\right)\) are the empirical cumulative distribution functions of the two samples, i.e., the fraction of values in each sample that are less than or equal to \(x\), and \({\mathrm{sup}}_{x}\) denotes the supremum over \(x\). Under the null hypothesis, \({D}_{{n}_{1},{n}_{2}}\) has a distribution that depends on the sample sizes \({n}_{1},{n}_{2}\) but is independent of the underlying distribution function. In the test, the probability of obtaining a value of \({D}_{{n}_{1},{n}_{2}}\) at least as large as the observed one under the null hypothesis is calculated as \({p}_{\mathrm{KS}}\). We computed the observed \({D}_{{n}_{1},{n}_{2}}\) from the empirical cumulative distribution functions of the two samples and then obtained \({p}_{\mathrm{KS}}\) from that value together with \({n}_{1}\) and \({n}_{2}\) by direct computation following Hodges (1958).
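As an illustration of the quantities above, the following minimal sketch computes \({D}_{{n}_{1},{n}_{2}}\) from the two empirical cumulative distribution functions and an exact two-sided \({p}_{\mathrm{KS}}\) by counting lattice paths, which is one standard way to carry out a direct computation. The function name `ks_two_sample` is hypothetical, the sketch assumes samples without ties, and it is not claimed to reproduce the implementation used for this article.

```python
from bisect import bisect_right
from math import comb


def ks_two_sample(x, y):
    """Two-sample KS statistic D and exact two-sided p-value (no ties assumed)."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)

    # Scaled statistic h = n1 * n2 * D, kept as an integer to avoid rounding.
    h = 0
    for v in xs + ys:
        i = bisect_right(xs, v)  # number of x-values <= v, i.e. n1 * F(v)
        j = bisect_right(ys, v)  # number of y-values <= v, i.e. n2 * G(v)
        h = max(h, abs(i * n2 - j * n1))

    # Count monotone lattice paths from (0, 0) to (n1, n2) whose nodes all
    # satisfy |i*n2 - j*n1| < h. Under the null hypothesis every path (one
    # ordering of the pooled sample) is equally likely, so the excluded
    # fraction of paths is the probability of a statistic >= the observed one.
    prev = [0] * (n2 + 1)
    for i in range(n1 + 1):
        cur = [0] * (n2 + 1)
        for j in range(n2 + 1):
            if abs(i * n2 - j * n1) >= h:
                continue  # node reaches or exceeds the observed deviation
            if i == 0 and j == 0:
                cur[j] = 1
            else:
                cur[j] = (prev[j] if i > 0 else 0) + (cur[j - 1] if j > 0 else 0)
        prev = cur

    total = comb(n1 + n2, n1)
    p_ks = (total - prev[n2]) / total
    return h / (n1 * n2), p_ks
```

For example, `ks_two_sample([1, 2, 3], [4, 5, 6])` describes two completely separated samples; only 2 of the 20 possible orderings give a deviation that large. The cost of the path count grows as \({n}_{1}{n}_{2}\), so for very large samples an asymptotic formula may be preferable.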

Brunner-Munzel test

The null hypothesis in the BM test (Brunner and Munzel 2000) is that the probability that a value taken from one of the two sample sets is larger than a value taken from the other is 0.5. As the test statistic, we use \({W}_{{n}_{1}+{n}_{2}}^{\mathrm{BF}}=\frac{1}{\sqrt{{n}_{1}+{n}_{2}}}\cdot \frac{{\overline{R} }_{2}-{\overline{R} }_{1}}{{\widehat{\sigma }}_{{n}_{1}+{n}_{2}}}\). Here, \({\overline{R} }_{j}={n}_{j}^{-1}\sum_{k=1}^{{n}_{j}}{R}_{jk}\) (\(j=1,2\)) is the mean of the ranks \({R}_{jk}\) that the values of sample \(j\) take within the pooled sample. Also, \({\widehat{\sigma }}_{{n}_{1}+{n}_{2}}^{2}=N\cdot \left[{\widehat{\sigma }}_{1}^{2}/{n}_{1}+{\widehat{\sigma }}_{2}^{2}/{n}_{2}\right]\), where \(N={n}_{1}+{n}_{2}\) denotes the total sample size in this appendix and \({\widehat{\sigma }}_{j}^{2}={S}_{j}^{2}/{\left(N-{n}_{j}\right)}^{2}\) is estimated from the variance \({S}_{j}^{2}\) of the difference between \({R}_{jk}\) and the rank within each sample \({R}_{jk}^{(j)}\).

When the sample sizes \({n}_{1}\) and \({n}_{2}\) are both large (~ 50), the distribution of \({W}_{{n}_{1}+{n}_{2}}^{\mathrm{BF}}\) under the null hypothesis asymptotically approaches the standard normal distribution; however, this approximation is not accurate for small samples. Brunner and Munzel (2000) proposed that for small samples (\({n}_{1},{n}_{2}\ge 10\)), a better approximation is to test \({W}_{{n}_{1}+{n}_{2}}^{\mathrm{BF}}\) with a \(t\)-distribution, \({t}_{\widehat{f}}\), with the following degrees of freedom \(\widehat{f}\):

$$\begin{array}{c}\widehat{f}=\frac{{\left(\sum_{j=1}^{2}{\widehat{\sigma }}_{j}^{2}/{n}_{j}\right)}^{2}}{\sum_{j=1}^{2}{\left({\widehat{\sigma }}_{j}^{2}/{n}_{j}\right)}^{2}/\left({n}_{j}-1\right)} .\end{array}$$

This approximation is asymptotically correct because the \({t}_{\widehat{f}}\) distribution converges to the standard normal distribution for large \(\widehat{f}\). In the present analysis, we used a test based on the \({t}_{\widehat{f}}\) distribution because, in most cases, one of the sample sizes (the number of index values estimated in a single rectangular region) is small. However, if either sample size is very small (e.g., \({n}_{j}<10\)), simple and accurate approximations in a general nonparametric model cannot be expected (Brunner and Munzel 2000). In such cases, the application of the permutation test to \({W}_{{n}_{1}+{n}_{2}}^{\mathrm{BF}}\) has been reported to be effective (Neubert and Brunner 2007). In the present analysis, because the number of index values obtained outside of one target spatial region is large and computing all combinations would incur huge computational costs, the test was conducted using 300 combinations by the bootstrap method. The probability of obtaining the observed pair of samples under the null hypothesis is defined as \({p}_{\mathrm{BM}}\).
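The statistic, the degrees of freedom, and a resampling-based \({p}_{\mathrm{BM}}\) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used for this article: the function names (`midranks`, `brunner_munzel`, `permutation_p_value`) are hypothetical, tied values receive mid-ranks, each sample is assumed to contain at least two values, and the 300 resamples are drawn here as random permutations of the pooled sample, which may differ in detail from the bootstrap procedure described above.

```python
import random
from math import copysign, inf, sqrt


def midranks(values):
    """Ranks 1..n of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def brunner_munzel(x, y):
    """Return (W, f_hat) for two samples, each with at least two values."""
    x, y = list(x), list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2  # total sample size (N in the appendix)
    pooled = midranks(x + y)
    r1, r2 = pooled[:n1], pooled[n1:]
    m1, m2 = sum(r1) / n1, sum(r2) / n2  # mean pooled ranks R_1 and R_2
    # S_j^2: variance of (pooled rank - within-sample rank) in sample j.
    s1 = sum((r - w - m1 + (n1 + 1) / 2) ** 2 for r, w in zip(r1, midranks(x))) / (n1 - 1)
    s2 = sum((r - w - m2 + (n2 + 1) / 2) ** 2 for r, w in zip(r2, midranks(y))) / (n2 - 1)
    v1 = s1 / (n - n1) ** 2 / n1  # sigma_1^2 / n1
    v2 = s2 / (n - n2) ** 2 / n2  # sigma_2^2 / n2
    if v1 + v2 == 0.0:  # degenerate case, e.g. completely separated samples
        w = 0.0 if m1 == m2 else copysign(inf, m2 - m1)
        return w, float("nan")
    w = (m2 - m1) / (sqrt(n) * sqrt(n * (v1 + v2)))
    f_hat = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return w, f_hat


def permutation_p_value(x, y, n_resamples=300, seed=0):
    """Two-sided Monte Carlo p-value of W from random relabelings of the pooled data."""
    rng = random.Random(seed)
    x, y = list(x), list(y)
    observed = abs(brunner_munzel(x, y)[0])
    pooled = x + y
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        w, _f = brunner_munzel(pooled[:len(x)], pooled[len(x):])
        if abs(w) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)  # add-one correction avoids p = 0
```

Comparing \({W}_{{n}_{1}+{n}_{2}}^{\mathrm{BF}}\) with the \({t}_{\widehat{f}}\) distribution additionally requires the cumulative distribution function of the \(t\)-distribution, which the Python standard library does not provide; `scipy.stats.brunnermunzel` offers this variant of the test.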

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Nagata, K., Tamaribuchi, K., Hirose, F. et al. Statistical study on the regional characteristics of seismic activity in and around Japan: frequency-magnitude distribution and tidal correlation. Earth Planets Space 74, 179 (2022).
