Real-time earthquake magnitude estimation via a deep learning network based on waveform and text mixed modal
Earth, Planets and Space, volume 76, Article number: 58 (2024)
Abstract
Rapid and accurate earthquake magnitude estimates are essential for earthquake early warning (EEW) systems. The distance between the seismometers and the earthquake hypocenter can be important for magnitude estimation. We designed a deep-learning, multiple-seismometer magnitude estimation method using three heterogeneous modalities: three-component acceleration seismograms, differential P-arrivals, and differential seismometer locations, with a specific transformer architecture to introduce implicit distance information. Using a data-augmentation strategy, we trained and selected the model with 5365 and 728 earthquakes, respectively. To evaluate the magnitude estimation performance, we use the root mean square error (RMSE), mean absolute error (MAE), and standard deviation of the error (σ) between the catalog and predicted magnitudes for 2051 test earthquakes. The model achieves RMSE, MAE, and σ of less than 0.38, 0.29, and 0.38 when the elapsed time after the earliest P-arrival is 3 s, and stabilizes to final values of 0.20, 0.15, and 0.20 after 14 s. A comparison between the proposed model and model ii, which was retrained without the specific architecture, indicates that the architecture contributes to the magnitude estimation. The P-arrival picking-error test indicates the model can provide robust magnitude estimates for EEW when the absolute picking error is less than 0.2 s.
Introduction
Rapid and accurate earthquake magnitude estimation during shaking is essential for earthquake early warning (EEW) systems, especially for providing quick information about the earthquake to the public or specific users (Yamada et al. 2021). The earthquake magnitude measures the size of the earthquake empirically using the peak amplitude after careful distance correction (Funasaki & Earthquake Prediction Information Division 2004; Katsumata 2004; Moriwaki 2017; Richter 1935; Tsuboi 1954); for example, the Japan Meteorological Agency magnitude (M_{JMA}) is defined based on multiple magnitudes (the local meteorological office magnitude, displacement magnitude, and velocity magnitude). The classical methods applied to EEW systems can be roughly classified into predominant-period (Kanamori 2005; Heidari 2018), peak-amplitude (Wu & Zhao 2006; Kuyuk and Allen 2013; Colombelli et al. 2020), and energy (Festa et al. 2008) approaches, which still play a vital and foundational role in EEW. However, the accuracy of magnitude estimation is tied to the determination of the hypocenter location, and it remains challenging to pinpoint the hypocenter in real time, primarily due to the limited information available in the early stage of an earthquake (Saad et al. 2022b), not to mention that the peak integrated amplitude is influenced by low-frequency noise (Yamada and Mori 2009). There are two representative strategies to extract information from waveforms for magnitude estimation: classical methods (Colombelli et al. 2020; Kanamori 2005; Kuyuk and Allen 2013; Wu and Zhao 2006) and deep-learning methods (Kuang et al. 2021; Mousavi & Beroza 2020a; Münchmeyer et al. 2021; Saad et al. 2022a). The classical methods can be considered a physics-based feature extraction strategy grounded in seismological expertise, whereas the deep-learning methods can be considered an automatic, direct feature extraction strategy assisted by that expertise.
Seismologists have begun using deep-learning methods to achieve low magnitude estimation errors and have made progress, especially in the early stage of the earthquake (Münchmeyer et al. 2021).
In recent years, researchers have adopted deep-learning approaches to solve problems in various seismic fields from waveforms, such as earthquake detection (Meier et al. 2019; Perol et al. 2018; Reynen and Audet 2017), phase picking (Mousavi et al. 2020; Zhu and Beroza 2019), estimation of earthquake location (Lomax et al. 2019; Mousavi and Beroza 2020b; Münchmeyer et al. 2021), and magnitude estimation (Kuang et al. 2021; Lomax et al. 2019; Mousavi & Beroza 2020a; Münchmeyer et al. 2021; Saad et al. 2022a). The architectures of these magnitude estimation algorithms are mainly convolutional neural networks (CNN), long short-term memory networks (LSTM), and transformer networks. The CNN can extract significant features owing to its parameter-sharing design. However, it requires scaled/normalized input or additional neural networks to prevent features with higher amplitudes from dominating the prediction results (Saad et al. 2022a). Because the waveform's amplitude is important for magnitude, it is challenging to use only a CNN for magnitude estimation. Compared with the CNN, the LSTM is insensitive to non-normalized waveforms owing to its gate mechanism with Tanh and Sigmoid activation functions (Hochreiter and Schmidhuber 1997; Mousavi et al. 2019). It is therefore well suited to be combined with a CNN for magnitude estimation (Lomax et al. 2019; Mousavi and Beroza 2020a). Although these methods are not suitable for real-time use in their current structure, their design concepts are still worth adopting for a real-time deep-learning magnitude estimation model; that is, the CNN extracts significant features, and the LSTM prevents features with higher amplitudes from dominating the prediction results. The LSTM is specially designed for time series and can process waveforms of different lengths recorded on different seismometers, which arise from wave propagation and seismometer distribution during shaking.
The transformer networks (Vaswani et al. 2017) can weigh features according to their relationships using the attention mechanism, which is suitable for processing features from multiple seismometers with different waveform lengths in real time. A representative method (Münchmeyer et al. 2021) utilized a CNN to extract on-site features and combined them with six transformer encoders. To our knowledge, it is among the earliest uses of the transformer encoder for multiple seismometers. In addition, they proposed a set of practices for building a model for fast earthquake source characterization, which can help more seismologists establish suitable deep-learning models in their fields. Moreover, transformer networks that extract features corresponding to different times from a single waveform for magnitude estimation have made progress, and seismologists are developing them to suit real-time situations (Saad et al. 2022a). Several current real-time methods use CNNs (e.g., Van Den Ende and Ampuero 2020), which seemingly introduce large amounts of noise or invalid zeros, though this is still debated (Saad et al. 2022a).
Most magnitude estimation methods are not real-time or network-based approaches. These models provide evidence that distance information is essential for magnitude estimation, either by directly introducing distance or by automatically extracting it from the P/S phases in the waveform (Kuang et al. 2021; Mousavi and Beroza 2020a). For magnitude estimation in EEW, it would be ideal to introduce location information based on the final earthquake location. Unfortunately, the estimated earthquake location may vary as information accumulates during shaking. A recent study provides a random-forest-based approach to accurately estimate the earthquake location using differential P-wave arrivals and seismometer locations recorded on the five earliest seismometers (Saad et al. 2022b). This method inspires us to introduce location information into magnitude estimation to handle the variation of earthquake location estimates at the early stage of an earthquake. However, direct model fusion of the above random forest model and a deep-learning magnitude estimation model using waveforms is challenging. On the other hand, careful consideration is needed to utilize the two types of heterogeneous multimodal data of an earthquake for magnitude estimation with deep-learning methods: the time-series data of the waveforms and the text data of the P-wave arrivals and seismometer locations, since the two modalities may carry different physical meanings. Recently, multimodal machine learning (MMML), which builds models that can process and relate information from multiple heterogeneous modalities (Baltrušaitis et al. 2018; Lahat et al. 2015), has attracted attention in a variety of multidisciplinary fields, e.g., computer vision (Luo et al. 2022; Radford et al. 2021; Wang et al. 2021a, b) and natural language processing (Gong et al. 2021; Liu et al. 2021).
Some research indicates that transformer architectures perform well on the multimodal fusion tasks of MMML (Gong et al. 2021; Nagrani et al. 2021; Zhu et al. 2020) and provide a way to process multimodal earthquake data. Although real-time magnitude estimation methods (Münchmeyer et al. 2021) can generally extract distance information from the waveforms recorded on each seismometer, this may not always work at the onset of the earthquake, especially when the seismometers are far from the earthquake location, which leads to larger travel-time differences between the P and S phases. Thus, to avoid this situation, we introduce additional location information from the multiple triggered seismometers. Considering that it is still debated whether the scaling/normalization influence can be reduced by incorporating the maximum amplitude as an additional input (Kuang et al. 2021; Lomax et al. 2019; Münchmeyer et al. 2021; Mousavi and Beroza 2020a) when using only a CNN, we adopt a CNN–LSTM model for on-site feature extraction, as in previous models (Lomax et al. 2019; Mousavi and Beroza 2020a). In this paper, we propose a deep-learning-based method for real-time magnitude estimation using three heterogeneous modalities: multiple seismometer waveforms, differential P-wave arrivals, and differential seismometer locations, with a specific architecture that introduces distance information by automatically processing the variation of hypocenter location estimates. We evaluate the magnitude estimation performance of our model and provide evidence for the effectiveness of the specific architecture using the root mean square error (RMSE), mean absolute error (MAE), and standard deviation of the error (σ).
Methods
Inputs
The input to the model consists of three heterogeneous modalities: the three-component acceleration seismograms from multiple seismometers, the differential P-wave arrivals (T), and the differential seismometer locations (L). We construct the acceleration seismograms based on the P-wave arrivals as follows. We set \({t}_{1}\) + \({t}_{noise}\) and \({t}_{i}\) + \({t}_{noise}\) as the lengths of the acceleration waveforms recorded on the earliest and later P-wave arrival seismometers, respectively, where \({t}_{i}\) is the waveform length after the P-wave arrival recorded on the \({i}^{th}\) seismometer and \({t}_{noise}\) is the noise length. To simulate the real situation, \({t}_{i}\) should be \({t}_{1}-{\Delta t}_{i}\), where \({\Delta t}_{i}\) is the travel-time difference between the \({i}^{th}\) P-wave arrival seismometer and the earliest P-wave arrival seismometer (i = 1, 2, …). We clipped the waveform after the P-wave arrival recorded on the \({i}^{th}\) seismometer when \({t}_{i}\) is greater than or equal to 1.0 s, with \({t}_{1}\) ranging from 1 to 30 s at an interval of 1 s. To make the model learn the noise condition, we set the length of the noise (\({t}_{noise}\)) to 1 s before the P-wave arrival at each seismometer. Based on a previous study (Mousavi & Beroza 2020a) on the effect of amplitude on magnitude estimation, we do not adopt any normalization process. We assume the P-wave arrival can be identified when the length of the P-wave is greater than or equal to 1 s. The other inputs, the differential P-wave arrivals and differential seismometer locations, can be expressed as

$$T=\left[{\Delta t}_{1}, {\Delta t}_{2}, \ldots, {\Delta t}_{i}\right],\quad L=\left[\left({\Delta lat}_{1}, {\Delta lon}_{1}\right), \left({\Delta lat}_{2}, {\Delta lon}_{2}\right), \ldots, \left({\Delta lat}_{i}, {\Delta lon}_{i}\right)\right],$$
where \({\Delta lat}_{i}\) represents the numerical latitude difference between the \({i}^{th}\) and the earliest P-arrival seismometer, and \({\Delta lon}_{i}\) represents the corresponding longitude difference. Considering the computational cost, we set the maximum number of seismometers to 20. In addition, to simulate the real-time condition, only seismometers whose P-arrival has occurred by the current time are included.
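As a concrete illustration of these inputs, the following sketch builds T and L from raw picks and station coordinates. The function names, array layout, and the `usable_window` helper are ours for illustration, not the authors' code:

```python
import numpy as np

def differential_inputs(p_arrivals, lats, lons, max_stations=20):
    """Build the differential P-arrival (T) and differential station
    location (L) inputs relative to the earliest-triggered seismometer."""
    order = np.argsort(p_arrivals)[:max_stations]    # earliest first, cap at 20
    t0 = p_arrivals[order[0]]
    T = p_arrivals[order] - t0                       # differential P-arrivals (delta t_i)
    L = np.stack([lats[order] - lats[order[0]],      # delta lat_i
                  lons[order] - lons[order[0]]], 1)  # delta lon_i
    return T, L

def usable_window(t1, dT, t_noise=1.0):
    """A station contributes t_i = t_1 - delta t_i seconds of post-P waveform
    plus 1 s of pre-P noise, and only once t_i >= 1.0 s."""
    t_i = t1 - dT
    return (t_i + t_noise) if t_i >= 1.0 else None
```

For example, with picks at 5, 3, and 4 s, station 2 becomes the reference and T = [0, 1, 2].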
Model
We build the real-time earthquake magnitude estimation model based on CNN, LSTM, and transformer architectures. The proposed model consists of four parts: single-waveform feature extraction, implicit distance information extraction, feature fusion, and magnitude output. We adopt CNN and LSTM layers to extract features from each three-component acceleration waveform. We mainly utilize the CNN layer, without any activation unit, to downsample the three-component dimension of the waveform. The kernel size and step size are 3 × 1. For the \({i}^{th}\) P-arrival seismometer with waveform dimension 3 × 100 × (\({t}_{i}\)+1) (the sampling frequency is 100 Hz), the CNN layer tunes the dimension to 100 × (\({t}_{i}\)+1), which keeps the time series of the waveform. Then, we adopt an LSTM with 32 units to extract the features, in which the activation and recurrent activation are the Tanh and Sigmoid functions (Hochreiter and Schmidhuber 1997), respectively. The LSTM shares the same weights across the time series, which makes it more suitable for processing waveforms of unfixed length. We set the input time window of the LSTM to 0.5 s (50 points) and slide the window with an interval of 0.5 s. We select the output of the final time window of the LSTM units as the single-waveform features. To avoid possible information leakage, we select the output of the penultimate time window when the length of the last input window is less than 0.5 s. The advantage of this LSTM design is that it produces features of the same dimension from each three-component acceleration waveform, regardless of its length.
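The shape bookkeeping of this single-waveform branch can be sketched as follows. This is a minimal illustration: the learned 3 × 1 convolution kernel is replaced by a random placeholder, and the LSTM itself is omitted, keeping only the 0.5 s windowing that feeds it:

```python
import numpy as np

def onsite_feature_windows(waveform, fs=100, win_s=0.5):
    """Illustrative sketch of the single-waveform branch.

    waveform : (3, n_samples) three-component acceleration record.
    A 3x1 convolution with no activation collapses the 3 components into
    one channel while keeping the time axis; the result is then cut into
    non-overlapping 0.5 s (50-sample) windows for the LSTM."""
    w = np.random.default_rng(0).standard_normal(3)  # stand-in for the learned kernel
    x = np.tensordot(w, waveform, axes=(0, 0))       # (n_samples,) time axis kept
    n = int(fs * win_s)
    n_full = x.size // n
    windows = x[:n_full * n].reshape(n_full, n)
    # If the trailing partial window is shorter than 0.5 s, the model uses
    # the output of the last *complete* window (the "penultimate" rule),
    # which is row windows[-1] here.
    return windows

wave = np.zeros((3, 420))           # 4.2 s at 100 Hz
win = onsite_feature_windows(wave)  # -> 8 complete 0.5 s windows of 50 samples
```

A 4.2 s record thus yields 8 complete windows; the trailing 0.2 s is never fed to the LSTM.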
For the implicit distance information extraction architecture, we mainly utilize the transformer encoder and decoder architectures to extract and fuse features. We tune the dimension and extract features of the differential P-wave arrivals and the differential seismometer locations with fully connected (FC) layers without any activation units. The input dimension is i × 1 (or i × 2) and the output dimension is i × 32. This dimension adjustment simplifies the subsequent calculations in the transformer encoder and decoder. We first extract the time feature vectors with the transformer encoder, whose input is the differential P-arrivals after the FC layer. The time feature vectors might indicate the travel-time relationships between the seismometers and the location of an earthquake. Then, we obtain the location feature vectors using the transformer decoder by fusing the differential seismometer locations after the FC layer (as the query vectors) with the time feature vectors (as the value and key vectors). The location feature vectors might indicate the distances between the seismometers and the location of an earthquake. The transformer encoder contains a self-attention layer, a layer normalization layer, two FC layers, and one dropout layer, as shown in Fig. 1. The transformer decoder has one attention layer, three layer normalization layers, two FC layers, and one dropout layer (Fig. 1). The two FC layers contain 64 and 32 neurons, respectively, with the first FC layer followed by a ReLU (rectified linear unit) activation (Nair & Hinton 2010) and a dropout layer (Srivastava et al. 2014). This architecture differs from the traditional one (Vaswani et al. 2017); we adopt it following a previous study (Xiong et al. 2020), as it can be trained easily without a warm-up stage.
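The core operation behind both blocks is scaled dot-product attention. A minimal NumPy sketch (single head, learned projections omitted) shows how the location features act as queries against the time features in the decoder described above:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention (Vaswani et al. 2017). In the decoder,
    the differential station locations (after the FC layer) supply Q, and
    the time-feature vectors from the encoder supply K and V."""
    d = Q.shape[-1]
    w = softmax(Q @ K.T / np.sqrt(d))  # (n_q, n_k) attention weights
    return w @ V, w

rng = np.random.default_rng(0)
i, d = 5, 32                               # 5 triggered stations, 32-d features
loc_feat = rng.standard_normal((i, d))     # queries: location features
time_feat = rng.standard_normal((i, d))    # keys/values: time features
fused, w = attention(loc_feat, time_feat, time_feat)
```

Each station's fused vector is a weighted mix of all stations' time features, with weights that sum to 1 per query.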
We fuse the features from multiple seismometers through a transformer encoder and a global average pooling (GAP) layer. We first add the location feature vectors to the waveform features from the multiple seismometers. Then, we apply the transformer encoder to the summed features to obtain the weighted features. We use a GAP layer to downsample the weighted features; the GAP layer makes the model flexible to a varying number of seismometers. Finally, we use an FC layer with one neuron to estimate the magnitude.
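A stripped-down sketch of this fusion head (the transformer-encoder weighting step is omitted, and the output weights are placeholders rather than trained values):

```python
import numpy as np

def fuse_and_estimate(wave_feat, loc_feat, w_out, b_out):
    """Fusion-head sketch: add the location feature vectors to the
    per-station waveform features, average over the station axis (GAP),
    then a single-neuron FC layer outputs the magnitude.
    (The transformer-encoder weighting between these steps is omitted.)"""
    x = wave_feat + loc_feat     # (n_stations, 32) element-wise sum
    pooled = x.mean(axis=0)      # GAP over stations -> (32,)
    return float(pooled @ w_out + b_out)
```

Because the GAP averages over the station axis, the same head works for any number of triggered seismometers.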
Results
To ensure data quality, we selected the earthquakes recorded on K-NET and KiK-net between January 2008 and June 2020 according to the following criteria (National Research Institute for Earth Science and Disaster Resilience, 2019): (1) the P-wave arrivals can be identified; (2) epicentral distances are less than 2 degrees; (3) magnitudes of the earthquakes are greater than or equal to M_{JMA} 3.0; (4) depths of the earthquakes are less than 300 km; (5) acceleration waveforms in three components are recorded on seismometers installed at the ground surface or uphole (Okada et al. 2004); (6) signal-to-noise ratios (SNR) are greater than 10, where the SNR is defined based on a previous study (Wang et al. 2021a, b); and (7) the number of seismometers is greater than 5. We selected 297,099 three-component waveforms from 8144 earthquakes in Japan's inland and offshore areas. We obtained these acceleration waveforms from the online data sets. The information concerning event location, seismometer location, and magnitude is provided by the JMA (Japan Meteorological Agency). After picking the P-arrivals manually and removing the baseline offset by subtracting the mean before the P-arrival, we filtered these waveforms with a corner frequency of 0.075 Hz using a high-pass digital infinite impulse response (IIR) filter. Then, we extracted the earthquakes after the origin time of 11 November 2016 as the testing data set, which is used to evaluate the model. We randomly split the earlier earthquakes into the training and validation data sets, which are used to train and select the model. The training, validation, and testing data sets consist of 5365, 728, and 2051 earthquakes, with magnitudes ranging from 3.0 to 9.0, 3.0 to 7.3, and 3.0 to 7.4, as shown in Fig. 2b. As there are only two earthquakes whose magnitudes exceed 7.0 in the testing data set, we added an earthquake with a magnitude of 7.4 and an origin time of 2022/03/16 23:36.
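The waveform pre-processing described here can be sketched as below. Note that the text specifies only a high-pass IIR filter with a 0.075 Hz corner, so the Butterworth family and 4th order used here are our assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def preprocess(trace, p_idx, fs=100.0, fc=0.075, order=4):
    """Pre-processing sketch: remove the baseline offset by subtracting
    the mean of the pre-P samples, then apply a high-pass IIR filter with
    a 0.075 Hz corner frequency. The Butterworth family and 4th order are
    assumptions; the text states only 'high-pass IIR, 0.075 Hz'."""
    trace = np.asarray(trace, dtype=float)
    trace = trace - trace[:p_idx].mean()  # baseline from pre-P noise window
    sos = butter(order, fc, btype="highpass", fs=fs, output="sos")
    return sosfilt(sos, trace)
```

On a record with a constant offset, the baseline subtraction alone already zeroes the trace, and the high-pass leaves it at zero.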
For each data set, most earthquakes' magnitudes and depths range from 3.0 to 5.0 (Fig. 2b) and 0 to 100 km (Fig. 2c). Figure 2d shows the distribution of magnitudes versus epicentral distances (∆) of these earthquakes' waveforms in the three data sets; the testing data set has few points with ∆ ≤ 50 km when the magnitude is ≥ 5.0. Most epicentral distances are distributed from 20 to 100 km in each data set (Fig. 2e). We set the catalog magnitude provided by the Japan Meteorological Agency as the label, and we expect the model to estimate the catalog magnitude, which can be considered the JMA magnitude. We trained the model for 150 epochs using the mean square error (MSE) loss function with an Adam optimizer (initial and final learning rates of 0.001 and 0.0001) and selected the 129th model, at which point the validation loss was stable. To avoid overfitting, we adopted a data-augmentation strategy and randomly selected up to 20 seismometers per earthquake during training. During model evaluation, we evaluated the model using the 20 earliest P-arrival seismometers per earthquake. We implemented the data-augmentation strategy by resampling the earthquakes with unfixed ratios to make the sample numbers of the different magnitude bins approximately the same. The magnitude bins are 3.0–3.5, 3.5–4.0, 4.0–4.5, 4.5–5.0, 5.0–5.5, 5.5–6.0, 6.0–6.5, and ≥ 6.5.
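The bin-balancing resampling can be sketched as follows. The text specifies only "unfixed ratios" to equalize the bins, so this per-bin repetition factor relative to the largest bin is an illustrative reading, not the authors' exact procedure:

```python
import numpy as np

def resampling_ratios(magnitudes,
                      edges=(3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5)):
    """Data-augmentation sketch: compute a repetition factor per magnitude
    bin (3.0-3.5, ..., >= 6.5) so that, after resampling, each bin
    contributes roughly the same number of earthquakes."""
    mags = np.asarray(magnitudes)
    bins = np.digitize(mags, edges) - 1           # bin index 0..7 per event
    counts = np.bincount(bins, minlength=len(edges))
    counts = np.where(counts == 0, 1, counts)     # guard empty bins
    return counts.max() / counts                  # per-bin "unfixed ratios"
```

For example, a set with ten M3.2 events and two M6.8 events would repeat the M6.8 bin five times to match the dominant bin.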
We evaluated the proposed model by the root mean square error (RMSE), mean absolute error (MAE), and standard deviation of the error (σ) between the catalog and predicted magnitudes as the time (\({t}_{1}\)) increases from 1 to 30 s after the first P-wave arrival per earthquake (Fig. 3). The MAE measures the average absolute difference between the model's predicted and catalog magnitudes. Compared with the MAE, the RMSE is more sensitive to predicted magnitudes with an absolute error > 1.0 (Chai and Draxler 2014). For the EEW application, prediction errors with absolute values exceeding 1.0 are particularly undesirable. We consider σ a supplementary metric that measures the stability of the model's prediction errors. For all three evaluation metrics, smaller values mean the model has lower error levels and is more stable in magnitude estimation. The error curves of the three evaluation metrics can serve as prior knowledge to guide EEW systems in alerting the public or specific users with different tolerances. To show the magnitude performance of the proposed model, we chose the classical model (Kuyuk and Allen 2013) as a baseline comparison and the real-time deep-learning method of Münchmeyer et al. (2021) as a CNN-based comparison. The classical model utilizes the linear relationship between the magnitude, epicentral distance, and vertical peak displacement of the P-wave (\({P}_{d}\)) for magnitude estimation, the basis of which is that \({P}_{d}\) should be proportional to the rate of moment release (Aki and Richards 2002) in the far field (Trugman et al. 2019). To avoid inclusion of the S-wave, we adopted theoretical S-arrivals based on a previous study (Colombelli et al. 2014).
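The three evaluation metrics are straightforward to compute from the per-earthquake prediction errors:

```python
import numpy as np

def eew_metrics(m_catalog, m_pred):
    """RMSE, MAE, and standard deviation (sigma) of the prediction errors,
    as used to evaluate the model at each second after the first P-arrival."""
    e = np.asarray(m_pred) - np.asarray(m_catalog)
    rmse = np.sqrt(np.mean(e ** 2))
    mae = np.mean(np.abs(e))
    sigma = np.std(e)   # spread of errors around their mean
    return rmse, mae, sigma
```

Note that σ measures spread around the mean error, so a model with a constant bias can have small σ but larger RMSE and MAE.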
Assuming the epicenter location is a known parameter for the classical model, we directly averaged the magnitudes calculated from the triggered seismometers, using the coefficients from the previous study (Kuyuk and Allen 2013). The real-time deep-learning model (Münchmeyer et al. 2021) utilizes the normalized time series and the logarithmic peak absolute value of the acceleration waveforms, together with the geographical location (latitude and longitude) of multiple seismometers, to estimate the magnitude as a Gaussian mixture; it contains an on-site feature extraction part (the CNNs and a multilayer perceptron) and a multiple-seismometer feature combination part (the six transformer architectures and the trained weights). The time window of the on-site input time series is fixed to 30 s, and zeros are padded if the time series is shorter than 30 s. Because it cannot be assumed that the predicted uncertainties of the Gaussian mixture are well calibrated, as they mention (Münchmeyer et al. 2021), we made the model output the magnitude directly through a minor change, using an FC layer instead of the multilayer perceptron used to predict the Gaussian mixture. This minor change does not affect the comparison with our proposed model. Then, we retrained the deep-learning model using the same data sets and loss function as the proposed model, following the technical details mentioned by Münchmeyer et al. (2021), mainly the pre-training trick.
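A hedged sketch of the Pd-based baseline: the linear form below follows the cited scaling relation (magnitude linear in log Pd and log distance, averaged over triggered stations), but the coefficients `c1`–`c3` are placeholders, NOT the published Kuyuk and Allen (2013) values; substitute the published coefficients to reproduce the baseline:

```python
import numpy as np

def pd_magnitude(pd_cm, repi_km, c1=1.0, c2=1.0, c3=5.0):
    """Baseline sketch of the Pd scaling relation used for comparison:
        M = c1 * log10(Pd) + c2 * log10(R) + c3
    computed per triggered station and then averaged (the epicenter, and
    hence R, is assumed known). c1-c3 are illustrative placeholders, not
    the coefficients of Kuyuk and Allen (2013)."""
    pd_cm = np.asarray(pd_cm, dtype=float)
    repi_km = np.asarray(repi_km, dtype=float)
    m_sta = c1 * np.log10(pd_cm) + c2 * np.log10(repi_km) + c3
    return float(np.mean(m_sta))
```

With the placeholder coefficients, two stations at 10 km each observing Pd = 0.01 cm yield M = 4.0, purely as an arithmetic check of the form.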
For the proposed model, the RMSE, MAE, and σ are 0.38, 0.29, and 0.38 at 3 s and stabilize to final values of 0.20, 0.15, and 0.20 after 14 s, respectively. The RMSE, MAE, and σ of the classical model are 0.44, 0.34, and 0.44 at 3 s and stabilize to 0.35, 0.26, and 0.32 after 10 s, respectively. For the CNN-based model, the RMSE, MAE, and σ are 0.49, 0.39, and 0.44 at 3 s and stabilize to 0.28, 0.21, and 0.27 after 15 s. Compared with the baseline and CNN-based models, the errors of our proposed model decrease more rapidly and reach smaller final values. The proposed model reaches the final values of the baseline and CNN-based models when the time (\({t}_{1}\)) is approximately 4 and 8 s, respectively, as shown in Fig. 3a–c. We further analyzed the three earthquakes in the testing data set with catalog magnitudes exceeding 7.0 to evaluate the performance on M7 earthquakes, which are of particular concern for EEW. The M7 RMSE of the proposed model is 0.94, less than the 1.09 and 1.61 of the classical and CNN-based models when the time (\({t}_{1}\)) is 3 s. Compared with the two models, the M7 RMSEs of the proposed model decrease more rapidly, with a lower final value of 0.29, meaning the proposed model has lower error levels on the three earthquakes with magnitude ≥ 7.0. Moreover, each model's M7 MAE and σ errors at different times show the same trend as the M7 RMSEs (Fig. 3). For each model, the M7 RMSE, MAE, and σ are larger than the errors over all magnitude bins. Besides the small number of M7 earthquakes, the larger M7 error levels may also be related to the rupture process of earthquakes with magnitude ≥ 7.0, which is more complex and has a longer duration (Trugman et al. 2019).
The proposed model has lower errors in magnitude estimation on the testing data set, which may be attributed to the following reasons: (1) the proposed model extracts more information from the three-component waveforms, the differential P-arrivals, and the differential seismometer locations than methods utilizing only the peak displacement of the P-wave; (2) the proposed model is trained by simulating a complex situation through random selection of the triggered seismometers, which makes the model more suitable for the real-time situation without random seismometer selection; and (3) the difference between the data sets in this study and the previous study (Münchmeyer et al. 2021) may lead the CNN-based model to larger errors than their reported results. Besides containing different earthquakes, the data sets of this study contain waveforms recorded on seismometers installed at the ground surface from K-NET and uphole from KiK-net, without the waveforms recorded on the downhole KiK-net seismometers. Nevertheless, as the earliest deep-learning-based magnitude estimation model (Münchmeyer et al. 2021) using multiple seismometers in real time, its concepts remain of great value, such as combining multiple-seismometer features using transformer encoder architectures, positional embedding (seismometer location), the event token, and the pre-training model.
Figure 4 shows the results predicted by the proposed model when time (\({t}_{1}\)) is 1, 2, 3, 4, 5, 6, 7, and 8 s after the earliest P-wave arrival. The figure shows the magnitude estimation performance corresponding to the errors in detail. Overall, as time (\({t}_{1}\)) increases, the top subfigures (a–f) indicate that the points converge more tightly around the 1:1 line as the errors decrease. Similarly, as the bottom subfigures show (the blue bars), the improving trend is also reflected in the increasing number of earthquakes whose prediction errors are less than 0.5. When \({t}_{1}\) equals 3 s, the results of the proposed model indicate that most of the prediction errors are within the ± 0.5 range. However, for earthquakes of magnitude ≥ 6.5, the model still underestimates the magnitude with an absolute prediction error exceeding 1.0 when \({t}_{1}\) is 3 s, and the underestimation is mitigated as \({t}_{1}\) increases, as the top subfigures show. The physical model of Trugman et al. (2019) suggests weak rupture predictability based on the peak displacement after 50% of the rupture duration. Münchmeyer et al. (2021) analyzed magnitude saturation based on this physical model; that is, the saturation magnitude can be expected to be 5.7 after 1 s, 6.4 after 2 s, 7.0 after 4 s, and 7.4 after 8 s. Considering the complexity of the rupture process, the triggered seismometer distribution, and the data distribution of the training data set in real-time situations, the model can only partially achieve these thresholds. Moreover, we observed several points with error > 1 when \({t}_{1}\) is 1 or 2 s.
The more significant error points occurred inland with only 1 or 2 triggered seismometers, where the epicentral distance is less than 10 km and the earthquake depth is less than 24 km, as shown in Fig. 4a (\({t}_{1}\) = 1 s) and b (\({t}_{1}\) = 2 s). For a data-driven model, the training data set lacks seismometers with an epicentral distance of less than 10 km, as shown in Fig. 2e, which may be one reason for the overestimated magnitudes when using only 1 or 2 seismometers with ∆ < 10 km. Moreover, we speculate that data diversity may be a contributing factor. In this situation of the triggered-seismometer distribution, the model also cannot improve the magnitude estimation by introducing the implicit location information between the seismometers and the earthquake source from the P-arrivals and seismometer locations.
To demonstrate the effectiveness of the encoder–decoder architecture with the differential P-wave arrivals and seismometer locations, we removed only this architecture from the proposed model and retrained it, calling the result model ii. The RMSE, MAE, and σ of model ii reach 0.46, 0.36, and 0.45 at 3 s and stabilize to approximately 0.21, 0.16, and 0.20 after 20 s, respectively. Based on the error curves (Fig. 3a–c), the proposed model consistently has lower errors and reaches the final values of model ii in less time. For earthquakes with magnitude ≥ 7.0, the curves of the proposed model also show lower errors than those of model ii. In other words, the results indicate that the encoder–decoder architecture contributes to the magnitude estimation, especially in the initial stage of an earthquake. The error difference between the proposed model and model ii decreases as time (\({t}_{1}\)) increases, indicating that the contribution of the encoder–decoder architecture decreases with time. This decreasing trend may be caused by their shared single-waveform feature extraction process, which extracts distance-related information from longer waveforms containing more information, just as the difference between the P-arrival and S-arrival can be used to estimate distance. Based on the previous study (Saad et al. 2022b), the P-arrivals and seismometer locations can be used to determine the earthquake location, with accuracy improving as more seismometers are triggered. Thus, we expect the encoder–decoder architecture to obtain more accurate implicit earthquake location information with more triggered seismometers, which brings more accurate magnitude estimation. Figure 5 shows the error curves as time (\({t}_{1}\)) increases when the number of triggered seismometers exceeds 1, 3, 5, 10, and 15. Taking the MAEs as an example, when the time (\({t}_{1}\)) is 2 s, the MAEs are 0.36, 0.30, 0.24, 0.18, and 0.19, respectively.
Similarly, the other two errors also decrease with an increasing number of triggered seismometers. The prediction error curve for each earthquake under different triggered-seismometer conditions indicates that errors decrease as the number of triggered seismometers increases, as shown in Fig. 5d. There are almost no earthquakes with an absolute error ≥ 1.0 when the number of triggered seismometers is ≥ 3. The error analysis indicates that the proposed model obtains better magnitude estimation performance with more triggered seismometers and provides evidence that the encoder–decoder architecture obtains implicit location information.
Discussion
Hyperparameter optimization of the model
To obtain the optimal network architecture of the proposed model, we tuned the architecture and selected the final one based on the MSE on the validation data set. Based on our design criteria in the Method section, the number of LSTM units is a crucial and fundamental parameter that influences all the layers; in other words, we can tune the network architecture by changing this number. First, we tested LSTM unit counts of 2, 4, 8, 16, 32, and 64. We found that the proposed model begins to be able to estimate the magnitude when the number of units exceeds 8, and that the MSE is similar once the number of units reaches 32. Thus, we chose 32 LSTM units. Second, we added a dropout layer between the two FC layers in each transformer encoder or decoder to avoid overfitting. We tested dropout rates of 0.1, 0.2, 0.3, 0.5, and 0.7 and found that the magnitude estimation performance does not decrease significantly when the rate is less than or equal to 0.5. Considering the value recommended in a previous study (Srivastava et al. 2014) and our testing results, we finally chose a dropout rate of 0.3. We mainly utilized the attention mechanism to obtain the location information and to weight the on-site features. We did not test how the magnitude estimation performance varies as the number of transformer encoders or decoders increases; while tuning the LSTM units and dropout rate, the number of transformer encoders or decoders per input was 1. The tuning results seemingly indicate there is no need to tune this number, so we set the number of transformer encoders or decoders per input to 1.
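The dropout layer inserted between the two FC layers can be illustrated as follows; a minimal numpy sketch of standard inverted dropout (Srivastava et al. 2014), not the authors' TensorFlow implementation, with the tuned rate of 0.3 as the default:

```python
import numpy as np

def dropout(x, rate=0.3, training=True, rng=None):
    """Inverted dropout: zero a fraction `rate` of units during training and
    rescale the survivors by 1/(1-rate), so the expected activation is
    unchanged and no rescaling is needed at inference time."""
    if not training or rate == 0.0:
        return x
    rng = rng if rng is not None else np.random.default_rng()
    keep = (rng.random(x.shape) >= rate).astype(x.dtype)  # Bernoulli mask
    return x * keep / (1.0 - rate)
```

In a Keras model this corresponds to placing a `Dropout(0.3)` layer between the two dense layers of each encoder/decoder block; the layer is active only when the model is called in training mode.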
Magnitude estimation performance using only the P-wave
The proposed model estimates the magnitude using variable-length waveforms from the triggered seismometers; in real-time situations, a waveform contains either only the P-wave or both the P- and S-waves. However, for the proposed model, whether the P-wave contributes to the magnitude estimation, and the error level when using only P-wave information, might be unclear. On the other hand, it might be challenging to analyze the contribution of the P-wave directly from the proposed model because of its LSTM architecture. To assess the P-wave contribution to the magnitude estimation, we retrained the proposed model and model ii using only P-waves, truncated at the theoretical S-arrivals (Colombelli et al. 2014); we call these model-P and model ii-P. We then used the same data sets to train and select the P-wave-only models, with the same criteria as for the other models above. Considering that the P-arrival times contribute to the magnitude estimation when more than 3 seismometers are triggered (as shown in Fig. 5), we evaluated the models on the earthquakes from the testing data set with more than 3 triggered seismometers. As shown in Fig. 6a–c, the RMSE, MAE, and σ of model-P are 0.41, 0.31, and 0.40 at 3 s and stabilize to final values of 0.32, 0.24, and 0.31 after 16 s, respectively. To evaluate performance on the M5.5 earthquakes, those with catalog magnitude exceeding 5.5, we further analyzed the 76 such earthquakes from the testing data set. For the M5.5 earthquakes, the RMSE, MAE, and σ of model-P are 0.87, 0.69, and 0.71 at 3 s and stabilize to final values of 0.68, 0.51, and 0.64 after 15 s, respectively. For model ii-P, the RMSE, MAE, and σ are 0.49, 0.37, and 0.48 at 3 s and stabilize to final values of 0.32, 0.24, and 0.32 after 14 s, respectively.
For the M5.5 earthquakes, the RMSE, MAE, and σ of model ii-P are 0.94, 0.74, and 0.71 at 3 s and stabilize to final values of 0.71, 0.54, and 0.58 after 15 s, respectively. The results of model-P and model ii-P indicate that models using only P-wave information can be used for magnitude estimation, but the error level is high. In addition, the comparison between the two P-wave-only models and the proposed model suggests that introducing the S-wave or surface wave lowers the error level of the magnitude estimation, especially for the M5.5 earthquakes.
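Truncating a waveform at the theoretical S-arrival, as used to build the P-wave-only training data, can be sketched as follows; the paper follows Colombelli et al. (2014), whereas this simplified sketch assumes a constant Vp/Vs ratio (1.73, a common crustal average), and the function name and signature are our own:

```python
import numpy as np

def truncate_to_p_window(waveform, dt, p_index, p_travel_time, vp_vs=1.73):
    """Keep only the samples before the theoretical S-arrival.

    The S-P interval is estimated from the P travel time under the assumed
    constant Vp/Vs ratio: ts - tp = t_travel_P * (Vp/Vs - 1)."""
    s_p_time = p_travel_time * (vp_vs - 1.0)    # S-P interval in seconds
    s_index = p_index + int(s_p_time / dt)      # theoretical S-arrival sample
    return waveform[:min(s_index, len(waveform))]

# Hypothetical example: 100 Hz record, P pick at sample 100, 2 s P travel time
clipped = truncate_to_p_window(np.arange(1000.0), dt=0.01,
                               p_index=100, p_travel_time=2.0)
```

With a 2 s P travel time, the S-wave is expected about 1.46 s after the P pick, so everything from that sample onward is discarded before retraining.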
Effect on magnitude estimation of the uncertainty of the P-wave arrival picks
The proposed model relies on the P-arrivals, which determine the length of the acceleration waveforms used and the P-wave travel times from multiple seismometers. The accuracy of the P-arrival picks should therefore influence the proposed model's magnitude estimation performance. To quantify the effect of P-arrival pick uncertainty on the magnitude estimation, we calculated the magnitudes 50 times on the testing data set while simulating erroneous P-arrival picks. We simulated erroneous picks by adding random perturbations to the P-arrival times drawn from different absolute picking-error bins. The ten absolute picking-error bins are 0.0–0.1, 0.1–0.2, …, 0.8–0.9, and 0.9–1.0 s; for example, for the 0.2–0.3 s bin, we added random perturbations from –0.3 to –0.2 s or from 0.2 to 0.3 s. Figure 7 shows the model's simulation results for the different absolute picking-error bins on the testing data set. Overall, the RMSE, MAE, and σ curves have greater values in higher absolute picking-error bins, meaning that a higher absolute picking error brings a higher error level in the magnitude estimation of the proposed model (Fig. 7d–f). Comparing the RMSE, MAE, and σ curves across the bins shows that the 0.0–0.1 s curves coincide with the manual-pick curves, and the 0.1–0.2 s curves are slightly higher than those of the manual picks. However, the curves for the 0.2–1.0 s bins are significantly higher than those of the manual picks, indicating a significantly higher error level; the proposed model may not obtain a reliable estimation when the absolute picking error exceeds 0.2 s.
To quantify the effect of the absolute P-arrival picking-error bins, we chose the RMSE, MAE, and σ of the different bins when \({t}_{1}\) is 3 s and evaluated the effect using the increase ratio of each error (RMSE, MAE, and σ), as shown in Fig. 7a–c. A larger ratio means a greater increase in the error level of the magnitude estimation, i.e., decreased performance. We calculate the ratio with the MAE (RMSE or σ) of the manual picks as the denominator and the error difference between the picking-error bin and the manual picks as the numerator. The effects of the picking-error bins are similar for the RMSEs, MAEs, and σ, so we take the MAEs at \({t}_{1}\) = 3 s as an example (Fig. 7b). The MAE for the 0.0–0.1 s bin is 0.30, approximately the same as the MAE with manual picks. The MAE for the 0.1–0.2 s bin is 0.35 (a 20.70% increase over the MAE with manual picks), which can be considered a slight increase in the error level. For the 0.2–1.0 s bins, the MAEs are greater than 0.40, and the error level increases significantly, by at least 37.90%. Thus, combining these ratios with the error curves over \({t}_{1}\) (Fig. 7d–f), the model can obtain a robust magnitude estimation when the P-wave arrival picking errors are less than 0.2 s. Such accuracy is generally achievable with recently proposed picking methods (Mousavi et al. 2020).
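The perturbation scheme and the increase ratio described above can be sketched as follows; the uniform sampling within each bin and the function names are our assumptions (the paper specifies only the error bins and the ratio definition):

```python
import numpy as np

def perturb_p_arrivals(p_arrivals, bin_low, bin_high, rng=None):
    """Add a random picking error to each P-arrival time, with absolute
    value drawn from [bin_low, bin_high) s and a random sign."""
    rng = rng if rng is not None else np.random.default_rng()
    p = np.asarray(p_arrivals, dtype=float)
    magnitude = rng.uniform(bin_low, bin_high, size=p.shape)
    sign = rng.choice([-1.0, 1.0], size=p.shape)
    return p + sign * magnitude

def error_increase_ratio(err_perturbed, err_manual):
    """Relative increase of an error metric over the manual-pick baseline:
    (perturbed - manual) / manual."""
    return (err_perturbed - err_manual) / err_manual
```

For example, with a manual-pick MAE of 0.29 at 3 s, a perturbed MAE of 0.35 corresponds to an increase ratio of about 0.207, i.e., the roughly 20.70% quoted for the 0.1–0.2 s bin.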
Prospects
To better illustrate our proposed model's magnitude estimation over the study region and to improve its performance, we analyzed the spatial distribution of the magnitude prediction errors on the testing data set at times (\({t}_{1}\)) of 1, 3, 5, 6, 7, and 8 s when more than one seismometer is triggered (as shown in Fig. 8). For the whole study region at \({t}_{1}\) = 3 s (Fig. 8b), most prediction errors of earthquakes in the inland region are greater than zero and those of earthquakes in the offshore region are less than zero, within the ±0.5 range. As \({t}_{1}\) increases (Fig. 8c–f), the inland and offshore prediction errors tend toward zero, but slight overestimation and underestimation trends remain. This phenomenon may mainly be related to the distances (epicenter distance and depth) and azimuths of the triggered seismometers. The seismometers triggered by offshore earthquakes are generally farther away, and those earthquakes are deeper; these distances influence the amplitude of the real-time waveforms and lead to variations in the magnitude estimation (Mousavi and Beroza 2020a). The influence of the azimuths is mainly reflected in two aspects: (1) the azimuthal coverage of offshore earthquakes is relatively limited because of sparse seismometers, which may bring uncertainty to P-arrival-based earthquake location (Saad et al. 2022b) and thus to the implicit distance information provided by our proposed model; (2) the rupture directivity, which relates to the azimuths, also influences the amplitude of the real-time waveforms at varying distances.
Figure 8a shows this influence more clearly than Fig. 8b–f: most prediction errors exceeding 1.0 are close to the epicenter, with Δ ≤ 10 km, and most underestimated earthquakes occurred offshore when \({t}_{1}\) is 1 s and only one seismometer is triggered. The directivity and distance influence generally decreases with more triggered seismometers, as shown in Fig. 8b–f.
To explore the distance influence on the magnitude estimation of our proposed model in detail, we selected a subset region of the testing data set (latitude 35°N–40°N, longitude 139°E–144°E), which contains many earthquakes that occurred in the subduction zone. There are 399 inland earthquakes with \({M}_{JMA}\) ranging from 3.0 to 6.3; their depths and epicenter distances are less than 151 km and 100 km, respectively. Offshore, there are 588 earthquakes with \({M}_{JMA}\) from 3.0 to 7.4, depths from 0 to 90 km, and epicenter distances less than 200 km. Considering the potential differences among crustal, subduction, and upper-mantle earthquakes, and to highlight the epicenter-distance influence as much as possible, we roughly classify the offshore earthquakes into three depth segments based on an earthquake-location classification method (Zhao et al. 2015). The depth segments are 0 to 25 km (crustal), 25 to 70 km (mainly the subduction zone), and 70 to 160 km (upper mantle). We also divided the inland earthquakes into the same three depth segments. The detailed distribution of the earthquakes can be found in Additional file 1: Figure S1.
Considering that the epicenter distances vary as more seismometers are triggered with increasing \({t}_{1}\), it is challenging to quantitatively show the relationship between the prediction errors and the epicenter distances. Thus, we qualitatively analyze the epicenter-distance influence on the magnitude prediction errors of the proposed model. Figure 9 shows the spatial distribution of the prediction errors for the three depth segments in the subset region when \({t}_{1}\) is 1 and 3 s. Consistent with Fig. 8, Fig. 9 indicates a significant epicenter-distance influence on the prediction error, for example in the results when \({t}_{1}\) is 1 s and the depth ranges from 0 to 25 km (Fig. 9a). The prediction errors for earthquakes in the 25–70 km depth segment are generally less than 0.0 at farther epicenter distances (Fig. 9b, e). However, several points with prediction errors greater than 0.0 at farther epicenter distances exist; apart from potential magnitude saturation and the epicenter-distance influence, the influence of depth should also be considered. Meanwhile, we noticed that most prediction errors are < 0 for inland earthquakes with a depth ≥ 70 km (Fig. 9c, f).
To analyze the depth influence on the magnitude estimation of our proposed model, we examined the distribution of the prediction errors against depth for \({t}_{1}\) from 1 to 8 s. Based on the error bars of the prediction errors for the three depth segments, the prediction errors show an overall decreasing trend as depth increases, and the trend weakens as \({t}_{1}\) increases, as shown in Fig. 10. The results show that the magnitude estimation of the proposed model varies with depth; depth also influences the amplitude of the real-time waveforms. The mean prediction error for the offshore region at 25–70 km depth is slightly less than for the other two depth segments, especially when \({t}_{1}\) ≤ 3 s. Two aspects may mainly cause this phenomenon: (1) compared with the other two depth segments, the 25–70 km segment contains many earthquakes that occurred on the subduction zone, whose seismometers are farther away, as shown in Fig. 9 and Additional file 1: Figure S1; thus, these earthquakes have larger distances, and the distance influence leads to a lower mean error for the 25–70 km segment. (2) The 25–70 km depth segment contains more earthquakes with a magnitude ≥ 5.5 than the other two; in the initial stage of these earthquakes, the rupture progress and duration may influence the mean prediction errors. Moreover, we will investigate how the azimuthal distribution of the seismometers affects the magnitude estimation in future work.
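The per-segment error statistics behind the error bars in Fig. 10 can be sketched as follows; the function and the half-open handling of segment boundaries are our assumptions, with the depth segments taken from the classification above:

```python
import numpy as np

# Depth segments (km) from the classification above:
# crustal, subduction-dominated, upper mantle
SEGMENTS = [(0, 25), (25, 70), (70, 160)]

def errors_by_depth_segment(depths, errors):
    """Group prediction errors into the three depth segments and return the
    mean and standard deviation per segment (the error-bar statistics).

    Boundaries are treated as half-open intervals [low, high); this binning
    choice is an assumption."""
    depths = np.asarray(depths)
    errors = np.asarray(errors)
    stats = {}
    for low, high in SEGMENTS:
        seg = errors[(depths >= low) & (depths < high)]
        stats[(low, high)] = (seg.mean(), seg.std()) if seg.size else (np.nan, np.nan)
    return stats
```

Applying this to the testing-data-set errors at each \({t}_{1}\) would reproduce one set of error bars per depth segment and time step.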
To address the different magnitude estimation performance for earthquakes that occurred in the inland and offshore regions, there are some potential approaches, e.g., introducing additional information related to distance and azimuth (Iwata et al. 2015; Noda et al. 2012), or introducing a simple two-class structure similar to the classifier of Yang et al. (2024) so that the model distinguishes the region in which an earthquake occurred and automatically corrects for the difference. Moreover, we also notice that the inland part of the subset region (139°E–144°E, 35°N–40°N) has slightly higher error levels than the whole region in the initial stage of an earthquake, which relates to the earthquake location distribution (crustal or upper mantle). For such subset regions, we will further investigate the potential regional differences, and the model could be made more suitable for them by transfer-learning the current model with earthquakes from the specific regions. Since the model has a relatively simple structure and a small number of parameters, transferring it does not seem difficult.
Conclusions
In this study, we designed a deep-learning network to estimate earthquake magnitude automatically, using three modalities: real-time acceleration waveforms, differential P-wave arrivals, and differential locations from multiple seismometers per earthquake. We adopted a specific architecture to introduce distance information into the magnitude estimation by automatically processing the inputs relevant to hypocenter location estimation. We trained and selected the model on the training and validation data sets and evaluated its magnitude estimation performance on the testing data set. Compared with the error analyses of the classical and CNN-based methods, the proposed model achieves a lower error level, especially at high magnitudes. The comparison between model ii and the proposed model provides evidence that the specific architecture introduces distance information into the magnitude estimation. In addition, the P-arrival picking-error testing indicates that the model remains robust for EEW when the absolute picking error is less than 0.2 s.
Availability of data and materials
Seismic waveforms were provided by the Japanese National Research Institute for Earth Science and Disaster Resilience (NIED; http://www.kyoshin.bosai.go.jp/, last accessed December 2023), Tsukuba, Japan. The figures were made using Generic Mapping Tools (Wessel et al. 2013). The magnitude estimation model's codes were written with the Python packages TensorFlow (www.tensorflow.org) and scikit-learn (https://scikit-learn.org/stable/) (Pedregosa et al. 2011).
Abbreviations
 CNN:

Convolutional Neural Network
 EEW:

Earthquake Early Warning
 Δ:

Epicenter distance
 FC:

Fully Connected Layer
 GAP:

Global Average Pooling Layer
 L:

The differential seismometer locations
 LSTM:

Long Short-term Memory
 M_{JMA}:

Japan Meteorological Agency Magnitude
 MAE:

Mean absolute error
 MSE:

Mean square error
 σ:

Standard deviation error
 ReLU:

Rectified Linear Unit
 RMSE:

Root mean square error
 SNR:

Signal-to-noise ratio
 T:

The differential P-wave arrivals
 \({t}_{1}\):

The passing time after the first P-wave arrival per earthquake
 \({t}_{i}\):

The waveform length after the P-wave arrival recorded on the \({i}^{th}\) seismometer at time \({t}_{1}\)
 \({t}_{noise}\):

The noise length before the P-wave arrival recorded on the \({i}^{th}\) seismometer
 \({\Delta t}_{i}\):

The travel-time difference between the \({i}^{th}\) P-wave arrival seismometer and the earliest P-wave arrival seismometer
 \({\Delta lat}_{i}\):

The numerical latitude difference between the \({i}^{th}\) and the earliest P-arrival seismometer
 \({\Delta lon}_{i}\):

The numerical longitude difference between the \({i}^{th}\) and the earliest P-arrival seismometer
References
Aki K, Richards PG (2002) Quantitative seismology. University Science Books, Sausalito
Baltrušaitis T, Ahuja C, Morency LP (2018) Multimodal machine learning: a survey and taxonomy. IEEE Trans Pattern Anal Mach Intell 41(2):423–443. https://doi.org/10.1109/TPAMI.2018.2798607
Chai T, Draxler RR (2014) Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature. Geosci Model Dev 7(3):1247–1250. https://doi.org/10.5194/gmd-7-1247-2014
Colombelli S, Zollo A, Festa G, Picozzi M (2014) Evidence for a difference in rupture initiation between small and large earthquakes. Nat Commun. https://doi.org/10.1038/ncomms4958
Colombelli S, Festa G, Zollo A (2020) Early rupture signals predict the final earthquake size. Geophys J Int 223(1):692–706. https://doi.org/10.1093/gji/ggaa343
Festa G, Zollo A, Lancieri M (2008) Earthquake magnitude estimation from early radiated energy. Geophys Res Lett. https://doi.org/10.1029/2008GL035576
Funasaki J, Earthquake Prediction Information Division (2004) Revision of the JMA velocity magnitude (in Japanese). Quart J Seis 67:11–20
Gong Y, Chung YA, Glass J (2021) AST: audio spectrogram transformer. arXiv preprint. https://doi.org/10.48550/arXiv.2104.01778
Heidari R (2018) τps, a new magnitude scaling parameter for earthquake early warning. Bull Earthquake Eng 16:1165–1177. https://doi.org/10.1007/s10518-017-0256-x
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Iwata N, Yamamoto S, Korenaga M, Noda S (2015) Improved algorithms of seismic parameters estimation and noise discrimination in earthquake early warning. Q Rep RTRI 56(4):291–298. https://doi.org/10.2219/rtriqr.56.4_291
Kanamori H (2005) Real-time seismology and earthquake damage mitigation. Annu Rev Earth Planet Sci 33:195–214. https://doi.org/10.1146/annurev.earth.33.092203.122626
Katsumata A (2004) Revision of the JMA displacement magnitude (in Japanese). Quart J Seis 67:1–10
Kuang W, Yuan C, Zhang J (2021) Network-based earthquake magnitude determination via deep learning. Seismol Res Lett 92(4):2245–2254. https://doi.org/10.1785/0220200317
Kuyuk HS, Allen RM (2013) A global approach to provide magnitude estimates for earthquake early warning alerts. Geophys Res Lett 40(24):6329–6333. https://doi.org/10.1002/2013GL058580
Lahat D, Adali T, Jutten C (2015) Multimodal data fusion: an overview of methods, challenges, and prospects. Proc IEEE 103(9):1449–1477. https://doi.org/10.1109/JPROC.2015.2460697
Liu D, Wang Z, Wang L, Chen L (2021) Multimodal fusion emotion recognition method of speech expression based on deep learning. Front Neurorobot. https://doi.org/10.3389/fnbot.2021.697634
Lomax A, Michelini A, Jozinović D (2019) An investigation of rapid earthquake characterization using single-station waveforms and a convolutional neural network. Seismol Res Lett 90(2A):517–529. https://doi.org/10.1785/0220180311
Luo H, Ji L, Zhong M, Chen Y, Lei W, Duan N, Li T (2022) CLIP4Clip: an empirical study of CLIP for end-to-end video clip retrieval and captioning. Neurocomputing 508:293–304. https://doi.org/10.1016/j.neucom.2022.07.028
Meier MA, Ross ZE, Ramachandran A, Balakrishna A, Nair S, Kundzicz P, Li Z, Andrews J, Hauksson E, Yue Y (2019) Reliable real-time seismic signal/noise discrimination with machine learning. J Geophys Res Solid Earth 124(1):788–800. https://doi.org/10.1029/2018JB016661
Moriwaki K (2017) Automatic detection of low-frequency earthquakes in southwest Japan using matched-filter technique. Quart J Seis 81:3 (in Japanese with English abstract)
Mousavi SM, Beroza GC (2020a) A machine-learning approach for earthquake magnitude estimation. Geophys Res Lett. https://doi.org/10.1029/2019GL085976
Mousavi SM, Beroza GC (2020b) Bayesian-deep-learning estimation of earthquake location from single-station observations. IEEE Trans Geosci Remote Sens 58(11):8211–8224. https://doi.org/10.1109/TGRS.2020.2988770
Mousavi SM, Zhu W, Sheng Y, Beroza GC (2019) CRED: a deep residual network of convolutional and recurrent units for earthquake signal detection. Sci Rep. https://doi.org/10.1038/s41598-019-45748-1
Mousavi SM, Ellsworth WL, Zhu W, Chuang LY, Beroza GC (2020) Earthquake transformer—an attentive deep-learning model for simultaneous earthquake detection and phase picking. Nat Commun. https://doi.org/10.1038/s41467-020-17591-w
Münchmeyer J, Bindi D, Leser U, Tilmann F (2021) Earthquake magnitude and location estimation from real time seismic waveforms with a transformer network. Geophys J Int 226(2):1086–1104. https://doi.org/10.1093/gji/ggab139
Nagrani A, Yang S, Arnab A, Jansen A, Schmid C, Sun C (2021) Attention bottlenecks for multimodal fusion. Adv Neural Inf Process Syst 34:14200–14213
Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: Fürnkranz J, Joachims T (eds) Proceedings of the 27th International Conference on Machine Learning (ICML'10), Haifa, June 2010. Omnipress, p 807–814
National Research Institute for Earth Science and Disaster Resilience (2019) NIED K-NET, KiK-net. National Research Institute for Earth Science and Disaster Resilience. https://doi.org/10.17598/nied.0004
Noda S, Yamamoto S, Sato S (2012) New method for estimating earthquake parameters for earthquake early warning. Q Rep RTRI 53(2):102–106. https://doi.org/10.2219/rtriqr.53.102
Okada Y, Kasahara K, Hori S, Obara K, Sekiguchi S, Fujiwara H, Yamamoto A (2004) Recent progress of seismic observation networks in Japan—Hi-net, F-net, K-NET and KiK-net. Earth Planets Space 56:xv–xxviii
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Müller A, Nothman J, Louppe G, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay É (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Perol T, Gharbi M, Denolle M (2018) Convolutional neural network for earthquake detection and location. Sci Adv. https://doi.org/10.1126/sciadv.1700578
Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I (2021) Learning transferable visual models from natural language supervision. In: Meila M, Zhang T (eds) Proceedings of the 38th International Conference on Machine Learning, July 2021. PMLR 139, p 8748–8763
Reynen A, Audet P (2017) Supervised machine learning on a network scale: application to seismic event classification and detection. Geophys J Int 210(3):1394–1409. https://doi.org/10.1093/gji/ggx238
Richter CF (1935) An instrumental earthquake magnitude scale. Bull Seismol Soc Am 25(1):1–32. https://doi.org/10.1785/BSSA0250010001
Saad OM, Chen Y, Savvaidis A, Fomel S, Chen Y (2022a) Real-time earthquake detection and magnitude estimation using vision transformer. J Geophys Res Solid Earth. https://doi.org/10.1029/2021JB023657
Saad OM, Chen Y, Trugman D, Soliman MS, Samy L, Savvaidis A, Khamis MA, Hafez AG, Fomel S, Chen Y (2022b) Machine learning for fast and reliable source-location estimation in earthquake early warning. IEEE Geosci Remote Sens Lett 19:1–5. https://doi.org/10.1109/LGRS.2022.3142714
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Trugman DT, Page MT, Minson SE, Cochran ES (2019) Peak ground displacement saturates exactly when expected: implications for earthquake early warning. J Geophys Res Solid Earth 124(5):4642–4653. https://doi.org/10.1029/2018JB017093
Tsuboi C (1954) Determination of the Gutenberg–Richter's magnitude of earthquakes occurring in and near Japan. Zisin, II 7:185–193
Van Den Ende MPA, Ampuero JP (2020) Automated seismic source characterization using deep graph neural networks. Geophys Res Lett. https://doi.org/10.1029/2020GL088690
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Guyon I, von Luxburg U, Bengio S, Wallach HM, Fergus R, Vishwanathan SVN, Garnett R (eds) Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, December 2017, p 5998–6008
Wang Y, Li S, Song J (2021a) Magnitude-scaling relationships based on initial P-wave information in the Xinjiang region, China. J Seismol 25(2):697–710. https://doi.org/10.1007/s10950-020-09981-w
Wang M, Xing J, Liu Y (2021b) ActionCLIP: a new paradigm for video action recognition. arXiv preprint. https://doi.org/10.48550/arXiv.2109.08472
Wessel P, Smith WHF, Scharroo R, Luis J, Wobbe F (2013) Generic Mapping Tools: improved version released. Eos Trans Am Geophys Union 94(45):409–410. https://doi.org/10.1002/2013EO450001
Wu YM, Zhao L (2006) Magnitude estimation using the first three seconds P-wave amplitude in earthquake early warning. Geophys Res Lett. https://doi.org/10.1029/2006GL026871
Xiong R, Yang Y, He D, Zheng K, Zheng S, Xing C, Zhang H, Lan Y, Wang L, Liu T (2020) On layer normalization in the transformer architecture. In: Daumé H III, Singh A (eds) Proceedings of the 37th International Conference on Machine Learning, July 2020. PMLR 119, p 10524–10533
Yamada M, Mori J (2009) Using τc to estimate magnitude for earthquake early warning and effects of near-field terms. J Geophys Res Solid Earth. https://doi.org/10.1029/2008JB006080
Yamada M, Tamaribuchi K, Wu S (2021) The extended integrated particle filter method (IPFx) as a high-performance earthquake early warning system. Bull Seismol Soc Am 111(3):1263–1272. https://doi.org/10.1785/0120210008
Yang C, Zhang K, Chen G, Pan Y, Zhang L, Qu L (2024) Application of machine learning to determine earthquake hypocenter location in earthquake early warning. IEEE Geosci Remote Sens Lett. https://doi.org/10.1109/LGRS.2023.3348107
Zhao JX, Zhou S, Gao P, Long T, Zhang Y, Thio HK, Lu M, Rhoades DA (2015) An earthquake classification scheme adapted for Japan determined by the goodness of fit for ground-motion prediction equations. Bull Seismol Soc Am 105(5):2750–2763. https://doi.org/10.1785/0120150013
Zhu W, Beroza GC (2019) PhaseNet: a deep-neural-network-based seismic arrival-time picking method. Geophys J Int 216(1):261–273. https://doi.org/10.1093/gji/ggy423
Zhu H, Wang Z, Shi Y, Hua Y, Xu G, Deng L (2020) Multimodal fusion method based on self-attention mechanism. Wirel Commun Mob Comput. https://doi.org/10.1155/2020/8843186
Acknowledgements
Thanks to the Japanese National Research Institute for Earth Science and Disaster Resilience for providing the data. Thanks to Ye Liu and Yuan Wang for their help with the study. Thanks to Münchmeyer et al. (2021) for open-sourcing their codes.
Funding
This research was financially supported by the National Natural Science Foundation of China (U2039209, 42304074 and 51408564).
Author information
Authors and Affiliations
Contributions
Conceptualization, B.H., S.L. and J.S.; methodology, B.H.; software, B.H.; validation, B.H., Y.Z. and Y.X.; formal analysis, B.H., S.L. and J.S.; investigation, B.H.; resources, B.H., S.L. and J.S.; data curation, Y.Z., Y.X. and J.S.; writing, original draft preparation, B.H.; writing, review and editing, B.H., S.L. and J.S.; visualization, B.H. and J.S.; supervision, S.L. and J.S.; project administration, S.L. and J.S.; funding acquisition, S.L. and J.S. All authors have read and agreed to the published version of the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Figure S1.
Earthquake map and data distribution of the subset region. Subfigure a shows the distribution of the seismometers and earthquake epicenters; colors indicate the earthquake depth range and shapes indicate the geographical regions. Subfigure b shows the distribution of depths versus magnitudes. Subfigure c shows the distribution of depths versus epicenter distances. Subfigures d and e show the frequency distributions of depths and epicenter distances, with colors indicating the geographical regions.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hou, B., Zhou, Y., Li, S. et al. Realtime earthquake magnitude estimation via a deep learning network based on waveform and text mixed modal. Earth Planets Space 76, 58 (2024). https://doi.org/10.1186/s40623-024-02005-8
DOI: https://doi.org/10.1186/s40623-024-02005-8