Recent advances in earthquake seismology using machine learning

Given the recent developments in machine-learning technology, its application has rapidly progressed in various fields of earthquake seismology, achieving great success. Here, we review the recent advances, focusing on catalog development, seismicity analysis, ground-motion prediction, and crustal deformation analysis. First, we explore studies on the development of earthquake catalogs, including their elemental processes such as event detection/classification, arrival time picking, similar waveform searching, focal mechanism analysis, and paleoseismic record analysis. We then introduce studies related to earthquake risk evaluation and seismicity analysis. Additionally, we review studies on ground-motion prediction, which are categorized into four groups depending on whether the output is ground-motion intensity or ground-motion time series and the input is features (individual measurable properties) or time series. We discuss the effect of imbalanced ground-motion data on machine-learning models and the approaches taken to address the problem. Finally, we summarize the analysis of geodetic data related to crustal deformation, focusing on clustering analysis and detection of geodetic signals caused by seismic/aseismic phenomena.


Introduction
Recently, machine learning (ML) technology, including deep learning (DL), has made remarkable progress in various scientific fields, including earthquake seismology, producing vast research findings.Here, we review the applications of ML in several fields of earthquake seismology and discuss the strengths and difficulties of using ML.We focus on ML applications in earthquake catalog development, seismicity analysis, ground-motion prediction, and geodetic data related to crustal deformation.Figure 1 summarizes the topics reviewed in this paper.Notably, our focus is on exploring the applications of ML in individual fields rather than providing a comprehensive review of ML applications in the entire field of study.Excellent review articles have been provided for the reader's reference on the recent applications of ML in solid earth geosciences by Bergen et al. (2019), seismology by Kong et al. (2019) and Mousavi andBeroza (2022, 2023), microseismic monitoring with small signals by Li (2021) and Anikiev et al. (2023), and analysis of Global Navigation Satellite System (GNSS) data by Siemuri et al. (2022).

Application in earthquake catalog development
The technique of detecting seismic waveforms, determining hypocenters, and cataloging them from seismogram records is an area driving the use of ML in seismology.This section reviews the recent developments in ML applications related to earthquake cataloging, focusing on improving individual tasks in its pipeline, similar waveform searching, focal mechanism analysis, and paleoseismic record analysis.

Importance of improving earthquake cataloging methods and contribution of machine learning techniques
Considering that the earthquake size distribution follows a power law, improving event detectability drastically increases the number of analyzable events, enhancing the spatiotemporal resolution in seismicity analysis.Some studies have emphasized the role of small events in the earthquake-generation process, highlighting the importance of making complete earthquake catalogs.For example, Mignan (2014) conducted a meta-analysis of 37 foreshock studies and found that interpretation of foreshock activity depends on the completeness magnitude of the used earthquake catalogs.Trugman and Ross (2019) showed that precursory seismicity is more ubiquitous than previously understood in southern California using a highly complete catalog.Naoi et al. (2015) reported that on-fault seismicity comprises only very small (M < -2) events in a gold mine.The technology to produce high-quality seismic catalogs from seismic waveforms contributes to achieving a smaller completeness magnitude, as well as improving station density and seismometer sensitivity.
To detect small seismic events with waveforms buried in noise, the template matching technique (Gibbons and Ringdal 2006;Shelly et al. 2007;Peng and Zhao 2009), whereby additional events with waveforms similar to template waveforms are explored, has frequently been used.This technique often reduces the completeness magnitude by approximately 1, resulting in an increase in the number of detected events by an order of magnitude (Ross et al. 2019a).Although this approach works well, its application is usually limited to shortperiod data (e.g., preceding large earthquakes) owing to the high computing costs.Parallel computation using many Graphics Processing Units (GPUs) enables the application of this method to a larger dataset (Ross et al. 2019a); however, such an environment is not readily available, resulting in the rarity of such cases.
The application of ML techniques to efficiently develop high-quality earthquake catalogs has rapidly progressed since approximately 2018.The application of DL to event detection and arrival time reading problems is driving this study area, and its performance has continuously improved by adopting new architectures and modules inspired by rapid advancements in the ML field.ML provides more accurate and noise-resistant auto-processing methods than conventional automatic processing, resulting in highly complete earthquake catalogs.Similar to template matching, the recent ML-based method often increases the number of events in catalogs by tenfold compared with conventional approaches (Tan et al. 2021;Ma and Chen 2022) with a much lower computation cost (Perol et al. 2018).By benefiting from the developments in this field, many studies exploring seismicity with high resolution have been published (Liu et al. 2020;Park et al. 2020;Ross et al. 2020;Wang et al. 2020a;Baker et al. 2021;Li et al. 2021;Jiang et al. 2022;Gong et al. 2022a;Chen et al. 2022a).This improvement in catalog quality is expected to improve earthquake forecasts based on seismicity models (Mancini et al. 2022).
The models trained on large data sets in some of these studies are publicly available (Ross et al. 2018b;Zhu and Beroza 2019;Mousavi et al. 2020), making it easy to benefit from this development.These models are versatile (Cianetti et al. 2021) and can be applied to many datasets.Even when the scale of the target earthquakes or the observation frequency bands differs significantly from the training data, transfer learning and fine-tuning (Pan and Yang 2010;Bozinovski 2020), in which an ML model trained on one task is recalibrated for a related task, helps analyze data (Titos et al. 2020;Chai et al. 2020;Lara et al. 2021;Lapins et al. 2021;Kim et al. 2023), likely due to the self-similarity of earthquakes.These techniques also enable the application of ML to project data where a large amount of training data is difficult to prepare.The application of ML has also been reported in cataloging development problems related to acoustic emissions (AEs) in laboratories (Trugman et al. 2020;Li et al. 2022c).Many studies have published ML-based packages that handle the entire catalog development procedure throughout the event detection from continuous waveform records, arrival time picking, phase associations, and hypocenter locations (Walter et al. 2021;Zhu et al. 2022c, a;Retailleau et al. 2022;Shi et al. 2022;Zhang et al. 2022c;Dokht et al. 2022).An environment to use ML for seismicity analysis is being developed as a basic technology in seismology.

Improvement of individual tasks in cataloging pipelines
Developing an earthquake catalog from seismic waveform data is typically divided into the following steps: (1) event detection from continuous waveform record; (2) arrival time reading; (3) phase association; and (4) hypocenter determination (Fig. 2).This is simple modeling, and each process can be skipped or integrated.For example, event detection is often skipped, and phase picking is performed directly on continuous data.Moreover, to improve performance, a denoising phase (Tibi et al. 2022) can be added before the detection process, and a quality control phase (Tamaribuchi et al. 2021) can be added after the hypocenter determination.
Application of ML techniques has been attempted in most of these processes: their performance significantly outperforms conventional methods.Several models trained on regional or global datasets are publicly available, especially for event detection and arrival time picking problems.These performances have been systematically compared (Cianetti et al. 2021;Münchmeyer et al. 2022;García et al. 2022), and some studies opened datasets for benchmarking the model performance (Mousavi et al. 2019a;Magrini et al. 2020;Michelini et al. 2021;Woollam et al. 2022).Mai and Audet (2022) developed software to generate labeled datasets, while Woollam et al. (2022) developed a common application programming interface to access various ML models in those tasks.

Event detection/classification and arrival time picking
Certain algorithms, including the ratio between shortterm average (STA) and long-term average (LTA) (Allen 1978), have been conventionally used to extract transient signals from a continuous seismic record.The ML technique solves this problem with higher performance, and the application can be expanded to extract many types of signals.
The event detection task can be treated as a problem of classifying cutout seismic waveforms into two (typically Fig. 2 Typical pipeline for earthquake catalog development earthquakes and noises) or more categories.This issue has been addressed by various ML algorithms, including support vector machine, decision tree, logistic regression, and neural network, including DL (Reynen and Audet 2017;Meier et al. 2019;Tang et al. 2020;Albert and Linville 2020;Kim et al. 2021b;Chakraborty et al. 2022;Murti et al. 2022).The target of such classifiers can be expanded to various phenomena by preparing appropriate training data.For example, it can be used to classify/identify surface waves (Chai et al. 2022), volcanic earthquakes (Bueno Rodriguez et al. 2022;Canário et al. 2020;Lara et al. 2021), moonquakes (Civilini et al. 2021), low-frequency earthquakes (Nakano et al. 2019;Rouet-Leduc et al. 2020;Thomas et al. 2021;Takahashi et al. 2021;Chen et al. 2023), and mining-induced earthquakes/bastings (Linville et al. 2019;Tibi et al. 2019;Peng et al. 2021;Wang et al. 2023b).Applications in distributed acoustic sensing, which require efficient processing of large amounts of data, have also been reported (Hernandez et al. 2022).Researchers have used simple Convolutional Neural Network (CNN) models (Zhang et al. 2020a;Majstorović et al. 2021) and the latest architecture/modules, such as Recurrent Neural Network (RNN) (Mousavi et al. 2019c), attention (Hou et al. 2023), transformer (Mousavi et al. 2020), multi-feature fusion networks (Kim et al. 2021a), graph-partitioning based CNN (Yano et al. 2021), Capsule Neural Network (Saad and Chen 2022), Residual Neural Network (ResNet) (Li et al. 2022b) and inceptions (Jia et al. 2022), contributing to improved performance.Moreover, some studies have aimed to enhance the interpretability of DL, often treated as black boxes (Kong et al. 2022;Majstorović et al. 2022).
The form of the input/output data of such classifiers can be designed flexibly.For example, one may design a classifier to output a scalar value corresponding to the classification result for a cutout waveform (Zhang et al. 2020a).Applying this approach to moving windows (Ross et al. 2018b;Dokht et al. 2019;Zhu et al. 2019a) enables the generation of a time series for the probability of a seismic signal.An architecture that outputs vectors with the same number of samples as input waveforms (Mousavi et al. 2020) allows for determining each data point's probability.This likely offers advantages when target events have widely varying durations or when many events occur frequently in a short period.Given that DL techniques have been developed for image processing, many studies have used two-dimensional data with time-frequency representation as input data (Dokht et al. 2019;Mousavi et al. 2019c;Lara et al. 2021;Saad et al. 2021).Nevertheless, one-dimensional CNNs have also been used frequently owing to their simple pre-processing of waveform data and low computational cost (Nakano and Sugiyama 2022).In addition to the simplest case using a single-channel waveform at a single station, multi-channel or multi-station waveforms can easily be inputted into a model.When using a singlechannel waveform, a large amount of training data can be prepared more easily, and a model independent from specific observation settings can be obtained, resulting in highly versatile models.When using multi-channel or multi-station data as input, models will likely achieve high detectability and robustness for local noises (Yang et al. 2021a;Yano et al. 2021).Another way to use multistation data for event detection is to simultaneously estimate hypocenter locations with event detection by creating a classifier based on coherence images of characteristic functions estimated from waveforms (Mosher and Audet 2020).This approach is closely related to the topics discussed in the "Earthquake location" subsection.
Preparing training data is one of the most important tasks in developing a classifier using supervised learning.For example, when performing binary classification of earthquake and noise, it is necessary to prepare training data for both classes.Compared with preparing seismic event datasets, for which a vast amount of analysis results (including manual processing) has been accumulated, preparing noise waveform datasets with a wide variety of properties may be more difficult.A potential approach to solve this problem is waveform classification using unsupervised learning, which can be generalized as a clustering problem for seismometer records.Early studies in this field attempted clustering based on humanselected features with sufficiently reduced dimensions before inputting into ML models (Chamarczuk et al. 2020;Johnson et al. 2020).Recently, more effective clustering has been attempted by automatically extracting features directly from unprocessed datasets, such as seismic waveforms or their spectrograms (Seydoux et al. 2020;Yin et al. 2022b).This approach is used for various purposes, including the detection/classification of earthquakes (Seydoux et al. 2020;Steinmann et al. 2022a), quarry blasts and rockfalls (Hammer et al. 2013), volcanic earthquakes/tremors (Esposito et al. 2008;Unglert and Jellinek 2017;Soubestre et al. 2018;Ren et al. 2020;Ida et al. 2022), surface freezing and thawing (Steinmann et al. 2022b), reflection wave (Ali et al. 2022), and seismic waves radiated in glaciological processes (Jenkins et al. 2021).
Incidentally, in the initial process of earthquake catalog development, the analyst typically identifies a window involving a seismic waveform and then measures arrival time in a cutout waveform.However, this event detection phase can be skipped, resulting in the arrival time measurements being performed directly on the entire continuous waveforms (Park et al. 2020;Heck et al. 2022;Retailleau et al. 2023).The fraction of false detections of arrival time measurements and the efficiency of the subsequent phase association calculations likely determine which of the two approaches is effective.
Automatic arrival-time reading, for which the method using the autoregression model and Akaike Information Criteria (Takanami and Kitagawa 1988) is famous, could not achieve the accuracy of manual processing until the advent of methods based on supervised DL, i.e., neural phase pickers (Park et al. 2023), especially for S-wave arrival times.Although the specific problem of "prediction inconsistency" has been reported (Park et al. 2023), DL-based pickers outperform traditional algorithms and achieve picking accuracies similar to those of skilled analysts (Mousavi and Beroza 2023), and the published models have been used in many studies.Similar to event detection/classification problems, stateof-the-art modules and architectures, such as RNN (Zhou et al. 2019), attention (Liao et al. 2021(Liao et al. , 2022a;;Li et al. 2022a), transformer (Mousavi et al. 2020), and edge convolutional module (Feng et al. 2022b), were continuously incorporated into the models to improve their performance.In this approach, a model takes seismic waveforms as inputs and outputs the arrival times as scalar values (Ross et al. 2018a) or a time series of probability values with a peak at a picked arrival time (Zhu and Beroza 2019;Mousavi et al. 2020).Xiao et al. (2021) showed that the performance of such a model for low SN waveforms can be improved by learning the waveform similarities.Although most previous studies on arrival time reading problems were conducted on body waves, applications have begun to read subsequent waves, such as reflected waves, to use structure imaging (Ding et al. 2022;Kato 2023).
Similar to the event detection problem, arrival time reading models based on DL allow a flexible input/output data format.While some studies have used single-station waveforms (Ross et al. 2018a;Zhu and Beroza 2019;Woollam et al. 2019;Wang et al. 2019a;Zheng et al. 2020;Mousavi et al. 2020;Liao et al. 2021;Tokuda and Nagao 2023), others have applied multiple station records (Zhu et al. 2022c;Li et al. 2022c;Chen and Li 2022;Feng et al. 2022b;Sun et al. 2023).As both event detection and phase picking use seismic waveform as input data, they can be processed simultaneously in a single model using multitask learning, which has multiple outputs, including event and arrival time probabilities (Liao et al. 2022a;Mousavi et al. 2020).
A common challenge in event detection and arrival time reading based on supervised learning is the insufficient training data for large events, which naturally results from the power-law size distribution of earthquakes.When training data are generated from past observation data, attention should be paid to the performance for unknown events, which have locations, sizes, and focal mechanisms not occurring previously.Data augmentation can complement this problem, and effective augmentation methods were investigated by considering the characteristics of seismic data (Zhu et al. 2020).Zhang et al. (2021b) trained a phase-picking model on an augmented dataset based on only ten hydraulically induced events, demonstrating that the resultant model can create event catalogs from seismic records in different stages and projects.Additionally, Generative Adversarial Networks (GANs; Goodfellow et al. 2014) were attempted to generate training data (Li et al. 2020b;Wang et al. 2021) and perform feature extraction (Li et al. 2018).Marano et al. (2023) reviewed the use of GANs in seismology, including data augmentation.

Phase association
When all arrival time candidates in a specific time range are used to determine the hypocenters, the location accuracy often reduces significantly due to misidentification of the arrival time (false positives).This problem can be prevented in the location process by removing candidates with large residuals in iterations (Waldhauser and Ellsworth 2000;Naoi et al. 2018).For efficient analysis, some studies exclude false positives by introducing an additional process called "phase association", during which arrival time candidates are associated with a single earthquake.If the association is computationally efficient and accurate, the existence of many false positives is not problematic in the subsequent processing.This allows analysts to prefer "recall" to "precision" in the arrival time reading process, possibly resulting in a better catalog with a smaller number of missed events.Therefore, the performance of the phase association process is important in the earthquake cataloging process.
A popular association method that works well is calculating the theoretical arrival times for a given hypocenter and searching for the hypocenter with many compatible arrival time candidates using a grid search (Draelos et al. 2015).Recently, the REAL package (Zhang et al. 2019) efficiently processed this task and has, thus, been frequently used.ML-based methods have also been developed to address this issue.For example, Yu and Wang (2022) used a neural network to solve the grid search problem efficiently.The association methods based on RNN (Ross et al. 2019b), Graph Neural Networks (McBrearty andBeroza 2023), Bayesian Gaussian mixture models (Zhu et al. 2022b), and Physics-Informed Neural Networks (PINNs; Ross et al. 2023) have also been proposed.Methods based on binary classification of waveform pairs (McBrearty et al. 2019) and deep metric learning (Dickey et al. 2020) have been attempted to directly determine from seismic waveforms whether they are generated by the same earthquake.
Without the phase association process, the hypocenters can be directly determined by migrating/scanning the time series of arrival time probabilities, although the computational cost is high.This topic is discussed in the next subsection.

Earthquake location
After the arrival times of many stations are obtained for an earthquake, the hypocenter of the event can be inversely solved with high accuracy and small computing cost, as is generally performed in seismology (Geiger 1912).Some studies have to address this "travel timebased" location problem using ML techniques, including random forest (Saad et al. 2022;Chen et al. 2022c) and fully connected neural networks (Anikiev et al. 2022).These approaches are possibly effective for earthquake early warning (EEW), where hypocenters must be roughly estimated using a small number of arrival time candidates as early as possible (Saad et al. 2022).
Another approach to hypocenter determination is the "waveform-based" location method, which does not require phase picking and identifies the hypocenter based on migration and stacking of waveforms or characteristic functions calculated from waveforms at many stations.Although this approach involves high computation costs, it contributes to detecting small earthquakes with waveforms buried in noise (Li et al. 2020a).Compared with characteristic functions, such as STA/LTA (Grigoli et al. 2013) or kurtosis (Langet et al. 2014) time series used in previous studies, the probability trace estimated using a DL phase picker (Liao et al. 2022b) likely significantly reduces the false positive rate.This approach is often adopted in open packages that handle the entire cataloging process from continuous seismic records to hypocenter locations (Zhu et al. 2022c;Shi et al. 2022).Zhu et al. (2022c) incorporated this approach into a single DL model that takes the seismic waveform as input and outputs a hypocenter (end-to-end processing) to minimize discarded information during the cataloging procedure.Other studies have applied ML techniques to determine hypocenters from coherence/migration images.Zhang et al. (2022d) proposed a DL-based approach that used diffraction stacking images of seismic waveforms and solved a regression problem to determine the hypocenter from these images.Wu et al. (2019) proposed a method to determine hypocenters using deep reinforcement learning by reconstructing a threedimensional energy field constructed from observed waveforms.
As described above, in the hypocenter determination problem, the optimal hypocenter is generally searched using arrival times or related information (e.g., characteristic function) extracted from the waveforms at multiple stations.However, the later phases of the seismic waveform contain additional information on Green's function between the hypocenter and station.If such information could be used effectively, hypocenters can be determined based even on single-station data.As a simple and conventional approach, a hypocenter can be roughly determined by estimating the direction of the ray path that can be estimated from the P-wave-particle motion and hypocentral distance that can be estimated from the arrival time difference between different phases (e.g., P and S phases) (Frohlich and Pulliam 1999).ML approaches are expected to improve the performance using information discarded in the conventional method.Gutierrez et al. (2019) estimated the azimuth direction of incident seismic waves using support vector machines.Mousavi and Beroza (2020) and Ristea and Radoi (2022) estimated hypocenter locations or related parameters using DL from a single station waveform.Perol et al. (2018) proposed ConvNetQuake, which roughly determines hypocenters from a single station waveform by solving the classification task for the divided analysis regions using DL; this method has been improved in subsequent studies (Lomax et al. 2019;Tous et al. 2020;Bai and Tahmasebi 2021).
As a natural consequence of the above discussion, the ideal way to determine the hypocenter is to use as much information as possible from the full seismic waveforms at multiple stations.To achieve this, Kriegerowski et al. (2019), Zhang et al. (2020b), Shen andShen (2021), andMünchmeyer et al. (2021) developed DL models to determine source parameters from seismic waveforms at multiple channels using the waveforms of the past events as training data.The introduction of a graphbased approach improved the performance of these tasks (van den Ende and Ampuero 2020; Zhang et al. 2022e).Piras et al. (2022) used a Bayesian approach to estimate a hypocenter from the observed spectra of seismic waveforms.They used a neural network to map spectralbased features from the hypocenter for the posterior distribution sampling; this is prohibitively expensive using conventional methods, especially under a threedimensional heterogeneous density-velocity model.Although these methods work well, observation data are generally incomplete, limiting the effectiveness for events with hypocenter locations and focal mechanisms not included in the training data.To overcome this challenge, some studies have attempted training based on synthetic waveforms (Wang et al. 2022c;Sugiyama et al. 2021;Wamriew et al. 2022;Vinard et al. 2022), including data generated with semi-supervised GAN (Feng et al. 2022a).
To address the hypocenter determination problem, attempts have been made to use PINNs, which incorporate theoretical constraints, including physical laws in loss functions (Smith et al. 2021;Izzatullah et al. 2022).This additional constraint is advantageous when analyzing unknown data (not included in the training data).PINNs are trained to solve supervised learning tasks while satisfying any physical law described by partial differential equations (PDEs) by embedding the PDEs into the loss function of a neural network using automatic differentiation (Raissi et al. 2019;Karniadakis et al. 2021).PINNs can represent continuous solutions of PDE without discretization and can be applied to various types of forward and inverse problems (e.g., seismic tomography) (Chen et al. 2022b;Agata et al. 2023); consequently, they have received significant attention in science, including geophysics.

Quality control
For the hypocenters obtained using the pipelines described above, the solution quality is generally controlled by thresholds for some parameters, including the arrival time residuals in the inversion, to ensure that the false positive rate is sufficiently small.Ideally, this strategy can separate correct and incorrect solutions; however, it usually results in many false positives and false negatives.The tradeoff between the false positive rate and the number of solutions is possibly an obstacle to developing a high-quality catalog.Nonlinear classification using ML can potentially solve this problem.For example, Tamaribuchi et al. (2021Tamaribuchi et al. ( , 2023) ) used ensemble learning to exclude false positives from hypocenters obtained by automatic processing routinely operated by the Japan Meteorological Agency.Tamaribuchi et al. (2023) also used the CNN model by Ross et al. (2018b) to screen automatic phase-picking results.Beaucé et al. (2019) and Herrmann et al. (2019) used ML to remove false positives from energy-based seismic detection results and event candidates detected by matched filter analysis, respectively.

Similar waveform searching
As mentioned previously, cross-correlationbased techniques such as template matching and autocorrelation methods, which explore similar waveforms in continuous seismic records, increase the number of available events (Gibbons and Ringdal 2006;Shelly et al. 2007;Peng and Zhao 2009;Kato et al. 2012;Chamberlain et al. 2018).These methods effectively extract small signals buried in noise and are vital in the analysis of seismic activity.However, these methods require cross-correlation calculation among all combinations of the analyzed event waveforms, resulting in high computational costs.For example, the computational cost of autocorrelation analysis, in which brute-force calculations are performed on continuous waveform records, is O(N 2 ), limiting the analyzable period.Even for the template-matching problem, large-scale dataset requires the parallel computation of many GPUs (Ross et al. 2019a).Furthermore, template matching using synthetic or empirical synthetic waveforms (Chamberlain and Townend 2018;Ide 2021) potentially contributes to better catalog development.Hence, an efficient similar waveform search is an important technique in catalog development.
The ML technique potentially reduces the computational cost in similar waveform searches.For example, Skoumal et al. (2016) proposed the Repeating Signal Detector method, which detects seismic signals through unsupervised learning and enables template matching without template preparation before applying the method or brute force calculation.Ganter et al. (2018) proposed a similar waveform search based on ML to generate templates satisfying the alternate null hypothesis, which incorporates the possibility of false positives.Kumar et al. (2022) suggested that using the dynamic time warping technique instead of cross-correlations possibly improves the performance of similar waveform searching, including template matching analysis and unsupervised waveform clustering.
As an efficient similarity search technique, approximate nearest neighbor searches have been used in the field of image and speech recognition and have several applications in seismology (Yoon et al. 2015;Tibi et al. 2017).For example, Yoon et al. (2015) applied the Locally Sensitive Hashing (LSH) technique based on random substitution to seismic waveforms and developed a method called FAST that can efficiently search for similar waveforms by hashing the binary fingerprint generated from the waveform spectrogram.Using this method, Yoon et al. (2015) performed a brute-force calculation for a single channel, six months of waveform records.Scotto di Uccio et al. ( 2022) reported that a similar waveform search using FAST potentially complements events not detected using DL-based methods, although they achieved the best performance using a templatematching technique based on a template catalog developed using a DL-based method.The FAST algorithm has been used for induced earthquakes due to hydraulic fracturing and wastewater injection (Yoon et al. 2017), induced earthquakes in gas fields (Scala et al. 2022), swarms (Festa et al. 2021), foreshocks (Yoon et al. 2019b), and volcano seismicity (Garza-Girón et al. 2023).
The LSH-based method is far more scalable than waveform correlation methods.Additionally, the FAST performance has been enhanced by avoiding false positives using multiple channel/station results (Bergen and Beroza 2018), improving fingerprint (Bergen et al. 2016;Bergen and Beroza 2019) and hash table treatment through careful tunings, such as partitioning (Rong et al. 2018).Consequently, Yoon et al. (2019a) successfully applied this method for 11 stations, 27 channels, and 6-11 years of continuously recorded seismic data.However, the computational cost to process large-scale data remains high, limiting the size of the analyzable dataset.The DL-based method, which is beginning to be used in the field of image and audio recognition, is a potential solution to reduce the computational cost.Naoi and Hirano (2024) developed a method to convert a seismic waveform into a 64-bit binary code and applied it to 16-channel, 30-min 10 MS/s continuous AE records (the number of data points in each channel corresponds to 5.8-year continuous waveform records of 100 Hz sampling typically adopted in seismic observation) obtained from laboratory experiments.They showed that the hashing-based template matching and autocorrelation problems of these datasets can be solved in a realistic computation time.
In summary, besides improving processing pipelines in earthquake cataloging, ML enables efficient similar waveform searches that provide additional information, making it feasible to apply these methods to large-scale data, for which the computational cost has been an obstacle.

Focal mechanism analysis
Focal mechanism analysis requires polarities of P-wave first motions, and moment tensor analysis requires amplitude information or waveform inversion.The necessity of manual processing and the insufficient accuracy of techniques that automatically obtain these data have impeded the analysis of many small events.
The task of polarity reading for the P-wave first motions can most simply be treated as a binary classification problem (Ross et al. 2018a) that can be readily solved with DL.Similar to the arrival time reading problem, this approach exhibits high performance and markedly increases available focal mechanism solutions.For example, Uchide (2020), Cheng et al. (2023), andTanaka et al. (2021) used DL to read first motion polarities of seismic waveforms or AE data in Japan, southern California, and laboratory hydraulic fracturing experiments, respectively, obtaining tens to hundreds of thousands of focal mechanisms.Naoi et al. (2022) automatically read the timing of the initial maximum amplitude using DL; the obtained amplitudes were used to solve moment tensors for laboratory AEs.In these analyses, waveforms from a single station were typically used as input data to a DL network; additionally, multi-station approaches have been explored (Tian et al. 2020).Although it is not as active as the problem of arrival time reading, the generalizability of these models has also been studied (Hara et al. 2019).Furthermore, unsupervised clustering using DL can identify waveforms with different initial polarities without training data (Mousavi et al. 2019b).The increased number of available focal mechanisms improves spatiotemporal resolution for mechanism-based analyses, such as stress tensor inversion (Uchide et al. 2022;Qin et al. 2022b).ML is also helpful in interpreting such a large number of solutions.Kubo et al. (2023) applied nonlinear graph-based dimensionality reduction to a largescale dataset of moment tensor solutions, objectively obtaining a comprehensive image of the spatiotemporal characteristics of seismicity.
P-wave first motion polarities should show the same pattern for events with similar hypocenters and focal mechanisms.Using this property, Skoumal et al. (2023b) improved the number and accuracy of focal mechanism solutions employing ML to complement P-wave first motion data at stations not used in the mechanism analysis due to the insufficient signal-to-noise ratio.The polarity of such stations can also be complemented using the waveform correlation technique (Shelly et al. 2016).By combining these methods, Skoumal et al. (2023a) successfully estimated many focal mechanisms for events in the southeastern San Francisco Bay Area.
Regarding ML applications in focal mechanism analysis, in addition to improving the polarity reading and amplitude estimation, some studies have attempted to rapidly estimate the focal mechanisms for EEW (Kuang et al. 2021;Steinberg et al. 2021;Monterrubio-Velasco et al. 2022).Physics-Guided Neural Networks, which use ML for inverse analysis of the constrained solution space of the focal mechanisms, have been investigated (Zhang et al. 2021a).Future applications are expected to include other physics-constrained neural networks, such as PINNs and Physics-Encoded Neural Networks (Faroughi et al. 2022).

Paleoseismic record analysis
ML-based techniques for processing digitized waveform data have been rapidly developed.However, such digital records are available only for the last quarter of the period since the beginning of seismometer observations (Wang et al. 2019b).Past seismic activity should be investigated using analog records to assess the history of long-period seismic cycles.Although there are few examples, some important attempts have been reported.Furumura et al. (2023) applied a CNN network to digitize analog, strongmotion seismographs recorded on smoked paper.Wang et al. (2019bWang et al. ( , 2022d) performed event detection, arrival time reading, magnitude estimation, and hypocenter determination on continuous Develocorder film images obtained from the Rangely seismic control experiment by the United States Geological Survey, generating a seismic event catalog.Kaneko et al. (2021Kaneko et al. ( , 2023) also identified past low-frequency earthquake records among analog records using DL.

Current achievements and future prospects
As described in this section, ML is utilized in nearly every step of the earthquake catalog development pipeline.In particular, conventional automatic processing methods have not reached the accuracy of manual processing for regarding event detection, arrival time reading, and P-wave first-motion polarity reading.This has impeded the efficient processing of large amounts of data.DL models have demonstrated outstanding performance and have greatly improved the accuracy of automatic processing.Although the limited training data for large earthquakes and events in the regions or with focal mechanism types that have not previously occurred are likely to pose inherent challenges in these tasks, the outstanding performance of the DL approach has led to progress in related research.Some trained models are publicly available and have been applied to a wide array of data, either as-is or after optimization through transfer learning or retraining.The required computational cost is sufficiently small in the prediction process, facilitating the easy use of these models and greatly advancing seismicity analysis.Their performance will likely continue to improve via architecture modification and model development considering consistency across multiple station records.
Regarding the phase association and location, the adoption of ML technology is in its initial stages.The benefits and need for ML in these areas may remain unclear and warrant further research.Although applications to quality control problems also remain in the early stages of exploration, the ML technique is likely to be highly suitable for this task.DL has also been utilized for similar content search problems in other research fields and is expected to contribute to efficient similar waveform searching in seismology.The application to paleoseismic records is also in its infancy; however, ML technology, which has achieved great success in the image recognition problem, is likely to be well suited to address the related tasks efficiently.Effectively using ML in this domain could boost efforts to preserve paper records.
As described above, ML improves (or is expected to improve) the performance of individual tasks of the catalog development pipeline.However, in catalog development, it is not only necessary to enhance individual processes but also to optimize the overall performance.The catalog development pipeline, which includes similar waveform search processes, offers numerous options, making its optimization complex.Optimizations using an end-to-end approach, as Zhu et al. (2022c) suggested, are a potential approach to address this issue.
The ML model demonstrates generalizability and provides good results when analyzing seismic records from regions that differ from the training data.However, for widely used public datasets, sharing retrained models with more relevant data is beneficial.Preparing/sharing such models as a research infrastructure will enhance the related study.Datasets for retraining should ideally be publicly accessible and easily applicable for future model development.For example, the Seisbench toolbox (Woollam et al. 2022) provides a dedicated software environment for this purpose.To encourage such efforts, a clear policy on the secondary distribution of public data is essential.

Application to earthquake risk evaluation and seismicity analysis
In addition to contributing to earthquake catalog development, ML has been applied to seismicity analysis, including risk evaluation.Here, we summarize the related topics, including the applications in laboratory and induced earthquakes.Notably, we have not provided a comprehensive review of studies related to earthquake forecasting using ML (e.g., Ridzwan and Yusoff 2023).Mignan and Broccardo (2020) provided an excellent review of the current achievements in earthquake prediction using artificial neural networks.

Rupture forecasting of laboratory earthquakes
Frictional experiments in the laboratory are relatively successful targets in predicting the occurrence of macroscopic failures.In fracture or slip experiments in a laboratory, significant precursors are often observed, such as increased AE activity, decreased b-values, and changes in fractal dimensions (e.g., Lei and Satoh 2007;McLaskey and Kilgore 2013).
In slip experiments, it is possible to generate many seismic cycles, providing rich training data for ML to predict the timing and magnitude of macroscopic failures.This topic was triggered by Rouet-Leduc et al. (2017), who analyzed continuous AE records obtained during double-direct shear experiments.Thereafter, the passive and active AE records obtained in this type of experiment have been actively studied.In a pioneering study, Rouet-Leduc et al. (2017) developed a random forest model that could predict the timing of macroscopic slip from AE waveform data continuously recorded during the experiment.They showed that the time to macroscopic slip could be predicted well not only from the data immediately before the macroscopic failure, but also throughout the entire period of seismic cycles, including their early stage.Their prediction was based only on window data at a particular time without data from the entire cycle.This suggests that the AE waveforms contain information on the preparation process for future macroscopic failure throughout the seismic cycle.Rouet-Leduc et al. (2018b) applied a similar method to the field data and successfully reproduced surface displacements from continuous seismic data in the Cascadia subduction zone, indicating that tremors in this region occurred nearly at all times.
The issue raised by Rouet-Leduc et al. ( 2017) was treated in a Kaggle competition to improve prediction algorithms (Johnson et al. 2021), resulting in several published methods (Brykov et al. 2020;Laurenti et al. 2022).Related to this problem, Bolton et al. (2019) examined whether the unsupervised clustering technique could identify precursors for similar experimental data, while Jasperson et al. (2021) further improved the prediction accuracy by combining unsupervised clustering and attention networks.Rouet-Leduc et al. (2018a) solved a similar issue using a gradient-boosted tree and showed that the shear stress on the fault plane (i.e., the frictional state) could be reproduced from continuous AE recordings.They found that the variance in the waveform amplitude, a function of the average energy per unit of time, is the parameter that primarily contributes to the prediction.Hulbert et al. (2019) also found that the event duration, that is, the slip mode (slow or fast event), could be predicted in the experimental system where slow events possibly occur.Bolton et al. (2020) showed that fault slip velocity and frictional contact area per unit fault volume govern the variance of AE waveform amplitudes.Lubbers et al. (2018) reported that similar predictions are possible using AE catalogs comprising magnitude and time without using continuously recorded waveform data.Their prediction performance improved when using a more complete catalog (i.e., a catalog with a low completeness magnitude), possibly corresponding to the fact that the completeness of the earthquake catalog is important in detecting foreshock activities preceding large earthquakes (Mignan 2014;Trugman and Ross 2019).Shokouhi et al. (2021) and Shreedharan et al. (2021) confirmed that temporal changes in shear stress and time to macroscopic failure could be predicted using an active source, that is, transmission test data.This is not surprising given that many previous studies have indicated that fault transmission waves can be used to monitor fault conditions (Kendall and Tabor 1971;Nagata et al. 2008).To address a similar prediction problem based on active source data, Borate et al. (2023) successfully improved model performance using a PINN model incorporating physical constraints related to transmission waves, particularly when the training data size was small.
Inspired by the double direct shear experiment, ML predictions have also been attempted in an analog experiment simulating a subduction zone (Rosenau et al. 2017).In these studies, issues of predicting the time to macroscopic slip (Corbi et al. 2019), determining whether slip will occur immediately after the analyzed data window (Corbi et al. 2020), and predicting future surface velocity fields (Mastella et al. 2022), were addressed based on surface deformation data, by assuming the future application to the actual GNSS data.
As described above, ML models trained on data during many cycles of laboratory experiments have successfully predicted macroscopic ruptures and estimated the conditions of fault planes and medium.The application to double direct shear experiments data, pioneered by Rouet-Leduc et al. (2017), has driven this field.Future efforts to apply a similar approach to various laboratory experiments might identify important signals previously overlooked.
In laboratory data analysis, ML techniques reveal the physics of fault deformation in various ways besides the prediction issue.For example, Trugman et al. (2020) used ML to create an AE event catalog in a double direct shear experiment to image the spatiotemporal evolution of microslips in the synthetic fault gouge during the stick-slip cycle.Ma et al. (2022) simulated the stick-slip behavior of a sheared granular system using a discrete element method.They investigated the relationship between macroscopic stress changes and microslips between particles using the gradient boosting method (XGBoost) and performed feature importance analysis, revealing that the local spatial autocorrelation of microslips is the most important parameter.Chaipornkaew et al. (2022) established an ML-based method to evaluate kinematic efficiency (ratio of fault slip to regional deformation), an indicator of offfault deformation.They conducted an analog experiment in which the temporal evolution of a strike-slip fault could be tracked and trained a CNN model based on the obtained fault maps.By applying the resultant model to actual fault maps, they obtained kinematic efficiency estimates that overlap with the geologic estimation, indicating that models trained on the experimental data can quantify natural off-fault deformations.

Risk evaluation of induced earthquakes
ML is actively used to analyze human-induced seismicity caused by resource extraction, water injection, hydraulic fracturing, and geothermal development.In these cases, many microearthquakes that occur within a short period should be analyzed.ML-based techniques help to develop the microearthquake catalogs (Park et al. 2020;Wang et al. 2020a;Wong et al. 2021;Zhou et al. 2021;Glasgow et al. 2021;Zhang et al. 2022a;Kemna et al. 2022), and focal mechanism catalogs (Qin et al. 2022b) to investigate induced seismicity.
Another application of ML in studying human-induced earthquakes is its use in searching parameters that strongly affect the resultant seismicity.For example, Pawley et al. (2018), Wozniakowska and Eaton (2020), and Hicks et al. ( 2021) investigated the geological parameters, operational parameters, and tectonic properties that govern the activation potential of seismicity induced by hydraulic fracturing and water injection, using supervised learning based on logistic regression.Wang et al. (2022a) and Kemna et al. ( 2022) used a tree-based ML approach to identify critical factors (including operational parameters of hydraulic fracturing) controlling induced earthquakes.Larson et al. (2021) used classification/ regression trees and neural networks for a similar problem.Szafranski and Duan (2022) trained ML models of random forest, bagging, and k-neighbors regression algorithms using numerical simulation results and used the resultant model for Bayesian inversion analysis to estimate subsurface conditions.
In addition to the important feature analysis, attempts have been made to construct ML-based models for estimating seismic potential or predicting event rates.For example, Qin et al. (2022a)
ML is also used to separate background seismicity from that triggered by other earthquakes, traditionally performed using the Epidemic-Type Aftershock Sequence (ETAS) model (Ogata 1988) or the nearest neighbor analysis (Baiesi and Paczuski 2004;Zaliapin et al. 2008).Aden-Antoniów et al. ( 2022) used an ML-based model trained on the synthetic seismicity generated using the ETAS model for this task.They employed the random forest model to classify activities overlapping in the spatiotemporal distance space.Trampert et al. (2022) assumed the interevent-time distribution of induced earthquakes around gas reservoirs as a mixture of Omori, gamma, and exponential distributions.By applying gradient boosting regression, they concluded that the proportion of the Omori distribution (i.e., the fraction triggered by other earthquakes) was small.Muir and Ross (2023) used a deep Gaussian process instead of the simple background seismic activity rate function in the ETAS model to improve the seismicity model.
In seismicity analyses, ML can be used as a tool to reveal nonlinear relationships that cannot be simply modeled by existing theories.For example, DeVries et al. ( 2018) and Karimzadeh et al. (2019) used ML to predict the spatial pattern of aftershocks, although Mignan and Broccardo (2019) pointed out that the model proposed by DeVries et al. ( 2018) is unlikely to achieve better performance than the conventional approaches.Dascher-Cousineau et al. ( 2023) developed a DL-based seismicity model based on neural temporal point processes, called the Recurrent Earthquake foreCAST (RECAST) model, as an alternative to ETAS and demonstrated its utility and advantages.Arvanitakis and Avlonitis (2016) and Arvanitakis et al. (2018Arvanitakis et al. ( , 2019) ) employed supervised learning to identify asperities on plate boundaries from seismicity statistics, such as their density, recurrence intervals, and b-values.

Application in ground-motion prediction
The prediction of ground motions due to earthquakes is a major research topic in seismology and earthquake engineering; its objective is to predict the intensity or time series of ground motions caused by a given earthquake at a target location.Ground-motion prediction techniques are used for seismic hazard assessment, early warning of strong motions, and damage estimation after an earthquake.Recently, ML has been applied to this prediction and for creating supporting information.Hereafter, we review ML applications in (1) predicting ground-motion intensity from "features", which are individual measurable properties, such as distance from an earthquake and earthquake magnitude; (2) predicting time series of ground motions from features; (3) predicting ground-motion intensity from time series input; (4) predicting ground-motion time series from time series input; (5) other related studies.The relationship among 1-4 is visualized in the matrix component of Fig. 1.

Prediction of ground-motion intensity from features
Empirical equations called ground motion prediction equations (GMPEs) have been used to predict the intensity of ground motions (i.e., seismic intensity and peak ground acceleration) at a target location caused by a given earthquake.GMPEs have been constructed by regressing past ground-motion records, considering the geophysical model between an earthquake and ground motions.Since this is a supervised ML problem, ML regression techniques have been applied.The applications were underway before the third Artificial Intelligence (AI) boom began in the late 2010s (Kerh and Ting 2005;Ahmad et al. 2008).Recently, many studies have been conducted using various ML methods: neural networks (Derras et al. 2012(Derras et al. , 2014;;Dhanya and Raghukanth 2018;Wang et al. 2020b;Khosravikia and Clayton 2021;Okazaki et al. 2021c, b;Ji et al. 2021), tree-based models such as random forest and XGBoost (Trugman and Shearer 2018;Kubo et al. 2020;Khosravikia and Clayton 2021;Oana et al. 2022), support vector regression (Tezcan and Cheng 2012;Khosravikia and Clayton 2021;Hu et al. 2022), Gaussian process regression (Hermkes et al. 2014;Alimoradi and Beck 2015;Lavrentiadis et al. 2023), and Bayesian network (Kuehn et al. 2011).The prediction targets of ground-motion intensity include peak ground acceleration (PGA), peak ground velocity, response spectra, and seismic intensity.While early ML models predicted a single indicator, more recent multi-output models can predict several indicators simultaneously (Dhanya and Raghukanth 2018;Oana et al. 2022;Hu et al. 2022).
An advantage of ML models over conventional GMPEs is that nonparametric ML methods can learn the functions of ground-motion models directly from data without assuming regression equations, resulting in a more accurate and useful predictor within the range of training data.Some studies have reported that ML prediction models have higher accuracy than the existing GMPEs (Khosravikia and Clayton 2021;Oana et al. 2022;Seo et al. 2022).Khosravikia and Clayton (2021) reported that when sufficient data are available, the ML prediction models provide more accurate estimates than the conventional linear regression-based method.
Another advantage of the ML model is that new data or features (explanatory variables) not used in conventional GMPEs can be incorporated.Conventional GMPEs used earthquake magnitude, distance from an earthquake (e.g., hypocentral distance, shortest distance from the fault, and Joyner-Boore distance), depth of an earthquake, and site information below the predicted point, such as Vs30 (average S-wave velocity up to a 30 m depth).The ML prediction model designed by Oana et al. (2022) adopted additional features, such as the earthquake's location, the prediction point's location, and back azimuth information.Their ML model with the additional features had a lower logarithmic standard deviation value than the previous GMPEs.Esteghamati et al. (2022) designed an ML predictor with information on horizontal-tovertical spectral ratios (HVSR) as explanatory variables.Although a simple proxy, such as Vs30, has often been used in previous studies as site information, the use of more detailed information, such as HVSR, is expected to improve the accuracy of ground-motion prediction.Moreover, a ground-motion prediction model that directly accounts for the site amplification at the target station can be made by learning ground-motion data labeled for each observation station.Okazaki et al. (2021c) proposed an ML model that uses a one-hot representation of site ID, which achieved accurate prediction considering site-specific ground-motion amplification and avoided overfitting at sites with few records.
Furthermore, ML leads to a more flexible prediction.Although previous studies made predictions at individual points, Lilienkamp et al. (2022) achieved "areal" prediction by treating the explanatory and predictive variables as map-shaped data.Rekoske et al. (2023) presented a reduced-order modeling approach based on interpolated proper orthogonal decomposition to predict the spatial distribution of peak ground velocity.Sreenath and Raghukanth (2023) constructed a ground-motion model based on the Bayesian Neural Network (BNN) (Jospin et al. 2022), in which the Bayesian framework is incorporated into neural networks.Their approach could enable the evaluation of uncertainties in the prediction of earthquake motion because BNN facilitates probabilistic prediction.
To mitigate the negative effect of the black-box nature of AI models, eXplainable AI (XAI) techniques have recently been developed to help humans understand the reasons behind decisions or predictions made using AI models.The applications of the XAI techniques are useful in understanding the behavior of an ML-based groundmotion prediction model.Mohammadi et al. (2023) applied Shapley additive explanation (SHAP), which was developed based on the game theory to explain individual predictions of ML models (Lundberg and Lee 2017), to explore the importance of each explainable variable in the XGBoost prediction model.

Prediction of ground-motion time series from features
Numerical simulations have often been used to obtain seismic waveforms at a target location due to a given earthquake.With the development of observation networks, the accumulation of ground-motion records, advancements in the subsurface structure, refinement of numerical simulation techniques for ground motions, and increased machine power, numerical simulations of three-dimensional seismic wavefield have been conducted by assuming a subsurface velocity structure model that reflects the real three-dimensional heterogeneity (Moczo et al. 2014;Igel 2017).
Recently, researchers have explored a datadriven approach using generative ML models for obtaining seismic waveforms based on past groundmotion records.Generative ML models are a class of statistical models that attempt to capture the underlying probability distribution of data and can be trained to generate data instances (Bengio et al. 2013;Jebara 2012).One of the major generative models is GAN (Goodfellow et al. 2014), in which two networks are trained sequentially: a generator network to produce pseudo data and a discriminator network to distinguish between the truth or falsity of data.The well-trained generator can be used as a generative ML model.Mirza and Osindero (2014) proposed a conditional GAN (CGAN) that controls the generation process by considering additional features.Combining CGAN and Wasserstein GAN (Arjovsky et al. 2017) with the training of a massive set of strong motion recordings in the time domain, Florez et al. (2022) developed a generative model that synthesizes realistic three-component accelerograms conditioned on magnitude, distance, and Vs30.Their generative model captures the most relevant statistical features of the acceleration spectra and waveform envelopes.The output seismograms display clear P-and S-wave arrivals with the appropriate energy content and relative onset timing.Moreover, Esfahani et al. (2023) developed a generative model for simulating ground-motion recordings, which combines a CGAN to predict the amplitude part of the time-frequency representation of ground-motion records and a phaseretrieval method.Matsumoto et al. (2023) constructed a ground motion generation model using GANs and evaluated its performance.They demonstrated through numerical experiments that their proposed model is probabilistic, approximates the probabilistic distribution of the dataset of observed records, and generates realistic ground-motion time histories with various characteristics in the time and frequency domains.As a new-generation model, diffusion models (Ho et al. 2020;Rombach et al. 2021) have recently attracted attention and will be used for groundmotion prediction.The data-generation process with a generative ML model involves randomness, and the generated data instances differ slightly even if they are generated under a common condition.How to use the generated instances with randomness remains unclear.Moreover, strategies to evaluate the validity of the generated instances are needed.
Another possible ML approach for obtaining groundmotion time series is PINNs, which can consider physical laws.Several studies have applied PINNs to the forward simulation of seismic waveforms, including in the simulation of acoustic wavefield described by a frequency-domain wave equation (Alkhalifah et al. 2020;Song et al. 2021a, b;Huang et al. 2021;Huang and Alkhalifah 2022;Song and Wang 2022a), the reconstruction inversion of acoustic wavefields (Song and Alkhalifah 2022), and the simulation of twodimensional (2D) SH wave propagation (Ding et al. 2023a, b).Furthermore, Rasht-Behesht et al. (2022) proposed an approach to solve wave propagation and full waveform inversion based on PINNs.Moreover, neural operators, which learn mapping between function spaces via supervised training to solve PDEs (Kovachki et al. 2021), have attracted attention in recent years and have been applied to the construction of a surrogate model for wave propagation simulation (Yang et al. 2021b(Yang et al. , 2023;;Song and Wang 2022b;Zhang et al. 2023;Zou et al. 2023;Lehmann et al. 2024).Although PINNs and neural operators remain in methodological development and have limited applications, further development in ground-motion prediction is expected.

Prediction of ground-motion intensity from time series
Onsite prediction of the final ground-motion intensity at a target site using the first few seconds of its observations is a major topic of EEW (Nakamura 1988;Wu and Kanamori 2005;Spallarossa et al. 2018).Recently, this has been approached using ML techniques, such as random forest (Hu et al. 2023), CNN (Jozinović et al. 2020;Zhang et al. 2022b), support vector machine (Song et al. 2022), Graph Neural Network (Bloemheuvel et al. 2022), and LSTM (Wang et al. 2022b(Wang et al. , 2023a)).Additionally, ML has been applied to discriminate in real-time whether a signal stems from an actual earthquake based on the initial part of ground motions (Li et al. 2018;Meier et al. 2019;Liu et al. 2022).Although these studies adopted a single-station approach where inputs and outputs were closed at one station, Münchmeyer et al. (2020) proposed a multiple-station approach that predicts final PGA values at multiple target stations using initial raw records at multiple input stations.

Prediction of ground-motion time series from time series
Recently, predicting future or present ground motions directly from currently observed ground motions, skipping source estimation, has been actively studied in the research field of EEW (Kodera et al. 2018;Hoshiba 2021), and ML has been applied.Otake et al. (2020) proposed a DL model to predict the present seismic intensity at the target station based only on the observation records at the surrounding stations, demonstrating that its prediction performance is better than adopting the maximum or weighted average of the input data.Additionally, Fornasari et al. (2022) developed an ensemble ML model combining CNN and Voronoi tessellation, which could reconstruct the ground-shaking field in real time based on the pointcloud spatial distribution of current ground-shaking data recorded at each observation station.Datta et al. (2022) designed a deep spatiotemporal RNN to predict future intensity time series at multiple stations directly from their current ground-motion observations.Tamhidi et al. (2022) proposed an approach where ground-motion time series at target sites are constructed from a set of observed motions using a Gaussian process regression, which treats the real and imaginary parts of the Fourier spectrum as random Gaussian variables.Furumura and Oishi (2023) developed a DL approach for early prediction of long-period ground motions at a target point far from the earthquake source based on waveform observations near the source.This approach could effectively forecast long-period ground motions of large earthquakes regarding amplitude, waveform envelope shape, spectral characteristics, and duration.
Due to computational constraints and insufficient knowledge of the underground structure at short wavelengths, physics-based simulation (PBS) ground motions are reliable only in the long-period range (typically, approximately 1 s).ML has been applied to convert long-period to short-period PBS waveforms and obtain broadband ground-motion waveforms.In the approach proposed by Paolucci et al. (2018), the short-period response spectra were estimated using an ML model from the long-period ones simulated by PBS.Then, the short-period components of the original PBS waveforms were enriched using a stochastic simulation to match the estimated response spectra.Sharma et al. (2022) upgraded the estimation of the short-period response spectra proposed by Paolucci et al. (2018) in a way that not only response spectra of long-period PBS waveforms but also additional features, such as source, path, and site parameters, were used as explainable variables.Furthermore, Okazaki et al. (2021a) explored an approach that generates consistent broadband waveforms using past observation records.Long-period Fourier amplitude spectra and acceleration envelopes were transformed to short-period ones using a neural network and an embedded space obtained by the nonlinear dimensionality reduction of t-SNE (Van der Maaten and Hinton 2008), respectively; they were combined to produce a broadband waveform.The notable features of their study include (1) using the Wasserstein distance in the dimensionality reduction of envelopes to capture their global properties and (2) simultaneously embedding pairs of long-and short-period envelopes into a common space to obtain the relationship from a limited amount of ground-motion data.
The denoising of seismic waveform records is related to the issue discussed in this section.Several studies have been conducted to extract the target signal from originals containing waves from various sources or separate signals and noise (Zhu et al. 2019b;Tibi et al. 2021Tibi et al. , 2022;;Dalai et al. 2021;Novoselov et al. 2022;Yin et al. 2022a;Xu et al. 2022;Wang and Zhang 2023).These efforts will lead to the effective use of observation data unanalyzed due to a low signal-to-noise ratio.

Studies related to ground-motion prediction
ML has been applied to estimate site amplification characteristics, of which frequency dependency varies among sites.Roten and Olsen (2021) developed a DL model to estimate the relative site amplification characteristics between borehole and surface seismograms from discretized S-wave velocity profiles.Pan et al. (2022) used a DL model to estimate S-wave site amplification characteristics from the information on microtremor HVSR.Zhu et al. (2023b) proposed an ML model to obtain site amplification factors from site proxies and suggested that the ML model performs better than the conventional approach.Zhu et al. (2023a) trained a supervised DL model to recognize and separate seismic site response from single-station seismograms.Pilz et al. (2020) proposed a data-driven identification method based on information from various sites to identify a reference site.
Furthermore, ML techniques have been applied to seismic hazard assessment.Numerical simulations of ground motions based on many earthquake scenarios should be conducted and sampled to quantify variations in seismic hazard assessment.Imai et al. (2021) proposed a method to generate many scenario earthquake shaking maps from existing shaking maps using modal decomposition of proper orthogonal decomposition and an empirical copula.
The quality assessment and selection of groundmotion records are necessary to improve the groundmotion prediction model.Dupuis et al. (2023) applied a supervised DL-based model to this issue.Their model can estimate the quality and minimum usable frequency for each record component.It can handle one-, two-, or three-component records, providing flexibility to assess record quality based on various requirements.
Tsunamis are another phenomenon caused by earthquakes besides ground shaking.Several studies have applied ML to tsunami prediction (Fauzi and Mizutani 2020;Makinoshima et al. 2021;Liu et al. 2021;Kamiya et al. 2022;Rim et al. 2022).

Approach to imbalanced ground-motion data
One significant problem with applying ML in groundmotion prediction is the imbalance in observation data of ground motions.Observation data in the geophysical field are often imbalanced.In the case of ground-motion data, few records of strong shaking, near source faults, and large-magnitude earthquakes are available.Figure 3 shows the distribution of the ground-motion dataset used by Kubo et al. (2020): ground-motion records in the range of 1-20 cm/s/s account for much of the dataset, whereas there are few records of strong motions over 1,000 cm/s/s.Moreover, the spatial distribution of earthquakes and their focal mechanisms are strongly biased.Most observations of ground motions have been made on land.Learning such biased data may have undesirable effects on the ML prediction model.For example, Kubo et al. (2020) showed that ML training with unbalanced ground-motion data leads to biased prediction.
One approach to solving this issue is augmentation with simulated data.Withers et al. (2020) used a database of synthetic ground motions extracted from the Southern California CyberShake study to build an ML prediction model of ground-motion intensities in Southern California.This model had behavior and performance similar to empirically based models.Raghucharan et al. (2021) appended the simulated data from the validated seismological model to the observation data so that the trained dataset for their ML model covered a wide range of magnitude and distance, filling the data gap region.Lehmann et al. (2023) prepared a synthetic database based on many PBSs considering various source mechanisms or finite faults for constructing an ML model to simulate low-frequency ground-motion parameters for arbitrary focal mechanisms or finite fault sources.ML training with simulated data alone has been done in research fields where observation data are fundamentally scarce, such as in the cases of tsunamis (Makinoshima et al. 2021) and gravity waves (Licciardi et al. 2022).The emulation of ground motions using ML surrogate models, such as generative ML models, PINNs, and neural operators, is expected to be used for data augmentation.
Another approach to address this issue is incorporating geophysical findings into the ML model.Several studies have incorporated conventional GMPEs into the ML model.GMPEs are stable for extrapolation or low data density because conventional GMPEs are based on the geophysical background of ground motions.Khosravikia and Clayton (2021) reported that the approach of conventional GMPEs is better than ML-based models when training data are limited.Kubo et al. (2020)  proposed a hybrid approach of ML and conventional GMPE for predicting ground-motion intensity.In this approach, an ML model was trained with the residuals between observations and predictions using GMPE, and the sum of the predictions using GMPE and the ML model was used as the prediction of the hybrid model.They demonstrated that this hybrid approach reduces the underestimation of strong motions found in the prediction model using only ML and performs better than the individual approaches.Oana et al. (2022) included the predicted value of GMPE in the explainable variable of the ML prediction model.
Direct incorporation of geophysical knowledge into an ML model has been conducted.Previous studies on ground motions clarified the relationship between ground motions and related factors.Generally, ground motions increase with magnitude and decrease with distance, and they tend to decrease with Vs30 and increase with the thickness of sediment layers.To follow these relationships in an ML prediction model, Okazaki et al. (2021b) proposed a neural network model with a monotonic dependence on the input variables.They demonstrated that this approach reduces overfitting for training data and improves generalized performance.
Transfer learning (Pan and Yang 2010;Bozinovski 2020) can be used to address ground-motion prediction problems.Here, an existing ML model trained with ground-motion records in areas with large amounts of data is used as an initial model to develop an ML model in areas with few records.Jozinović et al. (2022) introduced transfer learning to develop an ML model of onsite EEW for a small dataset; their approach improved the prediction performance regarding outliers, bias, and variability of the residuals between predicted and actual values.

Current achievements and future prospects
As described in this section, many studies have applied ML to earthquake ground-motion prediction, resulting in significant progress, including reporting predictions that are more accurate than existing models and realizing predictions that were previously difficult to predict.However, ML-based ground motion prediction is not widely used in practical applications such as earthquake hazard assessment and earthquake early warning.The black-box nature of ML is likely one factor limiting its use.Further research is needed to evaluate the limits of applicability of ML predictions and their uncertainty, versatility, and reproducibility.
While the ground-motion prediction model will continue to become more sophisticated, it is also essential to update the ground-motion data.To achieve this, it is crucial to maintain and upgrade strong-motion observation networks and to steadily increase observation records.Furthermore, the usability, accessibility, and transparency of the observation records must be continuously improved to encourage the further use of ground-motion data.One such effort is the construction and publication of a "flat file", a data set that integrates information on ground-motion intensity measurements, seismic source information, and site information, such as the NGA-WEST2 database (Ancheta et al. 2014) and the New Zealand strong motion database (Van Houtte et al. 2017).As mentioned in the previous subsection, extending the dataset based on PBSs is also important.The simulations can create records for huge earthquakes or those near faults, for which there are few observation records.This will help compensate for the imbalance in the observation-based ground-motion dataset.Such efforts are useful in applying ML because the imbalance in training data directly leads to the reduction of the prediction performance of ML models.Data emulation with ML surrogate models, such as generative ML models, PINNs, and neural operators, will also be used.
Several studies have raised issues regarding the effect of ergodic assumption in ground-motion prediction models on their prediction results (Anderson and Brune 1999;Abrahamson et al. 2019;Lavrentiadis et al. 2023).The ergodic assumption is that the median and aleatory variability of a ground-motion prediction model are applicable to any location within the broad tectonic category.Although most ground-motion prediction models are based on this assumption, the ergodic ground-motion prediction models may not work well for a specific site/source location because of significant systematic differences in ground motion depending on the location of the site and source.Therefore, a growing need exists for developing non-ergodic ground-motion prediction models that explicitly model the locationspecific effects, which can lead to a reduction in aleatory variability.The application of ML will prove essential in the construction of site-specific or location-specific ground-motion prediction models because the situation is more conducive to the flexibility of ML, which can handle various explanatory variables.Bergen et al. (2019) indicated that ML use in solid earth geoscience can be broadly divided into three categories: automation, modeling (simulation and inversion), and discovery.For example, phase picking is an example of automation, while ground motion prediction falls under modeling.The current use of ML in seismology is primarily focused on the automation category.Applying ML to modeling or discovery may be more difficult; however, we look forward to further research in this area.

Application to geodetic data
This section summarizes the application of ML to geodetic data.We primarily focus on topics related to crustal deformation caused by tectonic motion.

Clustering analysis
GNSS observations record seismic and aseismic signals as well as the long-term secular motion of the Earth's surface.These long-term secular velocity fields contain information regarding rigid block motion, internal deformation, and strain accumulation processes due to plate motions, such as subduction, opening, and collision.Data-driven approaches based on clustering algorithms have been developed to identify regional differences in these velocity fields.Clustering approaches can objectively determine block boundaries without prior information, such as fault location, geological information, or underlying deformation processes.These algorithms can be broadly divided into hard and soft clustering.Hard clustering algorithms determine the cluster to which each observation belongs, and soft clustering allows each station to belong to multiple clusters.In other words, the soft clustering approach provides information on the uncertainty of each cluster at each station.
Two approaches of hard clustering algorithms have been applied to GNSS velocity fields.The first approach is hierarchical clustering, which initiates the clustering procedure with each cluster comprising a single data point.Then, two clusters are sequentially combined based on the similarity in their data.Consequently, all data can be categorized into a hierarchical tree or a dendrogram.When focusing on a small number of clusters (i.e., higher levels in the dendrogram), each boundary represents a large-scale difference in GNSS velocities, whereas many clusters with lowerlevel clustering correspond to small-scale boundaries.The advantage of this approach is that the clustering result is uniquely determined, and there is no need to define the number of clusters.Once two stations are connected in the same cluster at a lower level, these two stations will never be categorized into different clusters at a higher level.
The second approach is the partitioning-optimization approach, in which the number of clusters is first determined, followed by clustering using a measure of similarity between observations.The clustering results depend on the initial clusters, which must be determined beforehand.The robustness of these initial clusters can be examined by conducting clustering with different sets of initial clusters.However, achieving a global minimum solution is not guaranteed.Simpson et al. (2012) proposed a hierarchical agglomerative clustering approach using the similarity of velocity vectors as a metric for clustering.They applied it to the GNSS data in the San Francisco Bay Region, California, where the tectonic setting is simple and dominated by a strike-slip motion.Additionally, they proposed the k-means method as a partitioningoptimization clustering approach and compared it with hierarchical agglomerative clustering.Both methods yielded substantially different clustering results depending on the number of clusters; however, the geographical clustering maps were roughly similar.This approach assumes a flat earth and is, thus, suitable for velocity fields that can be approximated using the translational motion of blocks, such as in the San Francisco Bay Region.In other words, this approach is useful when the Euler pole is located at a great distance.Following this study, Savage and Simpson (2013a) generalized the k-means approach proposed by Simpson et al. (2012) to be applicable to a spherical earth.For this purpose, they proposed the Euler-Vector clustering algorithm to minimize the residuals between the observed velocity and calculated velocity obtained using the Euler-Vector in the cluster to which each station belongs.Starting from the initial grouping obtained using the k-means clustering, the Euler-Vector clustering was applied to the Mojave Block, southeastern California (Savage and Simpson 2013a) and the California-Nevada region (Savage and Simpson 2013b), yielding a clustering result similar to the k-means clustering.The velocity fields targeted by these previous studies are oriented in roughly similar directions with different amplitudes, indicating that the Euler poles are at distant locations.Savage and Wells (2015) applied a similar clustering approach across a broad area in the Pacific Northwest, USA, from 38°N to 49°N, including California, Nevada, Oregon, and Washington.The velocity field in this target area is approximated by a single Euler Pole, indicating that the velocity field is rotating clockwise and the Euler Pole is close to the target area.Although this velocity field incorporates information on strain accumulation due to plate subduction in Cascadia and internal deformation (or includes noise for clustering), they identified blocks that align with geologically determined boundaries.Subsequent studies applied this hard clustering method to the GNSS velocity fields in the southern San Andreas Fault (Thatcher et al. 2016), southwest Japan (Savage 2018), Taiwan (Takahashi et al. 2019), Turkey (Özdemir and Karslıoğlu 2019), and New Zealand (Takahashi and Hashimoto 2022), and discussed regional characteristics of block boundaries.
As illustrated above, GNSS velocity is a powerful tool for objectively identifying crustal structures.However, the observed velocities are often contaminated by various types of noise, such as atmospheric and ionospheric delays, the local motion of observation sites, and orbit errors.Takahashi et al. (2019) proposed a method of assessing stability based on information entropy to evaluate the robustness of the clustering results obtained using such a noisy dataset.Numerous synthetic datasets were prepared to calculate the information entropy by adding noise and conducting clustering for each dataset.Then, for each pair of stations, the information entropy was defined so that it is minimized when the two vectors are classified into the same or different clusters.That is, when these two data points are randomly classified, the information entropy reaches its maximum.By summing the information entropy for all pairs at each station, the entropy or robustness of the clustering result at each station was obtained (see Fig. 8 in Takahashi et al. (2019) and Fig. 6 in Takahashi and Hashimoto (2022)).
In most previous studies, clustering analysis was performed based on velocity fields; however, Yáñez-Cuadra et al. ( 2023) employed strain rate invariants and rotation rate fields, derived from velocity fields.Applying the hierarchical agglomerative clustering algorithm to the GNSS data in the Chilean subduction zone revealed a correlation between clustering results and the segmentation of the seismogenic zone and regional geological structures.
Unlike hard clustering, the soft clustering approach provides a probability of the cluster to which each station belongs.This approach directly identifies stations or regions with ambiguous crustal structure boundaries without additional evaluation of the clustering result.Özdemir and Karslıoğlu (2019) compared three hard clustering methods including k-means, hierarchical agglomerative clustering, and Gaussian mixture model with a soft clustering version of Gaussian mixture model fits.All the hard clustering algorithms successfully clustered GPS velocities in Turkey into the Eurasian and Arabian blocks, separated by the Anatolian strikeslip faults.However, additional clustering results within the Anatolian block varied depending on the chosen clustering algorithms.This ambiguity is clearly illustrated by the soft clustering method.Mitsui and Watanabe (2020) developed a different soft clustering approach using the fuzzy c-means method.They discussed the tectonics in the Izu Peninsula, a collisional zone with a complex deformation field in Japan.Granat et al. (2021) developed an open-source code for clustering GNSS velocities using methods available in the scikit-learn package in Python.The clustering results can be easily displayed via Google Earth or Google Maps using this code.

Detection of geodetic signals due to seismic/aseismic phenomena
ML techniques are powerful tools for extracting spatiotemporal signals of interest.A primary focus in recent geodetic studies is the detection of transient signals, such as slow slip events (SSEs).Accurately detecting such transient signals contributes to our understanding of slow earthquake activities while providing an accurate estimation of interseismic velocities that reflect the state of interplate coupling.Signals due to SSEs are often so weak that they can be buried in noise; therefore, methods that can effectively detect these small signals are required.
Several approaches have been proposed to detect SSEs: 1) extraction solely from geodetic data (Nishimura et al. 2013;Crowell et al. 2016;Yano and Kano 2022); 2) extraction using templates (Riel et al. 2014;Rousset et al. 2017;Okada et al. 2022); and 3) using other slow earthquake phenomena as a reference (Frank et al. 2015;Bartlow 2020).Additionally, ML-based approaches have been suggested.Most proposed methods adopted a supervised approach to detection or classification problems.He et al. (2020) combined RNNs and CNNs to detect SSEs, applying the Ocean Bottom Pressure (OBP) data obtained in New Zealand.The input for their model is a time series, and the output is a label indicating whether SSEs occur in the middle of the time series.They demonstrated that the proposed method performed better than the matched filter approach.Because seafloor geodetic data, such as OBP, are typically noisier than onland GNSS data, successful extraction of SSE signals will improve the understanding of slip behavior, especially in the shallower parts of subduction zones.
Rouet-Leduc et al. ( 2021) applied a deep autoencoder to a time series of Interferometric Synthetic Aperture Radar (InSAR) images to automatically detect millimeter-scale crustal deformation without fault location information.Their method was designed to mitigate noise in the InSAR time series, leveraging the characteristic that atmospheric errors do not correlate over a day and tectonic signals result in permanent deformation.They detected crustal deformation caused by a slow earthquake in the Anatolian Fault, Turkey, which was larger than previously considered.Xue and Freymueller (2023) proposed an RNN-based approach to obtain a time series of the probability of transient signals being included in the GNSS time series using a single station.Their model sequentially refers to each data only once; thereby, efficiently obtaining output probabilities compared with common methods using sliding windows.After training their model using synthetic data, they applied it to detect the SSEs in Cascadia between 2005 and 2016.Their detection results were consistent with the previously identified characteristics of SSEs such as timings, durations, and areas.Furthermore, they demonstrated that the proposed method is robust despite gaps in the time series.Based on GNSS data, Costantino et al. (2023) developed DL methods for estimating seismic and aseismic source characteristics.By, respectively, training spatial, temporal, and spatiotemporal synthetic datasets in DL models, the spatiotemporal data with the transformer displayed the best performance in estimating source parameters of earthquakes along the Japan Trench in northeast Japan.They highlighted the potential of geodetic data to improve a low detection limit using an ML approach.The spatial pattern of crustal deformation is useful for the real-time detection of large earthquake signals because of its usage of unsaturated displacement waveforms.Lin et al. (2021) proposed the GNSS-based EEW algorithm, M-LARGE, and demonstrated that the proposed method outperformed other GNSS-based non-DL methods.

Modeling and prediction of crustal deformation
AI techniques have been introduced to model crustal deformation.DeVries et al. (2017) proposed a viscoelastic modeling method using a neural network trained on theoretical modeling results.By introducing neural networks, the calculation can be drastically accelerated compared with the forward calculation.Okazaki et al. (2022) proposed a PINN approach to model crustal static deformation due to an earthquake.By using a polar coordinate system, the displacement discontinuity on a fault can be accurately modeled as a boundary condition.Fukushima et al. (2023) used a PINN approach for earthquake cycle simulation that calculates the temporal evolution of slip velocities of SSEs using a spring-slider system.They additionally proposed an alternative inversion approach based on PINNs for frictional parameter estimation.Due to its flexibility, PINNs can readily incorporate complex models such as heterogeneous underground structures and threedimensional plate geometries.Therefore, PINNs will be a powerful tool for geodetic inversion.Another application that integrates geodetic data and ML techniques is the prediction of time series.Fukushima et al. (2023) proposed a PINN-based scheme for predicting the shortterm evolution of slip velocities along the subducting plate interface.
ML has been adopted to predict postseismic deformation.Time series of postseismic deformation are often modeled as a superposition of logarithmic and exponential functions (Tobita 2016).Yamaga and Mitsui (2019) applied an RNN to this problem and predicted future time series following the 2011 Tohoku-oki earthquake.
Their RNN inputs a 1-year GNSS time series and forecasts displacement on the day following the input time series.After training the RNN model using GNSS stations in northeast Japan, they conducted predictions of GNSS time series that were not used in training.Compared with the traditional method of fitting logarithmic and exponential functions, the RNN model had better predictive abilities with smaller residuals between the predicted and observed displacements.They showed that the residual time series of the RNN outputs represented the dominant mechanisms depending on the elapsed time from the mainshock, that is, the afterslip decay and, consequently, the domination of viscoelasticity.

Current achievements and future prospects
As described in this section, ML techniques have been introduced to geodetic data analysis such as clustering, detection of events, and modeling.Nonetheless, fewer application examples have been reported compared with seismic data analysis as reviewed in this paper.Clustering analysis is one of the well-studied topics.The previous studies have demonstrated that the proposed algorithms are widely applicable to crustal deformation in various tectonic regions.The resulting clusters provide boundaries of crustal structures with their uncertainties from regional to global scales, contributing to our understanding of the tectonics in the target areas.In contrast, in recent years, the detection of tectonic signals, modeling, and prediction of crustal deformation are emerging topics of ML application to geodetic data.As new ML-based approaches are continuously proposed, they would detect a greater number of hidden events, discover new aseismic phenomena, or be utilized as a tool that can easily incorporate complex geometries or spatial heterogeneity of underground structures for geodetic modeling.These achievements will eventually contribute to further understanding of the physical processes associated with the whole earthquake cycle.

Conclusions
In this paper, we reviewed ML applications in several fields of earthquake seismology, especially in the development of earthquake catalog, seismicity analysis, ground motion prediction, and application to geodetic data.ML technologies have significantly advanced these fields; however, unique challenges persist.For example, the imbalance in natural datasets is problematic in many cases, possibly causing misevaluation or misinterpretation.Effective approaches to address this problem include data augmentation, simultaneous use of domain knowledge, and transfer learning.Although challenges arise from the black-box nature of DL, the latest techniques, such as PINNs, neural operators, BNN, and XAI, can address them.The efficiency, accuracy, and flexibility of ML are the driving forces behind the establishment of its usage for various tasks in earthquake seismology.There remain many problems where ML can effectively solve, and its application will further expand and advance our knowledge of earthquake seismology.

Fig. 1
Fig. 1 Topics reviewed in this study

Fig. 3
Fig. 3 Distribution of ground-motion dataset used by Kubo et al. (2020) with the heat map showing the relationship between epicentral distance and PGA with their histograms (Fig. 5 of Kubo et al. (2020)).They used 186,310 ground-motion records in Japan, which were recorded from 1997 to 2015 by K-NET and KiK-net of National Research Institute for Earth Science and Disaster Resilience.The heat-map color denotes the number of ground-motion records