Improving the accuracy of polar motion prediction using a hybrid least squares and long short-term memory model
Earth, Planets and Space volume 75, Article number: 153 (2023)
Abstract
Earth rotation parameters (ERPs) are essential for transforming between the celestial and terrestrial reference frames, and for high-precision space navigation and positioning. Among the ERPs, polar motion (PM) is a critical parameter for analyzing and understanding the dynamic interaction between the solid Earth, atmosphere, ocean, and other geophysical fluids. Traditional methods for predicting changes in the ERPs rely heavily on linear models, such as the combination of least squares (LS) and the autoregressive (AR) model (LS + AR). However, variations in the ERPs partly reflect nonlinear effects in the Earth system, so the predictive accuracy of linear models is not always optimal. In this paper, long short-term memory (LSTM), a nonlinear neural network, is employed to improve the prediction of ERPs. Polar motion prediction experiments in this study are conducted using the LSTM model and a hybrid LS + LSTM model based on the IERS EOP 14C04 time series. Compared with Bulletin A, the PMX and PMY prediction accuracy improves by up to 33.7% and 31.9%, respectively, with the LS + LSTM model. The experimental results show that the proposed hybrid model performs better in mid- and long-term (120–365 days) prediction of polar motion.
Introduction
Due to the rapid development of space technology, the accuracy of Earth Orientation Parameter (EOP) estimates improved dramatically in the 1990s (Schuh et al. 2002) and has remained at a high level ever since. The EOPs consist of precession–nutation, polar motion (PM), the difference between Universal Time (UT1) and Coordinated Universal Time (UTC), that is, UT1-UTC, and Length of Day (LOD) (Petit and Luzum 2010). Earth Rotation Parameters (ERPs) comprise the PM (including PMX and PMY), UT1-UTC, and LOD. The instantaneous movement of the Earth's rotation axis with respect to the terrestrial reference frame is described by the PM. A major complication for PM is that it is caused by partially unpredictable mass redistributions on the surface and in the interior of the Earth (Gross 2007; Dobslaw et al. 2010; Sun et al. 2019; Börger et al. 2023). Modern space navigation and deep space exploration increasingly require accurate real-time prediction of ERPs. Given the complicated data processing of modern geodetic techniques, such as the Global Positioning System (GPS), ERP results are only available with a delay of 15–20 h. Obtaining ERPs from Very Long Baseline Interferometry (VLBI) and Satellite Laser Ranging (SLR) requires several days. These factors make it challenging to acquire real-time ERPs, emphasizing the need for accurate predictions (Zhang et al. 2012). Several national and international services publish predicted values of EOPs, such as the International Earth Rotation and Reference Systems Service (IERS) Rapid Service/Prediction Center (RS/PC), operated by the US Naval Observatory (USNO) (Guo et al. 2013) and published in the IERS Bulletin A files for a year into the future at daily intervals, or the EOP service of the Institute of Applied Astronomy of the Russian Academy of Sciences (IAA RAS) (Suvorkin et al. 2015).
The products provided by these agencies comprise estimates for PM, UT1-UTC, LOD, and other parameters, usually for a year into the future at daily sampling.
The polar motion includes a regular deterministic and an irregular stochastic component. The deterministic part consists of the long-term trend, the Chandler wobble (CW) (Chandler 1981; Zharkov and Molodensky 1996), the annual wobble (AW), and the semi-annual wobble (SAW) (Wang et al. 2016; Gross 2000). The Chandler wobble is a resonant rotational mode of the Earth that decays freely due to the Earth's viscoelastic nature. Studies have shown that, without excitation, the CW would freely decay to the minimum rotational energy state within 68 years. It is generally believed that the oscillation period and amplitude of the CW vary over time, with the period fluctuating between 1.13 and 1.20 years (Schuh et al. 2001). The annual oscillation of the pole curve includes both prograde and retrograde components. The intensity of the prograde part is 10 times that of the retrograde part, and there is a significant change in the period of the prograde annual oscillation of the PM, which varies between 356 and 376 days (Joachim 2004). Considering the characteristics of the secular drift, CW, and AW, scholars have conducted extensive studies and proposed various methods for predicting the ERPs. In general, these methods fall into linear and nonlinear models. Kalman filtering (Babcock and Wilkins 1989), least squares (LS) extrapolation, fuzzy inference systems (Akyilmaz and Kutterer 2004), autoregressive models (AR) (Sun and Xu 2012), autocovariance models (Kosek 2002), and different combinations of these methods (Kosek and Popiński 2005; Kosek et al. 2004; Kosek et al. 2008) are linear models. Methods such as threshold autoregressive models, artificial neural networks (Liao et al. 2012; Egger 1992), and fuzzy reasoning are nonlinear models.
More hybrid and machine learning methods have been introduced in recent years for predicting ERP variations. The rapid expansion in computing power and data volume has made the application of deep learning in geodesy increasingly promising. In particular, the long short-term memory (LSTM) network (Hochreiter and Schmidhuber 1997), one of the most popular forms of recurrent neural networks (RNNs), is advantageous for geodetic time series prediction. The LSTM network can capture the nonlinear structure between different epochs in a time series due to the unique structure of its cells (Gers et al. 2000; Graves and Schmidhuber 2005). Some researchers have used the LSTM model to predict the LOD (Gou et al. 2021), which suggests it might also be suitable for PM prediction. This study investigates the potential of utilizing LSTM combined with traditional methods for predicting PM. The proposed method is novel in that the nonlinear part of the PM is predicted not by the linear AR model but by the deep learning LSTM model.
This paper is structured as follows: In the second section, we describe the LSTM and LS + AR algorithms. Section three introduces the dataset and processing strategy, including the data used in each experiment, the amplitude variations and characteristics of the AW, SAW, and CW in PM obtained through Fast Fourier Transform (FFT) spectrum analysis, and the detailed PM prediction process with the LSTM, LS + AR, and LS + LSTM models. Next, we present the different models used to estimate PM variability, including LS + AR, LSTM, and LS + LSTM, all of which draw on IERS EOP 14C04 data from 2011 to the end of 2020. At the same time, Bulletin A from the IERS RS/PC is used to compare its prediction accuracy with the results derived in this paper. A summary of the findings is given in the last section.
Materials and methods
LSTM prediction model
Introduction of the general concept of LSTM
LSTM is now widely used and has proven to perform well on various problems such as handwriting recognition, speech recognition, and time series prediction (Schmidhuber 2015; Alex et al. 2018). However, a neural architecture would not be widely utilized in practice without a solid theoretical foundation. Greff et al. (2017) reviewed several LSTM variants and their performance relative to the so-called vanilla model. The variant LSTM is an improved model based on the original LSTM (Hochreiter and Schmidhuber 1997; Gers and Schmidhuber 2000); its main change is the addition of cell-state information to the inputs of the three control gates. Unlike feedforward neural networks, RNNs contain cycles, which allow the activations of previous steps to serve as input to the network and influence the current step. However, training recurrent or very deep neural networks is challenging because they frequently suffer from exploding and vanishing gradient problems (Hochreiter 1991; Hochreiter et al. 2001). The LSTM architecture was developed to address this deficiency and to enable the learning of long-term dependencies. Figure 1 depicts the LSTM network structure, which is detailed in Appendix A.
LSTM training results analysis
In the LSTM network training, the number of hidden layers is set to 2 and the number of LSTM cells per hidden layer is 50. The time steps are set to 365 and the training iterations to 1000. The dropout rate is 0.1. The Savitzky–Golay (SG) smoothing filter is used in the experiments of this paper. The initial learning rate is set to 0.1 and the learning-rate drop factor to 0.2 (Greff et al. 2017; Ren et al. 2020). The gradient threshold is set to 1 (Din et al. 2019). Other parameter settings are listed in Appendix A, Table 4. Figure 2 shows the LSTM network training based on the PM time series. Figure 2a and b indicates that the correlation between the original and output sequences of PMX and PMY is 0.99982 and 0.99987, respectively. Figure 2c and d shows that the Root Mean Square Error (RMSE) of PMX and PMY is 1.7916 mas and 1.6128 mas, respectively. Figure 2e and f shows that the mean and the standard deviation (STD) of PMX and PMY are −0.1682 mas and 0.3365 mas, and 1.7840 mas and 1.5776 mas, respectively.
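For reference, the hyperparameters quoted above can be collected in one place. The following Python dictionary is purely illustrative; the key names are ours, not the authors':

```python
# Training configuration of the LSTM network as described in the text.
# The dictionary and its key names are illustrative only.
lstm_config = {
    "hidden_layers": 2,
    "cells_per_hidden_layer": 50,
    "time_steps": 365,             # one year of daily PM values per window
    "training_iterations": 1000,
    "dropout": 0.1,
    "initial_learning_rate": 0.1,
    "learn_rate_drop_factor": 0.2,
    "gradient_threshold": 1,       # gradient clipping against exploding gradients
    "smoothing_filter": "Savitzky-Golay",
}
```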
LS + AR prediction model
LS model
We use the following model to fit the trend and periodic terms of the EOP; its parameters can be estimated with the least squares method, and the residuals are then analyzed by the AR and other models. The least squares model can be described as

$$X(t) = A + Bt + C_{1}\cos\frac{2\pi t}{P_{SA}} + C_{2}\sin\frac{2\pi t}{P_{SA}} + D_{1}\cos\frac{2\pi t}{P_{A}} + D_{2}\sin\frac{2\pi t}{P_{A}} + E_{1}\cos\frac{2\pi t}{P_{C}} + E_{2}\sin\frac{2\pi t}{P_{C}} + \omega, \quad (1)$$

where A is the constant term, B is the trend parameter, C_{1} and C_{2} are the SAW parameters, D_{1} and D_{2} are the AW parameters, and E_{1} and E_{2} are the CW parameters. P_{SA}, P_{A}, and P_{C}, expressed in years, represent the periods of the SAW, AW, and CW, respectively.
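As an illustration of how such a harmonic model can be fitted, the sketch below estimates the coefficients by ordinary least squares on a synthetic daily series. The data, amplitudes, and variable names are invented; only the periods follow the text:

```python
import numpy as np

# Fixed periods from the text (in days): semi-annual, annual, Chandler.
P_SA, P_A, P_C = 182.62, 365.25, 434.0

def design_matrix(t):
    """Columns: constant, trend, and cosine/sine pairs for SAW, AW, CW."""
    cols = [np.ones_like(t), t]
    for P in (P_SA, P_A, P_C):
        w = 2.0 * np.pi / P
        cols += [np.cos(w * t), np.sin(w * t)]
    return np.column_stack(cols)

# Synthetic stand-in for a ten-year daily PM series (values in arcseconds).
t = np.arange(3653.0)
rng = np.random.default_rng(0)
pm = (0.05 + 1e-5 * t + 0.02 * np.cos(2.0 * np.pi * t / P_A)
      + 0.001 * rng.standard_normal(t.size))

# Least-squares estimate of A, B, C1..E2, then the stochastic residuals.
coef, *_ = np.linalg.lstsq(design_matrix(t), pm, rcond=None)
residuals = pm - design_matrix(t) @ coef

# Extrapolation of the deterministic part one year (365 days) ahead.
t_future = np.arange(3653.0, 3653.0 + 365.0)
pm_future = design_matrix(t_future) @ coef
```

The residuals are what the AR or LSTM step later predicts; the deterministic extrapolation `pm_future` is added back to form the final forecast.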
We additionally conducted an FFT analysis of the EOP 14C04 series. From Fig. 3a and b, it can be seen that the CW and AW dominate the PM spectrum, manifested by cusps of power between 413 and 439 days (CW) and 356 and 376 days (AW). These values are relatively consistent with estimates given elsewhere (Mccarthy and Luzum 1991; Schuh et al. 2001; Joachim 2004). In our model (Eq. 1), the AW period is 365.25 days, the CW period is 434 days, and the SAW period is 182.62 days. \(\omega\) is the random error, and \(t\) is the UTC epoch of the series, converted into years for the LS fitting. The parameters of the PMX series have the same meaning as the corresponding parameters of the PMY series.
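The spectrum check itself is straightforward; a minimal sketch, with a synthetic annual signal standing in for the real pole coordinates, is:

```python
import numpy as np

# FFT spectrum of a synthetic daily series with a 365.25-day oscillation,
# standing in for the EOP 14C04 pole coordinates.
t = np.arange(3653.0)                   # ten years, daily sampling
x = np.cos(2.0 * np.pi * t / 365.25)

spec = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(t.size, d=1.0)  # cycles per day

peak = np.argmax(spec[1:]) + 1          # skip the zero-frequency bin
period = 1.0 / freqs[peak]              # dominant period in days
```

With ten years of data the frequency resolution is roughly 0.1 cycles per year, which is why the CW (about 434 days) and AW (about 365 days) peaks are separable.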
AR model
The AR(p) model describes the relationship between a random series \(z_{t}\) (t = 1, 2, …, N) and its values before time t. Its expression can be written as follows:

$$z_{t} = \sum_{i=1}^{p} \varphi_{i} z_{t-i} + \omega_{t}, \quad (2)$$

where \(\varphi_{1}, \varphi_{2}, \ldots, \varphi_{p}\) represent the autoregressive coefficients, obtained by solving the Yule–Walker equations using the Levinson–Durbin recursion (Brockwell and Davis 1997), \(\omega_{t}\) is white noise with zero mean, and \(p\) stands for the model order. The above equation, denoted by \(\mathrm{AR}(p)\), is the AR model of order \(p\), and determining the order \(p\) is crucial. Usually, there are three methods for the determination of \(p\): Akaike's final prediction error (FPE) criterion, the information criterion, and the transfer function criterion. In this paper, the FPE criterion is adopted, and the chosen order \(p\) corresponds to the smallest FPE (Akaike 1971):

$$\mathrm{FPE}(p) = \hat{\sigma}_{p}^{2}\,\frac{N + p + 1}{N - p - 1}, \quad (3)$$

$$\hat{\sigma}_{p}^{2} = \frac{1}{N} \sum_{t=p+1}^{N} \left( z_{t} - \sum_{i=1}^{p} \varphi_{i} z_{t-i} \right)^{2}, \quad (4)$$

where \(\hat{\sigma}_{p}^{2}\) is the estimated variance of the model's prediction error at order \(p\).
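A compact sketch of this order-selection procedure is given below. It fits AR models through the Yule–Walker equations (solved directly here rather than by the Levinson–Durbin recursion, which yields the same coefficients) and picks the order with the smallest FPE on a synthetic AR(2) series; all data and names are illustrative:

```python
import numpy as np

def yule_walker(z, p):
    """AR(p) coefficients and innovation variance from the Yule-Walker equations."""
    z = z - z.mean()
    N = z.size
    r = np.array([z[: N - k] @ z[k:] / N for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, r[1:])
    sigma2 = r[0] - phi @ r[1:]
    return phi, sigma2

def fpe(z, p):
    """Akaike's final prediction error for an AR(p) fit."""
    _, sigma2 = yule_walker(z, p)
    N = z.size
    return sigma2 * (N + p + 1) / (N - p - 1)

# Synthetic AR(2) residual series: z_t = 0.6 z_{t-1} - 0.3 z_{t-2} + w_t
rng = np.random.default_rng(1)
z = np.zeros(3000)
for t in range(2, z.size):
    z[t] = 0.6 * z[t - 1] - 0.3 * z[t - 2] + rng.standard_normal()

best_p = min(range(1, 11), key=lambda p: fpe(z, p))
phi, _ = yule_walker(z, best_p)
```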
The mean absolute error (MAE) is utilized to evaluate the prediction accuracy. It can be expressed as follows:

$$\mathrm{MAE}_{j} = \frac{1}{n} \sum_{i=1}^{n} \left| P_{i} - X_{i} \right|, \quad (5)$$

where \(P_{i}\) represents the predicted value of the \(i\)th prediction, \(X_{i}\) stands for the corresponding observed value, \(n\) is the total number of predictions, and \(\mathrm{MAE}_{j}\) is the MAE at prediction span \(j\).
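In code, the metric is a one-liner; the sample values below are made up purely to exercise it:

```python
import numpy as np

def mae(predicted, observed):
    """Mean absolute error between predictions and observations."""
    return float(np.mean(np.abs(np.asarray(predicted) - np.asarray(observed))))

# Three hypothetical predictions at one span against their observations (mas).
err = mae([1.0, 2.0, 4.0], [1.5, 2.5, 3.0])
```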
Data description and processing strategy
Data description
In particular, we use the PM time series from IERS EOP 14C04 with a daily sampling interval, available at https://hpiers.obspm.fr/eoppc/eop/eopc04/. In this study, we use the PM series from January 8, 2011 to December 31, 2021. The results are compared to Bulletin A (558 files), available at https://www.iers.org/IERS/EN/DataProducts/EarthOrientationData/eop.html, for the same period as IERS EOP 14C04. The LSTM network training is based on the PM time series from January 1, 2011 to December 31, 2020.
PM prediction processing strategy
Figure 4 depicts a schematic representation of the methodology adopted for predicting PM with the various models. The observed PM can be divided into deterministic and stochastic components. The deterministic component is referred to as the a priori model, consisting of the long-term trend, CW, AW, and SAW. In this study, the LS + AR model (first method) is applied to forecast PMX and PMY and compared to results based on the LS + LSTM (second method) and LSTM (third method) models. Figure 4 describes the respective processing schemes. For the first two methods, the residuals obtained after subtracting the a priori model are predicted, by the AR model in the first method and by the LSTM network in the second. The predicted residuals are then added to the extrapolated a priori model to obtain the final predicted values of PMX and PMY. The third method predicts PM with the LSTM model directly, relying on the IERS EOP 14C04 time series.
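The hybrid chain can be sketched as follows. The stand-ins below are deliberately simple: a linear trend plays the role of the a priori LS model (the paper's version also removes the CW, AW, and SAW harmonics), and a persistence forecast plays the role of the AR or LSTM residual predictor; all names are ours:

```python
import numpy as np

def ls_fit_and_extrapolate(series, horizon):
    """Stand-in a priori model: fit and extend a linear trend."""
    t = np.arange(series.size, dtype=float)
    slope, intercept = np.polyfit(t, series, 1)
    fit = intercept + slope * t
    future_t = np.arange(series.size, series.size + horizon, dtype=float)
    return fit, intercept + slope * future_t

def predict_residuals(residuals, horizon):
    """Stand-in for the AR or LSTM residual predictor (persistence)."""
    return np.full(horizon, residuals[-1])

def hybrid_forecast(series, horizon=365):
    """Deterministic extrapolation plus predicted stochastic residuals."""
    fit, deterministic = ls_fit_and_extrapolate(series, horizon)
    stochastic = predict_residuals(series - fit, horizon)
    return deterministic + stochastic

series = 0.5 * np.arange(100.0)   # toy 'observed' series
pred = hybrid_forecast(series, horizon=5)
```

Swapping `predict_residuals` for an AR or LSTM predictor reproduces the first and second processing schemes; dropping the a priori step entirely corresponds to the third.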
Results and discussion
PM prediction using the LS + AR model
We initially preprocess the PM series using the LS model and obtain residuals (that is, the stochastic components) by subtracting the LS fit from the original pole coordinates. Figure 5a shows the PM residuals (purple line) derived from the IERS EOP 14C04 from 2011 to 2020. The residuals of PMX and PMY are within \(\pm 0.08\) arcseconds (as). Due to the nature of the LS fitting model, the fluctuations at the start and end of the time series are somewhat larger than in the middle part. Figure 5b depicts the first-order difference of the residual sequence (brown line) for PMX and PMY. Most of the residual values and the first-difference values are within \(\pm 0.1\) arcseconds (as) and \(\pm 2.0\) milliarcseconds (mas), respectively.
Determining the appropriate type of Autoregressive Integrated Moving Average (ARIMA) model is essential. When \(p = 0\), the model can be expressed as \(\mathrm{MA}(q)\), i.e., a q-order moving-average model; when \(q = 0\), it can be described as \(\mathrm{AR}(p)\), i.e., a p-order autoregressive model (Box et al. 1976). For the PM time series, Table 1 lists the criteria by which the model type can be evaluated according to the autocorrelation and partial autocorrelation functions of the time series. Figure 6 depicts the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the first-order differenced residual time series for delays of 1 to 40 days. The results indicate that the ACF of the first-order difference sequence of the residuals tails off while the PACF cuts off, allowing the \(\mathrm{AR}(p)\) model to be used for prediction, i.e., \(q = 0\) (Schaffer et al. 2021). In this research, the FPE criterion described in Eqs. (3) and (4) is used to determine the order p. The optimal order p for the AR model, as determined by the final prediction error criterion, is set to 50.
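A minimal version of the ACF computation underlying this check (the PACF is analogous) is shown below on a synthetic residual series; differencing white noise produces the characteristic lag-1 autocorrelation of about −0.5:

```python
import numpy as np

def acf(x, max_lag=40):
    """Sample autocorrelation function for lags 0..max_lag."""
    x = x - x.mean()
    denom = x @ x
    return np.array([x[: x.size - k] @ x[k:] / denom for k in range(max_lag + 1)])

# Synthetic stand-in for the LS residuals (white noise), first-differenced
# as in the text before inspecting the ACF/PACF.
rng = np.random.default_rng(2)
resid = rng.standard_normal(3650)
rho = acf(np.diff(resid))
```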
In this experiment, we extrapolate the deterministic part for 2021 (365 days) using the LS model, based on the least squares fit to the IERS EOP 14C04 series from January 1, 2011 to December 31, 2020 (10 years). Figure 7 shows the LS model time series (red line) and the LS extrapolation time series (blue line) of PM. The experimental purpose is to use the LS + AR model for PM prediction. The final prediction of the LS + AR model is the sum of the LS extrapolation of the deterministic part and the AR prediction of the residual sequence.
PM prediction using the LSTM model
Figure 8 displays the PM prediction based on the IERS EOP 14C04 time series (blue line) using the LSTM model. The green line represents the prediction for 2021 (365 days). The LSTM predictions of PMX and PMY for the 365 days of 2021 are consistent with the overall trend of the PMX and PMY time series from the IERS EOP 14C04. Most PMX prediction values fall between 0.15 as and 0.21 as, and the PMY prediction values lie between 0.28 as and 0.42 as.
PM prediction using the LS + LSTM model
To investigate the contribution of the LS + LSTM model to PM prediction, the LSTM is applied to forecast the residual part for 2021, using the residual base time series. Figure 9 presents the prediction of the 2021 residual values (blue line) from the LSTM model. The final PM prediction of the LS + LSTM model is the sum of the LS extrapolation from the LS model and the residuals predicted by the LSTM model.
Figure 10a and b depicts the final prediction of the PM with the different methods, including the LS + AR, LSTM, and LS + LSTM models. In addition, the forecast results of Bulletin A for 2021 are included (purple line). The IERS EOP 14C04 time series is considered the benchmark for comparing the estimated outcomes of the various techniques. In terms of PM prediction, the results predicted by the LS + LSTM model (green line) in Fig. 10c and d are the closest to the IERS EOP 14C04 time series (red) over the mid- and long-term prediction. Although the improvement is marginal, the results predicted by the LS + LSTM model are very close to or better than Bulletin A in the mid- and long-term prediction of PMX, and the PMX prediction accuracy of the LSTM model is higher than that of the LS + AR model in the mid and long term. For PMX, the RMSE of the results is 0.035 as, 0.031 as, 0.018 as, and 0.030 as for LS + AR, LSTM, LS + LSTM, and Bulletin A, respectively. For PMY, the RMSE of the results is 0.038 as, 0.035 as, 0.015 as, and 0.035 as for LS + AR, LSTM, LS + LSTM, and Bulletin A, respectively.
Evaluating the PM prediction results
Based on the previous analysis (Fig. 10), the prediction accuracy of the LS + LSTM model is higher than that of the other models. To assess the accuracy of this method more thoroughly, we compare the PM predictions over different periods using the LS + LSTM, LS + AR, and LSTM models, as shown in Fig. 11. The prediction span is 365 days with a ten-year base sequence, and the statistical period is from 2011 to 2020. The model proposed in this experiment is based on the IERS EOP 14C04 for PM prediction. In Fig. 11, the orange, brown, green, and purple lines represent the LS + AR, LSTM, LS + LSTM, and Bulletin A predictions of PM, respectively. We also compare the PM prediction results of the LS + LSTM model to the IERS EOP 14C04. In the mid- and long-term prediction of the PM, the prediction results based on the LS + LSTM model are closer to the observed IERS EOP 14C04 time series than those based on Bulletin A.
AE of PM prediction with different models
Experiments have demonstrated that the LS + LSTM model is superior for predicting PM, especially in the mid and long term. To further explore the advantages of the LS + LSTM model in the accuracy of PM prediction, four different cases were designed to predict PM for 11 years (from 2011 to 2021). In this experiment, the prediction span was 365 days with a weekly sliding window. The experiment is divided into four parts, considering the following methods:
Case 1: PMX and PMY prediction based on the LS + AR model;
Case 2: PMX and PMY prediction based on the LSTM model;
Case 3: PMX and PMY prediction based on the LS + LSTM model;
Case 4: PMX and PMY prediction from Bulletin A achieved from the IERS RS/PC.
The four cases listed above correspond to the LS + AR model, the LSTM model, the LS + LSTM model, and Bulletin A provided by the IERS, respectively. Authors generally rely on a 10-year IERS EOP 14C04 time series as the base series in PM prediction (Xu et al. 2012; Xu and Zhou 2015; Kenyon et al. 2012). In the following experiments, using the various methods, we likewise choose a ten-year base sequence to predict the PM for the next 365 days.
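The weekly sliding-window evaluation used in these cases can be sketched as below; `forecast_fn` stands for any of the predictors (LS + AR, LSTM, or LS + LSTM), and the persistence forecaster and all names here are illustrative:

```python
import numpy as np

def sliding_window_abs_errors(series, forecast_fn,
                              base_len=3653, span=365, step=7):
    """Absolute errors of `forecast_fn` over weekly sliding windows.

    Each window uses the `base_len` samples before its start date as the
    base sequence and scores a `span`-day forecast against the series.
    """
    errors = []
    for start in range(base_len, series.size - span + 1, step):
        base = series[start - base_len:start]
        pred = forecast_fn(base, span)
        truth = series[start:start + span]
        errors.append(np.abs(pred - truth))
    return np.array(errors)               # shape: (n_windows, span)

# Toy check: a persistence forecaster on a constant series has zero error.
const = np.ones(3653 + 365 + 14)
ae = sliding_window_abs_errors(const, lambda base, h: np.full(h, base[-1]))
```

Averaging the returned array over windows for each lead day yields the MAE curves of Fig. 13.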
Figure 12 shows the absolute errors (AE) of the PM predictions for the four cases. All experimental results take the IERS EOP 14C04 time series as a reference. It can be seen that the accuracy of the four cases (LS + AR, LSTM, LS + LSTM, and Bulletin A) in 2011–2015 is inferior to that in 2016–2021. One potential reason could be the 2011 earthquake off the Pacific coast of Tōhoku (the 3.11 Japan earthquake). Based on data from the Jet Propulsion Laboratory (JPL) of the National Aeronautics and Space Administration (NASA), the 3.11 Japan earthquake shifted the Earth's rotation axis by 25 cm and shortened the length of day by 1.8 microseconds (Gross 2007). Earthquakes not only cause significant changes in the Earth's rotation on the day they occur, but they also affect the positions of surface stations over the following 3–5 years, thus affecting ERP monitoring (Souriau 1986; Bizouard 2005; Bogusz et al. 2015). The IERS introduced post-seismic deformation (PSD) modeling in 2017 when establishing the most recent international terrestrial reference frame (ITRF2014) to reduce the influence of earthquakes on ground stations and obtain more accurate ERP data. It is worth noting that ITRF2014 was the most recent ITRF solution at the time of the study. The IERS EOP 14C04 also applied the PSD model correction for the large earthquake in Japan in March 2011 to represent the PM variations in this period more precisely; hence, the PM results in this phase deviate from the previous overall trend. However, this deviation was not considered when the models described in this study were used to predict PM, likely resulting in prediction errors. Thus, our preliminary conclusion is that the larger deviations of the prediction results from the observed values (EOP 14C04) between 2011 and 2015 are attributable to the effects of large earthquakes. To improve the accuracy of PM prediction following a major earthquake, further PSD processing in the prediction algorithm is required.
The results predicted by the LS + LSTM are closer to the IERS EOP 14C04 series than those by the LS + AR, LSTM, and Bulletin A in mid- and long-term prediction.
MAE of PM prediction with different models
Figure 13 shows the mean absolute errors (MAE) of the PMX and PMY predictions in the four cases. Compared to the other models, the proposed LS + LSTM model yields smaller MAE in mid- and long-term prediction. Since the LS + LSTM model better captures the overall characteristics of the base series, it obtains a more accurate long-term trend and long-period terms than the LS model during extrapolation, thus improving the mid- and long-term PM prediction accuracy.
As Table 2 reveals, the prediction accuracy of the PM is evaluated for the different models, i.e., the LS + AR, LSTM, and LS + LSTM models. The MAE of the predicted PM at various spans (1, 5, 10, 15, 20, 30, 45, 90, 120, 180, 270, 320, 365 days) is listed in Table 2. Combined with the PM prediction accuracy statistics, the improvement of LS + LSTM over Bulletin A is clear after 120 days. The improvement gradually increases with the lengthening of the prediction span, reaching a maximum of 33.7% and 31.9% in PMX and PMY, respectively. Generally, the LS + LSTM model has more advantages than the LSTM model, the traditional linear prediction model (LS + AR), and Bulletin A in mid- and long-term PM prediction.
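The percentage improvement quoted here is the relative reduction in MAE with respect to the reference; the MAE values in the example below are invented purely for illustration:

```python
def improvement_pct(mae_model, mae_reference):
    """Relative MAE reduction of a model versus a reference, in percent."""
    return 100.0 * (mae_reference - mae_model) / mae_reference

# e.g. a hypothetical model MAE of 6.63 mas against a reference MAE of 10.0 mas
gain = improvement_pct(6.63, 10.0)   # roughly 33.7 %
```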
However, Bulletin A exhibits the smallest MAE among the four cases for short-term prediction. Table 2 demonstrates that Bulletin A outperforms the other models in short-term prediction, especially in the ultra-short term (the first ten days into the future). This advantage is primarily due to Bulletin A accounting for the effects of atmospheric angular momentum (AAM) and oceanic angular momentum (OAM). In addition, the statistical results demonstrate that the LSTM model is superior to the traditional model (LS + AR) in the long-term (270–365 days) prediction of PM.
Conclusions
Polar motion is a crucial parameter describing the instantaneous movement of the Earth's rotation axis relative to the body-fixed reference frame. Among existing prediction models, linear models such as the LS + AR model are often used to predict PM. Here, we have analyzed the PM series from January 8, 2011 to September 11, 2021 with different models, including LS + AR, LSTM, LS + LSTM, and Bulletin A. The residual series used in this research is obtained by removing the long-term trend and the fitted AW, SAW, and CW terms. In this paper, based on the characteristics of PM and its inherent periodic and trend terms, the LSTM prediction model is proposed. To verify the advantages of the LSTM and its combination with LS in PM prediction, a base sequence length of 10 years is selected, which is optimal for the LS + AR model in predicting PM. The experimental findings demonstrate that the LS + LSTM model is superior for mid- and long-term forecasting of PM. Compared to Bulletin A, published by the IERS, the LS + LSTM model improves the PMX and PMY prediction accuracy by up to 33.7% and 31.9%, respectively, and the LSTM model outperforms the LS + AR model in the mid and long term.
The study's findings rely heavily on the 10-year snippet of the PM time series between 2011 and 2021. Future research will investigate the relationship between the length of the base time series, seismic factors, and the accuracy of the LSTM and LS + LSTM models in predicting PM. Prediction models such as LS + LSTM, based on a proper base-sequence length and seismic-factor correction, will be established to improve short-term PM prediction. In addition, the benefits of combining the LSTM with LS and other traditional methods for short-term PM prediction need to be further explored.
Availability of data and materials
The datasets of IERS EOP 14C04 with a daily interval are available at https://hpiers.obspm.fr/eoppc/eop/eopc04/; Bulletin A can be found at https://www.iers.org/IERS/EN/DataProducts/EarthOrientationData/eop.html.
Abbreviations
AW: Annual wobbles
AR: Autoregressive models
ARIMA: Autoregressive integrated moving average
ACF: Autocorrelation function
AE: Absolute errors
AAM: Atmospheric angular momentum
CW: Chandler wobbles
EOPs: Earth orientation parameters
ERPs: Earth rotation parameters
FFT: Fast Fourier Transform
FPE: Final prediction error
GPS: Global Positioning System
IERS: International Earth Rotation and Reference Systems Service
IAA: Institute of Applied Astronomy
JPL: Jet Propulsion Laboratory
LOD: Length of Day
LS: Least squares
LSTM: Long short-term memory
MAE: Mean absolute errors
NASA: National Aeronautics and Space Administration
OAM: Oceanic angular momentum
RNNs: Recurrent neural networks
PM: Polar motion
PMX: X coordinate of PM
PMY: Y coordinate of PM
PACF: Partial autocorrelation function
PSD: Post-seismic deformation
RS/PC: Rapid Service/Prediction Center
RAS: Russian Academy of Sciences
SG: Savitzky–Golay
SLR: Satellite Laser Ranging
SAW: Semi-annual wobbles
UT1: Universal Time
UTC: Coordinated Universal Time
USNO: The US Naval Observatory
VLBI: Very Long Baseline Interferometry
References
Akaike H (1971) Autoregressive model fitting for control. Annals Inst Stat Math 23:163–180. https://doi.org/10.1007/BF02479221
Akyilmaz O, Kutterer H (2004) Prediction of Earth rotation parameters by fuzzy inference systems. J Geod. https://doi.org/10.1007/s00190-004-0374-5
Alex G, Marcus L, Santiago F, Roman B, Horst B, Jürgen S (2018) A novel connectionist system for unconstrained handwriting recognition. IEEE Trans Pattern Anal Mach Intell 31:855–868. https://doi.org/10.1109/TPAMI.2008.137
Babcock AK, Wilkins GA (1989) The Earth rotation and reference frames for Geodesy and Geodynamics. Geophys J Int. https://doi.org/10.1111/j.1365-246X.1989.tb04457.x
Bizouard C (2005) Influence of the earthquakes on the polar motion with emphasis on the Sumatra event. In: J Journées Systèmes de Référence SpatioTemporels, Proceedings. pp 229–232
Bogusz J, Brzezinski A, Kosek W, Nastula J (2015) Earth rotation and geodynamics. Geodesy Cartogr 64(2):201–242. https://doi.org/10.1515/geocart-2015-0013
Börger L, Schindelegger M, Dobslaw H, Salstein D (2023) Are ocean reanalyses useful for earth rotation research? Earth Space Sci. https://doi.org/10.1029/2022ea002700
Box GEP, Jenkins GM, Reinsel GC, Ljung GM (1976) Time series analysis: forecasting and control. Holden Day, San Francisco, pp 88–125
Brockwell PJ, Davis RA (1997) Introduction to time series and forecasting, 2nd edn. Springer, New York, pp 81–106
Chandler S (1981) On the variation of latitude. Astron J I:56–61. https://doi.org/10.1038/056040a0
Din AZU, Ayaz Y, Hasan M, Khan J, Salman M (2019) Bivariate shortterm electric power forecasting using LSTM network. In: 2019 International Conference on Robotics and Automation in Industry (ICRAI). IEEE, pp 1–8
Dobslaw H, Dill R, Grötzsch A, Brzeziński A, Thomas M (2010) Seasonal polar motion excitation from numerical models of atmosphere, ocean, and continental hydrosphere. J Geophys Res. https://doi.org/10.1029/2009jb007127
Egger D (1992) Neuronales Netz Prädiziert Erdrotation. AVNAllgemeine VermessungsNachrichten 99:517–524
Gers FA, Schmidhuber J, Cummins F (2000) Learning to forget: continual prediction with LSTM. Neural Comput 12:2451–2471. https://doi.org/10.1049/cp:19991218
Gers FA, Schmidhuber J (2000) Recurrent nets that time and count. In: IJCNN 2000, Neural Networks. pp 189–194
Gou J, Kiani Shahvandi M, Hohensinn R, Soja B (2021) Ultra-short-term prediction of LOD using LSTM neural networks. In: EGU General Assembly Conference, Vienna, Austria. https://doi.org/10.5194/egusphere-egu21-2308
Graves A, Schmidhuber J (2005) Framewise phoneme classification with bidirectional LSTM and Other Neural Network architectures. Neural Netw 18:602–610. https://doi.org/10.1016/j.neunet.2005.06.042
Greff K, Srivastava RK, Koutnik J, Steunebrink BR, Schmidhuber J (2017) LSTM: a search space Odyssey. IEEE Trans Neural Netw Learn Syst 28(10):2222–2232. https://doi.org/10.1109/TNNLS.2016.2582924
Gross RS (2000) The excitation of the Chandler wobble. Geophys Res Lett 27(15):2329–2332. https://doi.org/10.1029/2000gl011450
Gross RS (2007) Earth rotation variations: long period. In: Herring T (ed) Treatise on geophysics, vol 3. Elsevier, Amsterdam, pp 239–294. https://doi.org/10.1016/B978-044452748-6/00057-2
Guo JY, Li YB, Dai CL, Shum CK (2013) A technique to improve the accuracy of Earth orientation prediction algorithms based on least squares extrapolation. J Geodyn 70:36–48. https://doi.org/10.1016/j.jog.2013.06.002
Hochreiter S (1991) Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Germany, Technische Universität München
Hochreiter S, Schmidhuber J (1997) Long shortterm memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Hochreiter S, Bengio Y, Frasconi P, Schmidhuber J (2001) Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In: Kolen JF (ed) IEEE Press, Los Alamitos. https://doi.org/10.1109/9780470544037.ch14
Joachim H (2004) Low-frequency variations, Chandler and annual wobbles of polar motion as observed over one century. Surv Geophys 25:1–54. https://doi.org/10.1023/B:GEOP.0000015345.88410.36
Karevan Z, Suykens JAK (2020) Transductive LSTM for timeseries prediction: an application to weather forecasting. Neural Netw 125:1–9. https://doi.org/10.1016/j.neunet.2019.12.030
Kenyon SC, Pacino MC, Marti U (2012) Geodesy for Planet Earth. In: Kenyon SC, Pacino MC, Marti U (eds) Proceedings of the 2009 IAG Symposium, Buenos Aires, Argentina, 31 August–4 September 2009. Springer, pp 513–520. www.iag2009.com.ar
Kosek W (2002) Autocovariance prediction of complex-valued polar motion time series. Adv Space Res 30:375–380. https://doi.org/10.1016/S0273-1177(02)00310-1
Kosek W, Popiński W (2005) Forecasting of pole coordinates data by combination of the wavelet decomposition and autocovariance prediction. In: Journées 2005 "Systèmes de référence spatio-temporels", pp 139–140
Kosek W, McCarthy DD, Johnson TJ, Kalarus M (2004) Comparison of polar motion prediction results supplied by the IERS Sub-bureau for Rapid Service and Predictions and results of other prediction methods. In: Finkelstein A, Capitaine N (eds) Proceedings of the Journées 2003 "Systèmes de référence spatio-temporels", St. Petersburg, pp 164–169
Kosek W, Kalarus M, Niedzielski T (2008) Forecasting of the Earth orientation parameters: comparison of different algorithms. In: Capitaine N (ed) Proceedings of the Journées 2007, Paris, pp 155–158
Liao DC, Wang QJ, Zhou YH, Liao XH, Huang CL (2012) Long-term prediction of the Earth orientation parameters by the artificial neural network technique. J Geodyn 62:87–92. https://doi.org/10.1016/j.jog.2011.12.004
Liu Y, Guan L, Hou C, Han H, Liu Z, Sun Y, Zheng M (2019) Wind power short-term prediction based on LSTM and discrete wavelet transform. Appl Sci. https://doi.org/10.3390/app9061108
McCarthy DD, Luzum BJ (1991) Prediction of Earth orientation. Bulletin Géodésique 65(1):18–21
Petit G, Luzum B (2010) IERS Conventions 2010. IERS Technical Note 36. Verlag des Bundesamts für Kartographie und Geodäsie, Frankfurt am Main
Ren Z, Huangfu Y, Xie R, Ma R (2020) Modeling of proton exchange membrane fuel cell based on LSTM neural network. In: 2020 Chinese Automation Congress (CAC)
Schaffer AL, Dobbins TA, Pearson SA (2021) Interrupted time series analysis using autoregressive integrated moving average (ARIMA) models: a guide for evaluating large-scale health interventions. BMC Med Res Methodol 21(1):58. https://doi.org/10.1186/s12874-021-01235-8
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117. https://doi.org/10.1016/j.neunet.2014.09.003
Schuh H, Nagel S, Seitz T (2001) Linear drift and periodic variations observed in long time series of polar motion. J Geod 74:701–710. https://doi.org/10.1007/s001900000133
Schuh H, Ulrich M, Egger D, Müller J, Schwegmann W (2002) Prediction of Earth orientation parameters by artificial neural networks. J Geod 76(5):247–258. https://doi.org/10.1007/s0019000102425
Souriau A (1986) The influence of earthquakes on the polar motion. In: Cazenave A (ed) Earth rotation: solved and unsolved problems. Springer Netherlands, Dordrecht, pp 229–240. https://doi.org/10.1007/978-94-009-4750-4_16
Sun Z, Xu T (2012) Prediction of earth rotation parameters based on improved weighted least squares and autoregressive model. Geodesy Geodyn 3(3):57–64. https://doi.org/10.3724/sp.J.1246.2012.00057.1
Sun Z, Xu T, Jiang C, Yang Y, Jiang N (2019) An improved prediction algorithm for Earth’s polar motion with considering the retrograde annual and semiannual wobbles based on least squares and autoregressive model. Acta Geod Geophys 54(4):499–511. https://doi.org/10.1007/s40328019002744
Suvorkin VV, Kurdubov SL, Gayazov IS (2015) GNSS processing in Institute of Applied Astronomy RAS. In: Malkin Z, Capitaine N (eds) Proceedings of the Journées 2014 "Systèmes de référence spatio-temporels": recent developments and prospects in ground-based and space astrometry, St. Petersburg, Russia, pp 261–262
Wang G, Liu L, Su X, Liang X, Yan H, Tu Y, Li Z, Li W (2016) Variable Chandler and annual wobbles in Earth's polar motion during 1900–2015. Surv Geophys 37(6):1075–1093. https://doi.org/10.1007/s10712-016-9384-0
Wang J, Jiang W, Li Z, Lu Y (2021) A new multi-scale sliding window LSTM framework (MSSW-LSTM): a case study for GNSS time-series prediction. Remote Sens. https://doi.org/10.3390/rs13163328
Xu X, Zhou Y (2015) EOP prediction using least square fitting and autoregressive filter over optimized data intervals. Adv Space Res 56(10):2248–2253. https://doi.org/10.1016/j.asr.2015.08.007
Xu XQ, Zhou YH, Liao XH (2012) Short-term earth orientation parameters predictions by combination of the least-squares, AR model and Kalman filter. J Geodyn 62:83–86. https://doi.org/10.1016/j.jog.2011.12.001
Yu Y, Si X, Hu C, Zhang J (2019) A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput 31(7):1235–1270. https://doi.org/10.1162/neco_a_01199
Zhang XH, Wang QJ, Zhu JJ, Zhang H (2012) Application of general regression neural network to the prediction of LOD change. Chin Astron Astrophys 36(1):86–96. https://doi.org/10.1016/j.chinastron.2011.12.010
Zhang X, Liang X, Zhiyuli A, Zhang S, Xu R, Wu B (2019) AT-LSTM: an attention-based LSTM model for financial time series prediction. IOP Conf Ser Mater Sci Eng. https://doi.org/10.1088/1757-899x/569/5/052037
Zharkov VN, Molodensky SM (1996) On the Chandler wobble of Mars. Planet Space Sci 44(11):1457–1462. https://doi.org/10.1016/S0032-0633(96)00052-9
Acknowledgements
This study was supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 41874035, 12103035, 42030105). The authors are grateful to the IERS for the EOP 14C04 solution. All the prediction models in this experiment were implemented in our self-developed software on the MATLAB platform; interested readers may contact the authors by email.
Funding
Funding was provided by the National Natural Science Foundation of China (NSFC) (Grant Nos. 41874035, 12103035, 42030105).
Author information
Authors and Affiliations
Contributions
CW designed the framework of this research in this paper. CW and PZ designed and derived the theoretical formulations, carried out the observations of the microtremors, and wrote the article. CW performed the numerical calculations. All the authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Description of the LSTM training process
LSTM adopts three gate structures to mitigate the exploding/vanishing gradient and long-term dependency problems (Yu et al. 2019). Because each cycle reuses information from the previous cycle and each output state depends on the preceding one, an LSTM network can retain long-term patterns effectively, and it is widely used in time series prediction, for example financial time series prediction (Zhang et al. 2019), GNSS time series prediction (Wang et al. 2021), and weather forecasting (Karevan and Suykens 2020). A vanilla LSTM unit contains a cell, an input gate, an output gate, and a forget gate. The forget gate was not part of the original LSTM but was proposed by Gers and Schmidhuber (2000). Figure 1 depicts the LSTM network structure adopted in this paper. At time t, the first network layer carries two information flows: the flow from \({C}_{t-1}\) to \({C}_{t}\) represents the transmission of the cell state, and this line interacts linearly with the second flow through the three gate structures.
The gate structures allow information to pass selectively, i.e., they determine what is removed from or added to the cell state in the transition from \({C}_{t-1}\) to \({C}_{t}\); in other words, they screen the information entering the cell. The \(\upsigma\) activation function layer and the tanh activation function layer map their inputs to (0, 1) and (− 1, 1), respectively, generating the weights that filter the input data. Each layer of the LSTM network contains three gate structures that control the cell state:
(1) Forget gate
The first step in the LSTM is to decide what information will be removed from the cell state. This decision is made by the \(\upsigma\) layer called the "forget gate layer." It looks at \({h}_{t-1}\) and \({x}_{t}\) and outputs a number between 0 and 1 for each entry of the cell state \({C}_{t-1}\): 1 means "completely keep the \({h}_{t-1}\) and \({x}_{t}\) information," while 0 means "completely discard the \({h}_{t-1}\) and \({x}_{t}\) information." The formula of the "forget gate" \({f}_{t}\) is as follows:
where \(\upsigma\) is the activation function, \({W}_{f}\) is the weight matrix, \({h}_{t-1}\) is the recurrent information at time t − 1, \({x}_{t}\) denotes the input information, and \({b}_{f}\) is the bias of the forget gate.
(2) Input gate
One part of the "input gate" linearly combines \({x}_{t}\) with the hidden state \({h}_{t1}\) at the previous time to obtain \({i}_{t}\) through \(\upsigma\)layer activation from Eq. (7) (Wang et al. 2021). This part determines which information needs to be updated; this is part of the forgetting gate selected to be forgotten. In the other part, \({h}_{t1}\) and \({x}_{t}\) are passed through a tanh layer to generate a vector \(\widetilde{{C}_{t}}\), which is alternatively employed to update the new content. Then the two parts are combined to update the state \({C}_{t1}\) to \({C}_{t}\). The expression of the "input gate" is as follows (Hochreiter and Schmidhuber 1997; Wang et al. 2021):
and the cell state equation at time t is
where \({C}_{t}\) denotes the cell state of the LSTM, tanh is the activation function, \({W}_{i}, {W}_{C}\) are the weights, \(*\) denotes element-wise multiplication, \({b}_{i}, {b}_{c}\) are the biases of the input gate, \({\widetilde{C}}_{t}\) is the cell-state update (candidate) value, and the other parameters are as defined above.
(3) Output gate
The "output gate" updates the value of the hidden layer output at the current time, i.e., \({h}_{t}\), through Eq. (11), is the hidden state at time t. LSTM can remember longterm historical information because every cycle uses the information \({C}_{t}\) and \({h}_{t1}\) of the previous cycle, and each output state is affected by the previous state. Especially, the forget gate can decide what information will be removed from the cell state. The formula of the "output gate" is as follows:
where \({o}_{t}\) is the output gate, \({W}_{o}\) is the weight associated with \({x}_{t}\), \({b}_{o}\) is the bias vector, and \({h}_{t}\) is the hidden state. In Eqs. (6)–(10), \({W}_{\zeta }\) and \({b}_{\xi }\), with \(\upzeta ,\upxi \in \left\{f,i,c,o\right\}\), represent the weight and bias matrices, respectively, which are the parameters to be learned in training.
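As an illustrative aside (the authors' software is written in MATLAB; this Python/NumPy sketch uses our own variable names, stacks the four gate weights in dictionaries keyed by f, i, c, o, and applies each weight to the concatenation of \({h}_{t-1}\) and \({x}_{t}\)), a single forward step through the three gates can be written as:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_forward(x_t, h_prev, c_prev, W, b):
    """One LSTM forward step: forget gate, input gate, candidate state,
    cell-state update, and output gate. W[g] acts on [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # element-wise cell-state update
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # hidden state
    return h_t, c_t
```

Since the gates are sigmoid-weighted and the cell state is squashed by tanh, every entry of the returned hidden state lies strictly in (− 1, 1).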
In network training, one should pay attention to possible overfitting. The loss function is minimized by iteratively adjusting the parameters. For example, with a total of N samples, the discrepancy between the network output \({Y}_{i}^{*}\) and the expected output \({Y}_{i}\) can be measured by the mean squared error (MSE) loss function, also known as the L2 loss. Its basic form is as follows:
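As a minimal illustration (not the authors' MATLAB code), the MSE loss over N samples can be computed as:

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean squared error (L2 loss): average of squared residuals over N samples."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return np.mean((y_pred - y_true) ** 2)
```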
LSTM model training is the process of determining the parameters of the model structure. First, the original PM time series in the input layer is defined as \({F}_{o}= \left\{{f}_{1}, {f}_{2},\dots ,{f}_{n}\right\}\). The training set and test set are \({F}_{tr}= \left\{{f}_{1}, {f}_{2},\dots ,{f}_{m}\right\}\) and \({F}_{te}= \left\{{f}_{m+1}, {f}_{m+2},\dots ,{f}_{n}\right\}\), satisfying the constraints \(m<n\) and \(m, n \in \mathrm{N}\). Each element \({f}_{t}\) of the training set is then standardized using the classic z-score formula in Eq. (13) (Liu et al. 2019), where \({x}{\prime}(t)\) represents the PM value at time t, and \({x}_{\mathrm{mean}}{\prime}\) and \({x}_{\mathrm{std}}{\prime}\) are the mean and standard deviation of \({x}{\prime}(t)\), respectively. The standardized training set can be expressed as
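The z-score standardization and its inverse (used later as de_zscore) can be sketched as follows; this is an illustration of the classic formula, not the authors' implementation:

```python
import numpy as np

def zscore(x):
    """Standardize a series: subtract the mean, divide by the standard deviation."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    return (x - mean) / std, mean, std

def de_zscore(x_norm, mean, std):
    """Inverse transform, mapping standardized values back to PM units."""
    return np.asarray(x_norm, dtype=float) * std + mean
```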
To match the input format of the hidden layer, a data segmentation method is applied to \({F}_{tr}{\prime}\). With the split window length set to \(L\), the model input after splitting is
The corresponding theoretical output is
Next, \(X\) is fed to the hidden layers, which contain \(L\) isomorphic LSTM cells connected in temporal order. The output of \(X\) after passing through the hidden layer is represented as
where \({C}_{p-1}\) and \({H}_{p-1}\) represent the state and output of the previous LSTM cell, respectively, and \({LSTM}_{\mathrm{forward}}\) denotes the forward calculation of an LSTM cell. If the cell state vector size is set to \({S}_{\mathrm{state}}\), then \({C}_{p-1}\) and \({H}_{p-1}\) are both vectors of size \({S}_{\mathrm{state}}\). The hidden layer output \(P\), model input \(X\), and theoretical output \(Y\) are two-dimensional arrays with dimensions (m − L, L). With the mean squared error as the error measure, the loss function of the training process can be expressed as
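The segmentation just described — sliding a window of length \(L\) over the standardized series so that \(X\) and \(Y\) each have dimensions (m − L, L), with \(Y\) shifted one step ahead of \(X\) — can be sketched as (an illustrative Python rendering, not the authors' MATLAB code):

```python
import numpy as np

def split_windows(series, L):
    """Build model input X and theoretical output Y from a series of length m.
    Each row of X is a window of L consecutive values; the matching row of Y
    is the same window shifted forward by one step. Both have shape (m-L, L)."""
    f = np.asarray(series, dtype=float)
    m = len(f)
    X = np.stack([f[k:k + L] for k in range(m - L)])
    Y = np.stack([f[k + 1:k + L + 1] for k in range(m - L)])
    return X, Y
```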
Application of the trained LSTM model
With minimization of the loss function as the optimization goal, and given the random seed for network initialization, the learning rate \(\eta\), and the number of training steps, the Adam optimization algorithm is applied to continuously update the network weights and obtain the final trained network.
This section applies the trained LSTM network (\({\mathrm{LSTM}}_{\mathrm{net}}^{*}\)) for prediction; the prediction process adopts an iterative method. First, the last row of the theoretical output \(Y\) is
Feeding \({Y}_{f}\) into \({\mathrm{LSTM}}_{\mathrm{net}}^{*}\), the output can be expressed as
The predicted value at time \(m+1\) is \({p}_{m+1}\). Then, the last \(L-1\) data points of \({Y}_{f}\) and \({p}_{m+1}\) are combined into a new row of data
Feeding \({Y}_{f+1}\) into \({\mathrm{LSTM}}_{\mathrm{net}}^{*}\) yields the predicted value \({p}_{m+2}\) at time \(m+2\), and so on. The resulting prediction sequence is
Finally, z-score denormalization (denoted de_zscore) is applied to \({P}_{o}\) using Eq. (27) (Liu et al. 2019), and the final prediction sequence corresponding to the test set \({F}_{te}\) is obtained as
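The iterative prediction loop described above — predict one step, append the new value, drop the oldest point, and re-enter the window — can be sketched as follows; here `step_fn` is a hypothetical stand-in for the trained one-step \({\mathrm{LSTM}}_{\mathrm{net}}^{*}\):

```python
import numpy as np

def iterative_forecast(step_fn, last_window, horizon):
    """Roll a one-step predictor forward. step_fn maps a length-L window to the
    next value; each prediction is appended and the oldest value is dropped."""
    window = list(last_window)
    preds = []
    for _ in range(horizon):
        p = step_fn(np.asarray(window))
        preds.append(p)
        window = window[1:] + [p]   # keep the last L-1 points plus the new one
    return np.asarray(preds)
```

With a toy predictor such as `lambda w: w[-1] + 1.0`, the loop reproduces the expected recursion over the window; in the actual scheme the predictions are de-standardized afterwards with de_zscore.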
Similarly, using each row of \(X\) as model input yields a fitted sequence \({P}_{tr}\) corresponding to the training set \({F}_{\mathrm{tr}}\). The training of the LSTM-based PM time series model, the prediction algorithm, and the parameter optimization of the LSTM prediction model are summarized in Table 3, and the other parameter settings are listed in Table 4.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, C., Zhang, P. Improving the accuracy of polar motion prediction using a hybrid least squares and long short-term memory model. Earth Planets Space 75, 153 (2023). https://doi.org/10.1186/s40623-023-01910-8