Skip to main content

Improving the accuracy of polar motion prediction using a hybrid least squares and long short-term memory model


Earth rotation parameters (ERPs) are essential for transforming between the celestial and terrestrial reference frames, and for high-precision space navigation and positioning. Among the ERPs, polar motion (PM) is a critical parameter for analyzing and understanding the dynamic interaction between the solid Earth, atmosphere, ocean, and other geophysical fluids. Traditional methods for predicting the change in ERPs rely heavily on linear models, such as the least squares (LS) and the autoregressive (AR) model (LS + AR). However, variations in ERP partly reflect non-linear effects in the Earth system, such that the predictive accuracy of linear models is not always optimal. In this paper, long short-term memory (LSTM), a non-linear neural network, is employed to improve the prediction of ERPs. Polar motion prediction experiments in this study are conducted using the LSTM model and a hybrid method LS + LSTM model based on the IERS EOP14C04 time series. Compared with Bulletin A, the PMX and PMY prediction accuracy can reach a maximum of 33.7% and 31.9%, respectively, with the LS + LSTM model. The experimental results show that the proposed hybrid model displays a better performance in mid- and long-term (120–365 days) prediction of polar motion.

Graphical Abstract


Due to the rapid development of space technology, the accuracy of Earth Orientation Parameter (EOP) estimates dramatically improved in the 1990s (Schuh et al. 2002) and has remained at a high level ever since. The EOPs consist of precession–nutation, polar motion (PM), the difference between Universal Time (UT1) and Coordinated Universal Time (UTC), that is, UT1-UTC, and Length of Day (LOD) (Petit and Luzum 2010). Earth Rotation Parameters (ERPs) comprise the PM (including PMX and PMY), UT1-UTC, and LOD. The instantaneous movement of the Earth's rotation axis with respect to the terrestrial reference frame is described by the PM. As for PM, a major complication is that it is caused by partially unpredictable mass redistributions on the surface and in the interior of the Earth (Gross 2007; Dobslaw et al. 2010; Sun et al. 2019; Börger et al. 2023). Modern space navigation and deep space exploration are increasingly required for accurate real-time prediction of ERPs. Given the complicated data processing of modern geodesic techniques, such as Global Positioning System (GPS) technology, the acquisition of ERP results must be delayed by 15–20 h. Obtaining ERPs requires several days for the Very Long Baseline Interferometry (VLBI) and Satellite Laser Ranging (SLR) technologies. These factors make it challenging to acquire real-time ERPs, emphasizing the need for accurate predictions (Zhang et al. 2012). Several national and international services publish predicted values of EOPs, such as the International Earth Rotation and Reference Systems Service (IERS) Rapid Service/Prediction Center (RS/PC), operated by the US Naval Observatory (USNO) (Guo et al. 2013), and published in the IERS Bulletin A files, for a year into the future in the daily interval, or the EOP service of the Institute of Applied Astronomy of Russian Academy of Sciences (IAA RAS) (Suvorkin et al. 2015). The products provided by these agencies comprise estimates for PM, UT1-UTC, LOD, and other parameters, usually for a year into the future at daily sampling.

The polar motion includes a regular deterministic and an irregular stochastic component. The deterministic part consists of the long-term trend, Chandler wobbles (CW) (Chandler 1981; Zharkov and Molodensky 1996), Annual wobbles (AW), and Semi-Annual wobbles (SAW) (Wang et al. 2016; Gross 2000). Chandler Wobble is a resonant rotational mode of the Earth that decays freely due to the viscoelastic nature of the Earth. Studies have shown that CW will freely decay within 68 years to the minimum rotational energy state without excitation. It is generally believed that the oscillation period and amplitude of CW vary over time, with the period fluctuating between 1.13 and 1.20 years (Schuh et al. 2001). The annual oscillation of the pole curve includes both prograde and retrograde components. The intensity of the prograde part is 10 times that of the retrograde part, and there is a significant change in the period of the forward annual oscillation of the PM, which oscillates between 356 and 376 days (Joachim 2004). Considering the characteristics of secular drift, CW, and AW, scholars have conducted extensive studies and proposed various methods for predicting the ERPs. In General, these methods fall into linear and non-linear models. Kalman Filtering (Babcock and Wilkins 1989), Least Squares (LS) extrapolation, fuzzy interface systems (Akyilmaz and Kutterer 2004), autoregression models (AR) (Sun and Xu 2012), autocovariance models (Kosek 2002), and different combinations of these methods (Kosek and Popiński 2005; Kosek et al. 2004; Kosek et al. 2008) are all linear models. Methods such as threshold autoregression models, artificial neural networks (Liao et al. 2012; Egger 1992), and fuzzy reasoning are non-linear models.

More hybrid and machine learning methods have been introduced in recent years for predicting ERP variations. The rapid expansion in computing power and data volume has made applying deep learning in geodesy increasingly promising. In particular, the long short-term memory (LSTM) network (Hochreiter and Schmidhuber 1997), one of the most popular forms of recurrent neural networks (RNNs), is advantageous for geodetic time series prediction. The LSTM network can capture the non-linear structure between different time epochs in the time series due to the unique structure of its cells (Gers et al. 2000; Graves and Schmidhuber 2005). Some researchers have used the LSTM model in predicting the LOD (Gou et al. 2021), which might also be suitable for PM prediction problems. This study investigates the potential of utilizing LSTM combined with traditional methods for predicting PM. The method proposed is novel in that the non-linear part of PM is not predicted by the linear method AR model but through the deep learning method LSTM model.

This paper is structured as follows: In the second section, we describe the LSTM and LS + AR algorithms. Section three introduces the dataset and processing strategy, including the data used in each experiment, the amplitude variation and characteristics of AW, SAW, and CW in PM through the Fast Fourier Transform (FFT) spectrum analysis, and the detailed PM prediction process by LSTM, LS + AR, and LS + LSTM model. Next, we present different models to estimate PM variability, including LS + AR, LSTM, and LS + LSTM, all of which draw on IERS EOP 14C04 data from 2011 to the end of 2020. At the same time, Bulletin A from the IERS RS/PC is used to compare the prediction accuracy with the results derived in this paper. A summary of the findings is given in the last section.

Materials and methods

LSTM prediction model

Introduction of the general concept of LSTM

LSTM is now widely used and has proven to perform well on various problems such as handwriting recognition, speech recognition, and time series prediction (Schmidhuber 2015; Alex et al. 2018). However, a neural architecture would not be widely utilized in practice without a solid theoretical foundation. Greff et al. (2017) recently reviewed several LSTM variants and their performances relative to the so-called vanilla model (Greff et al. 2017). The variant LSTM is an improved model based on the original LSTM (Hochreiter and Schmidhuber 1997; Gers and Schmidhuber 2000). The main change of the variant LSTM model compared to the original LSTM model is the addition of cell state information to the inputs of the three control gates. Unlike feedforward neural networks, the RNNs have a cycle function, which can take the activation in the previous steps as the network’s input and play a decisive role in the current input. However, training recurrent or very deep neural networks is challenging because they frequently suffer from exploding and vanishing gradient problems (Hochreiter 1991; Hochreiter et al. 2001). To solve the problems mentioned above, the LSTM architecture was developed to address this deficiency and the learning long-term dependencies. Figure 1 depicts the LSTM network structure, which is detailed in Appendix A.

Fig. 1
figure 1

The architecture of LSTM with a forget gate

LSTM training results analysis

In the LSTM network training, the hidden layers are set to 2 and the number of LSTM cells per hidden layer is 50. Time steps are set to 365 and training iterations are set to 1000. The learning efficient dropout is 0.1. The Savitzky–Golay (SG) smoothing filter is used in the experiments of this paper. The initial learning rate is set to 0.1 and the learn rate drop factor is 0.2 (Greff et al. 2017; Ren et al. 2020). The gradient threshold is set to 1 (Din et al. 2019). Other parameter settings are listed in Appendix A, Table 4. Figure 2 shows the LSTM network training based on the PM time series. Figure 2a and b indicates that the correlation between the original and output sequences of PMX and PMY is 0.99982 and 0.99987, respectively. Figure 2c and d shows that the Root Mean Square Error (RMSE) of PMX and PMY is 1.7916 mas and 1.6128 mas, respectively. Figure 2e and f shows that the Mean and the Standard Deviation (STD) of PMX and PMY are − 0.1682 mas and 0.3365 mas, and 1.7840 mas and 1.5776 mas, respectively.

Fig. 2
figure 2

Prediction values of the PMX and PMY. a, b Are the LSTM network training, PMX and PMY raw data in the blue line and outputs in the red line, c, d are the errors, e, f are the error histogram of PM from 2011 to 2020

LS + AR prediction model

LS model

We use the following model to fit the trend and periodic terms of EOP, whose parameters can be estimated using the least squares method. The residuals are then analyzed by the AR and other models. The least squares model can be described as

$$F\left(t\right)=A+Bt+{C}_{1}\mathrm{cos}\left(\frac{2\pi t}{{P}_{SA}} \right)+{C}_{2}\mathrm{sin}\left(\frac{2\pi t}{{P}_{SA}} \right)+{D}_{1}\mathrm{cos}\left(\frac{2\pi t}{{P}_{A}} \right)+{D}_{2}\mathrm{sin}\left(\frac{2\pi t}{{P}_{A}} \right)+{E}_{1}\mathrm{cos}\left(\frac{2\pi t}{{P}_{C}} \right)+{E}_{2}\mathrm{sin}\left(\frac{2\pi t}{{P}_{C}} \right)+\omega ,$$

where A is the constant, B is the trend term parameter in the model, C1 and C2 are the SAW parameters, D1 and D2 are the AW parameters, and E1 and E2 are the CW parameters. The fitting model calculates PSA, PA, and PC in years, representing the SAW, AW, and CW, respectively.

We additionally conducted an FFT analysis of the EOP 14C04 series. From Fig. 3a and b, it can be seen that CW and AW dominate the PM spectrum, manifested by cusps of power between 413 to 439 days (CW) and 356 to 376 days (AW). These values are relatively consistent with estimates given elsewhere (Mccarthy and Luzum 1991; Schuh et al. 2001; Joachim 2004). In our model (Eq. 1), the AW period is 365.25 days, CW is 434 days and SAW is 182.62 days. \(\upomega\) is the random error, \(t\) is the UTC of the series, and the unit is converted into years when LS fitting. Similarly, the meaning of each corresponding parameter table in the PMX series is identical to that of the PMY series.

Fig. 3
figure 3

Amplitude spectrum of the two components of polar motion (PMX in red, PMY in blue) as deduced from FFT of the respective time series. Units are [mas]

AR model

AR(p) model is the description of the relationship between a random series \({z}_{t}\) (t = 1, 2, …, N) before time t and the current time. Its expression can be written as follows:

$${z}_{t}=\sum_{i=1}^{p}{\o }_{i}{z}_{t-i}+{\omega }_{t},$$

where \({\o }_{1, } {\o }_{2, }{\dots , \o }_{p}\) represent the autoregressive coefficients obtained by solving the Yule–Walker equations using the Levinson–Durbin recursion (Brockwell and Davis 1997), \({\omega }_{t}\) is the white noise with zero means, and \(p\) stands for the model order. The above equation denoted by \(\mathrm{AR}\left(p\right)\) is the AR model of the order \(p\), and how to determine the order \(p\) is crucial. Usually, there are three methods for the determination of \(p\), Akaike's final prediction error (FPE) criterion, the information criterion, and the delivery function criterion. In this paper, the FPE criterion is adopted to determine the order \(p\) and corresponds to the smallest FPE (Akaike 1971):

$${P}_{p}=1/(N-p)\sum_{t=p+1}^{N}{\left({z}_{t}-\sum_{j=1}^{p}{\varphi }_{j}{z}_{t-j}\right)}^{2}.$$

The mean absolute error (MAE) is utilized to evaluate the prediction accuracy. It can be expressed as follows:


where \({P}_{i }\) represents the predicted value of \(i\)-th prediction, \({X}_{i}\) stands for the corresponding observation value, \(n\) is the total prediction number, and \({MAE}_{j}\) is the MAE at span \(j\).

Data description and processing strategy

Data description

In particular, we use the PM time series from IERS EOP 14C04 with daily sampling interval is available at In this study, we use the PM series from January 8, 2011 to December 31, 2021. The results will be compared to Bulletin A (558 files) available at, for the same period of time as IERS EOP 14C04. The LSTM network training is based on the PM time series from January 1, 2011 to December 31, 2020.

PM prediction processing strategy

Figure 4 depicts a schematic representation of the methodology adopted for predicting PM with various models. The observed PM can be divided into deterministic and stochastic components. The known component is referred to as a priori model, consisting of the long-term trend, CW, AW, and SAW. In this study, the LS + AR model (first method) is applied to forecast PMX and PMY and compared to results based on the LS + LSTM (second method) and LSTM models (third method). Figure 4 describes the respective processing schemes. For the first two methods, training patterns are derived from the residuals after subtracting the a priori model. These patterns are used for training the LSTM. The predicted residuals are then added to the a priori model to obtain the final predicted values of the PMX and PMY. The third method for predicting PM uses the LSTM model directly, relying on the IERS EOP14C04 time series.

Fig. 4
figure 4

Flowchart of the LS + AR, LS + LSTM, and LSTM approaches for PM prediction

Results and discussion

PM prediction using the LS + AR model

We initially preprocess the PM series using the LS model and deduce residuals (that is, the stochastic components) by subtracting the LS analysis results from the original pole coordinates. Figure 5a shows the PM residual results (purple line) derived from the IERS EOP 14C04 from 2011 to 2020. The residuals of the PMX and PMY are within \(\pm 0.08\) arcsecond (as). Due to the nature of the LS fitting model, the fluctuations at the start and end of the time series are somewhat larger compared to the middle part. Figure 5b depicts the first-order difference of the residual sequence (brown line) for PMX and PMY. Most of the residual values and the first difference values are within \(\pm 0.1\) arcseconds (as) and \(\pm 2.0\) milliarcseconds (mas), respectively.

Fig. 5
figure 5

Residual series (purple line) of the LS fitting model and the first-order difference series (brown line) of the residuals from 2011 to 2020

The evaluation of the Autoregressive Integrated Moving Average (ARIMA) model type is essential. When \(p=0\), the \(\mathrm{ARIMA}(p,q)\) (autoregressive integrated moving average) model can be expressed as \(\mathrm{MA}(q)\), i.e., q order moving average model. When \(q=0\), the model can be described as \(\mathrm{AR}(p)\), i.e., a p-order autoregressive model (Box et al. 1976). For the PM time series, Table 1 illustrates the criteria for which model can be evaluated according to the autocorrelation and partial correlation functions of the time series. Figure 6 depicts the subsequent calculation of the autocorrelation function (ACF) and partial autocorrelation function (PACF) for the first-order differential of residuals time series with a delay of 1 to 40 days. The results indicate that the autocorrelation function of the first-order difference sequence of the residual error is tailing, and the partial correlation function is truncated, allowing the \(\mathrm{AR}(p)\) model to be used for prediction, i.e., \(q=0\) (Schaffer et al. 2021). In this research, the FPE is used to determine order p, and this method is described in Eqs. (3) and (4). The optimal order p for the AR model, as determined by the final prediction error criterion, is set to 50.

Table 1 Judgment criterion of the ARIMA model
Fig. 6
figure 6

The ACF and PACF of first-order differential of PM residuals time series for lags from 1 to 40 days. All results in units of [mas]

In this experiment, we extrapolate the deterministic part of 2021 (365 days) using the LS model, based on the least squares fitting series of the IERS EOP14C04 from January 1, 2011 to December 31, 2020 (10 years). Figure 7 shows the LS model time series (red line) and LS extrapolation time series (blue line) of PM. The experimental purpose is to use the LS + AR model for PM prediction. The final prediction results of the LS + AR model are the sum of the LS extrapolation using the determined part and the prediction results of the AR model based on the residual sequence.

Fig. 7
figure 7

The LS fitting results of the PMX and PMY (red line) between 2011 and 2020, and the LS extrapolation of PMX and PMY (blue line) in the future 365 days of 2021. All outcomes are given in units of [as]

PM prediction using the LSTM model

Figure 8 displays the PM prediction based on the IERS EOP14C04 time series (blue line) utilizing the LSTM model. The green line represents the prediction outcomes for 2021 (365 days). It can be seen that the prediction results of PMX and PMY based on the LSTM model for 365 days in 2021 are consistent with the overall trend of PMX and PMY time series from the IERS EOP 14C04. Most PMX prediction values fall between 0.15 as and 0.21 as and the PMY prediction values are between 0.28 as and 0.42 as.

Fig. 8
figure 8

The PM prediction results (green line) from the LSTM model in 2021, based on the PMX and PMY observation (blue line) from IERS EOP14C04 between 2011 and 2020. All results in units of [as]

PM prediction using the LS + LSTM model

To investigate the contribution of the LS + LSTM model in PM prediction, LSTM is applied to forecast the residual part in 2021, using the residuals' basic time series. Figure 9 presents the 2021 residual values (blue line) prediction from the LSTM model. The final PM prediction using the LS + LSTM model is the total results of the LS extrapolation from the LS model and the residuals determined by the LSTM model.

Fig. 9
figure 9

Residuals of PM (red line) from 2011 to 2020, and the residual prediction results (blue line) of the PMX and PMY based on the LSTM model in 2021. All results are given in units [as]

Figure 10a and b depicts the final prediction of the PM with different methods, including the LS + AR, LSTM, and LS + LSTM models. In addition, the forecast results of Bulletin A in 2021 are included (purple line). The IERS EOP14C04 time series is considered a benchmark for comparing the estimated outcomes of various techniques. In terms of PM prediction, the results predicted by the LS + LSTM model (green line) in Fig. 10c and d are the closest to the IERS EOP14C04 time series (red) over the mid- and long-term prediction. Although the improvement is marginal, the findings predicted by the LS + LSTM model are very close to or better than Bulletin A in the mid- and long-term prediction of PMX, and it also can be seen that the PMX prediction accuracy from the LSTM model is higher than that from the LS + AR model in the mid- and long term. For PMX, the RMSE of the results is 0.035 as, 0.031 as, 0.018 as, and 0.030 as for LS + AR, LSTM, LS + LSTM, and Bulletin A, respectively. For PMY, the RMS of the results is 0.038 as, 0.035 as, 0.015 as, and 0.035 as for LS + AR, LSTM, LS + LSTM, and Bulletin A, respectively.

Fig. 10
figure 10

The PMX and PMY predictions from different models, i.e., LS + AR, LSTM, LS + LSTM, and Bulletin A model, are shown in a and b panel, respectively, and the IERS EOP 14C04 as a reference. Panels c and d show the 2021 forecasts as separate figures. All results in units of [as]

Evaluating the PM prediction results

Based on the previous analysis (Fig. 10), the LS + LSTM model prediction accuracy is higher than other models. To assess the accuracy of this method in a more extended way, we compare the PM prediction in different periods using the LS + LSTM model, LS + AR model, and LSTM model as shown in Fig. 11. The prediction span is 365 days with a ten-year basic sequence, and the statistical period is from 2011 to 2020. The model proposed in this experiment is based on the IERS EOP 14C04 for PM prediction. In Fig. 11 the orange, brown, green, and purple lines represent the LS + AR, LSTM, LS + LSTM, and Bulletin A prediction of PM, respectively. We also compare the PM prediction results from the LS + LSTM model to the IERS EOP 14C04. In the mid- and long-term prediction of the PM, the prediction results based on the LS + LSTM model are closer to the observed IERS EOP 14C04 time series than those based on Bulletin A.

Fig. 11
figure 11

Prediction values of PMX and PMY with different models, including the LS + AR, LSTM, and LS + LSTM, respectively. All results in units of [as]. IERS EOP 14C04 from 2011 to 2020 is shown as reference for comparisons

AE of PM prediction with different models

Experiments have demonstrated that the LS + LSTM model is superior for predicting PM, especially in the mid- and long term. To further explore the advantages of the LS + LSTM model in the accuracy of PM prediction, four different cases were designed to predict PM for 11 years (from 2011 to 2021). In this experiment, the prediction span was 365 days with a weekly sliding window. The experiment is divided into four parts, considering the following methods:

Case 1: PMX and PMY prediction based on the LS + AR model;

Case 2: PMX and PMY prediction based on the LSTM model;

Case 3: PMX and PMY prediction based on the LS + LSTM model;

Case 4: PMX and PMY prediction from Bulletin A achieved from the IERS RS/PC.

The four cases listed above correspond to the LS + AR, the LSTM, the LS + LSTM, and the Bulletin A provided by the IERS, respectively. Authors generally rely on the 10-year IERS EOP14C04 time series in the PM prediction as the basic series (Xu et al. 2012; Xu and Zhou 2015; Kenyon et al. 2012). In the following experiments, using various methods, we also choose a ten-year base sequence to predict the PM for the next 365 days.

Figure 12 shows the PM prediction’s absolute errors (AE) using four cases. All experimental results take the IERS EOP14C04 time series as a reference. It can be seen that the accuracy of the four cases (LS + LSTM, LSTM, LS + LSTM, and Bulletin A) in 2011–2015 is inferior to that in 2016–2021. One potential reason could be the 2011 earthquake on the Pacific coast of Tōhoku (the 3.11 Japan earthquake). Based on the data from the Jet Propulsion Laboratory (JPL) of the National Aeronautics and Space Administration (NASA), the 3.11 Japan earthquake shifted the Earth's rotation axis by 25 cm and accelerated the Earth's rotation rate by 1.8 microseconds (Gross 2007). Earthquakes not only cause significant changes in the Earth's rotation on the day they occur, but they also impact the location changes of surface stations over the next 3–5 years, thus affecting ERP monitoring (Souriau 1986; Bizouard, 2005; Bogusz et al. 2015). IERS introduced post-seismic deformation (PSD) in 2017 when establishing the most recent international terrestrial coordinate framework (ITRF2014) to reduce the influence of earthquakes on ground stations and obtain more accurate ERPs data. It is worth noting that ITRF2014 is the most recent ITRF solution at the time of the study. The IERS EOP14C04 also corrected the PSD model of the large earthquake in Japan in March 2011 to precisely solve the PM change phenomenon at this stage; hence, the PM results at this phase deviated from the previous comprehensive trend. However, this deviation was not considered when the models described in this study were used to predict PM, likely resulting in prediction errors. Thus, our preliminary conclusion is that the larger deviations of the prediction results from the observed values (EOP 14C04) between 2011 and 2015 are attributable to the effects of large earthquakes. To improve the accuracy of PM prediction following a major earthquake, further PSD model processing of the prediction algorithm is required. The results predicted by the LS + LSTM are closer to the IERS EOP14C04 series than those by the LS + AR, LSTM, and Bulletin A in mid- and long-term prediction.

Fig. 12
figure 12

Absolute errors (AE) of the PMX and PMY prediction from 2011 to 2021, and all outcomes with the unit [as]. All results are based on 10-year time series of IERS EOP 14C04, using the LS + AR, LSTM, and LS + LSTM models, respectively. The MAE of Bulletin A relies on the PM prediction, coming from the IERS RS/PC

MAE of PM prediction with different models

Figure 13 shows the Mean absolute errors (MAE) of PMX and PMY prediction in four cases. Compared to the other models, the MAE of the proposed LS + LSTM model yields smaller errors in mid- and long-term prediction. Since the LS + LSTM model better considers the overall characteristics of the base series, it obtains a more accurate long-term trend and long period term than the LS model during extrapolation, thus improving the mid- and long-term PM prediction accuracy.

Fig. 13
figure 13

Mean absolute errors (MAE) of the PMX and PMY prediction using the four cases, namely, the LS + AR model (blue line), LSTM model (brown line), LS + LSTM model (red line), and Bulletin A is indicated by the green line. All results in units of [mas]

As Table 2 reveals, the estimation accuracy of the PM is determined using different models, i.e., the LS + AR, LSTM, and LS + LSTM models. The MAE of predicted PM at various periods (1, 5, 10, 15, 20, 30, 45, 90, 120, 180, 270, 320, 365 days) is listed in Table 2. Combined with the PM prediction accuracy statistics, the improvement of LS + LSTM over Bulletin A is clear after 120 days. The improvement gradually increases with the lengthening of the prediction span, reaching a maximum of 33.7% and 31.9% in PMX and PMY, respectively. Generally, the LS + LSTM model has more advantages than the LSTM model, the traditional linear prediction model (LS + AR model), and Bulletin A in mid- and long-term PM prediction.

Table 2 Mean absolute errors (MAE) of the polar motion (PMX, PMY) prediction [mas] in 2021, using the LS + AR, LSTM, LS + LSTM, and Bulletin A time series from IERS, respectively

However, Bulletin A exhibits a smaller MAE in the four cases for short-term prediction. Table 2 demonstrates that Bulletin A outperformed the other models regarding short-term prediction, especially the ultra-short term (the first ten days in the future). This advantage is primarily due to Bulletin A considering the effects of atmospheric angular momentum (AAM) and oceanic angular momentum (OAM). In addition, the statistical results demonstrate that the LSTM model is superior to the traditional model (LS + AR) in the long-term (270–365 days) prediction of PM.


Polar motion is a crucial parameter describing the instantaneous movement of the Earth’s rotation axis relative to the body-fixed reference frame. Among existing prediction models, linear models are often used to predict PM, such as the LS + AR model. Here, we have analyzed the PM series data from January 8, 2011 to September 11, 2021 by different models, including LS + AR, LSTM, LS + LSTM, and Bulletin A. The residual series used in this research is obtained by removing the long-term trend term and the calculated AW, SAW, and CW values. In this paper, based on the characteristics of PM and its inherent periodic and trend terms, the LSTM prediction model is proposed. To verify the advantages of LSTM and its combination with LS in PM prediction, the basic sequence length of 10 years is selected, which is optimal for the LS + AR model to predict PM. The experimental findings demonstrated that the LS + LSTM model is superior for mid- and long-term forecasting of PM. Compared to Bulletin A, published by IERS, the LS + LSTM model demonstrates improved PMX and PMY prediction accuracy by up to 33.7% and 31.9%, respectively, and the LSTM model outperforms the LS + AR model in the mid- and long term.

The study's findings rely heavily on the 10-year snippet of PM time series between 2011 and 2021. Future research will investigate the relationship between the length of the basic time series, the seismic factors, and the accuracy of LSTM and LS + LSTM models in predicting PM. The prediction model such as LS + LSTM, based on proper base sequence length and seismic factor correction, will be established to improve the short-term PM prediction. In addition, the benefits of combining the LSTM with LS and other traditional methods for PM short-term prediction need to be further explored.

Availability of data and materials

The datasets of IERS EOP 14C04 with a daily interval are available at; the Bulletin A can find at



Annual wobbles


Autoregressive models


Autoregressive integrated moving average


Autocorrelation function


Absolute errors


Atmospheric angular momentum


Chandler wobbles


Earth orientation parameters


Earth rotation parameters


Fast Fourier Transform


Final prediction error


Global Position System


International Earth Rotation and Reference Systems Service


Institute of Applied Astronomy


Jet Propulsion Laboratory


Length of Day


Least squares


Long short-term memory


Mean absolute errors


National Aeronautics and Space Administration


Oceanic Angular Momentum


Recurrent neural networks


Polar motion


X coordination of PM


Y coordination of PM


Autocorrelation function


Post-seismic deformation


Rapid Service/Prediction Center


Russian Academy of Sciences




Satellite Laser Ranging


Semi-Annual wobbles


Universal Time


Coordinated Universal Time


The US Navy Observatory


Very Long Baseline Interferometry


Download references


This study is under the support of the National Natural Science Foundation of China (NSFC) (Grant Nos. 41874035, 12103035, 42030105). The authors are grateful to IERS for the EOP 14C04 solution. It is worth stating that all the prediction models in this experiment were implemented based on our self-developed software compiled in the MATLAB platform. Interested readers may contact the authors by email.


Funding was provided by the National Natural Science Foundation of China (NSFC) (Grant Nos. 41874035, 12103035, 42030105).

Author information

Authors and Affiliations



CW designed the framework of this research in this paper. CW and PZ designed and derived the theoretical formulations, carried out the observations of the microtremors, and wrote the article. CW performed the numerical calculations. All the authors read and approved the final manuscript.

Corresponding author

Correspondence to Chenxiang Wang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Description of the LSTM training process

LSTM adopts three unique gate designs to avoid gradient explosion and long-term dependence (Yu et al. 2019). Since each cycle uses information from the previous cycle and each output state is affected by the previous state, the LSTM network can better remember long-term laws more effectively and is widely used in time series prediction, such as financial time series prediction (Zhang et al. 2019), GNSS time series prediction (Wang et al. 2021), and weather forecasting (Karevan and Suykens 2020). A vanilla LSTM unit contains a cell, an input gate, an output gate, and a forget gate. This forget gate was not initially a part of the LSTM network but was proposed by Gers et al. (2000). Figure 1 depicts the LSTM network structure adopted in this paper. At time t, the first layer network comprises two information flows. The information flow from \({C}_{t-1}\) to \({C}_{t}\) represents the transmission of cell state. The entire line linearly interacts with the following information flow through three gate control structures.

The gate structure allows information to selectively pass through, i.e., determining whether the information is removed or added to the cell state from \({C}_{t-1}\) to \({C}_{t}\). That is to say, this part is the screening of information input in the gating structure. The \(\upsigma\) activation function layer and the tanh activation function layer can convert the input between (0, 1) and (− 1, 1), generate the weight of the input data, and filter the input data. There are three gate structures in each layer of the LSTM network to control the cell state:

(1) Forget gate

The first step in the LSTM is to decide what information will be removed from the cell state. This decision is made by the \(\upsigma\)-layer called the "forget gate layer." It looks at \({h}_{t-1}\) and \({x}_{t}\) and outputs a number between 0 and 1 for each number in the cell state \({C}_{t-1}\). 1 is "completely keep \({h}_{t-1}\) and \({x}_{t}\) information," while the 0 represents "completely get rid of \({h}_{t-1}\) and \({x}_{t}\) information." The formula of the "forget gate"\({f}_{t}\) is as follows:

$${f}_{t }=\sigma \left({W}_{f} .\left[{h}_{t-1}, {x}_{t} \right]+{b}_{f} \right),$$

where \(\upsigma\) is the activation function, \({W}_{f}\) is the weight, \({h}_{t-1}\) is the recurrent information at time t-1, \({x}_{t}\) denote the input information, and \({b}_{f}\) is the bias of the forget gate.

(2) Input gate

One part of the "input gate" linearly combines \({x}_{t}\) with the hidden state \({h}_{t-1}\) at the previous time to obtain \({i}_{t}\) through \(\upsigma\)-layer activation from Eq. (7) (Wang et al. 2021). This part determines which information needs to be updated; this is part of the forgetting gate selected to be forgotten. In the other part, \({h}_{t-1}\) and \({x}_{t}\) are passed through a tanh layer to generate a vector \(\widetilde{{C}_{t}}\), which is alternatively employed to update the new content. Then the two parts are combined to update the state \({C}_{t-1}\) to \({C}_{t}\). The expression of the "input gate" is as follows (Hochreiter and Schmidhuber 1997; Wang et al. 2021):

$${i}_{t }=\sigma \left({W}_{i} .\left[{h}_{t-1}, {x}_{t} \right]+{b}_{i} \right),$$
$$\widetilde{{C}_{t}}=tanh\left[{W}_{C }\left( {h}_{t-1} ,{x}_{t}\right)+ {b}_{C}\right],$$

cell state equation at time t,


where \({C}_{t}\) denotes the cell state of LSTM, tanh is the activation function, \({W}_{i }, {W}_{C}\) are the weights, * represents convolution, \({b}_{i }, {b}_{c}\) are the bias of the input gate,\({\widetilde{C}}_{t}\) is the cell status update value, and other parameters are consistent with those mentioned above.

(3) Output gate

The "output gate" updates the value of the hidden layer output at the current time, i.e., \({h}_{t}\), through Eq. (11), is the hidden state at time t. LSTM can remember long-term historical information because every cycle uses the information \({C}_{t}\) and \({h}_{t-1}\) of the previous cycle, and each output state is affected by the previous state. Especially, the forget gate can decide what information will be removed from the cell state. The formula of the "output gate" is as follows:

$${o}_{t }=\sigma \left({W}_{o} .\left[{h}_{t-1}, {x}_{t} \right]+{b}_{o}\right),$$

where \({o}_{t}\) is the output gate, \({W}_{o}\) is the weight associated with \({x}_{t}\), \({b}_{o}\) is for the bias weight vector, and \({h}_{t}\) is the hidden state. Equations (6)–(10), \({W}_{\zeta }\),\({b}_{\xi }\), \(\upzeta \in \left\{f,i,c,o\right\}\),\(\upxi \in \left\{f,i,c,o\right\}\), respectively, represent the output weight and offset matrix, which are also parameters to be learned in training.

In network training, one should pay attention to possible overfitting. It can minimize the loss function by constantly adjusting the parameters. For example, when the total number of samples is N, the output value \({Y}_{i}^{*}\) trained by the network and the expected output value \({Y}_{i}\) can be expressed by the mean squared error (MSE) loss function, also known as the L2 Loss function. Its basic form is as follows:


LSTM Model training is the process of determining the parameters in the model structure. First, the original PM time series defines as \({F}_{o}= \left\{{f}_{1}, {f}_{2},\dots ,{f}_{n}\right\}\) in the input layer. The training set and test set can be divided into \({F}_{tr}= \left\{{f}_{1}, {f}_{2},\dots ,{f}_{m}\right\}\) and \({F}_{te}= \left\{{f}_{m+1}, {f}_{m+2},\dots ,{f}_{m}\right\}\), satisfying the constraint conditions \(m<\) n, and \(m, n \in \mathrm{N}\). Then standardize the element \({f}_{t}\) in the training set using the classic z-score standardization formula as Eq. (13) (Liu et al. 2019). The \({x}{\prime}(t)\) represents the PM values at t, \({x}_{\mathrm{mean}}{\prime}\) and \({x}_{\mathrm{std}}{\prime}\) are the mean and standard deviation of \({x}{\prime}(t)\), respectively. The standardized training set can be expressed as

$${F}_{tr}{\prime}=\left\{\begin{array}{ccc}\begin{array}{cc}{f}_{1}{\prime},& f\end{array},& \cdots & {f}_{m}{\prime}\end{array}\right\},$$
$$f_{t}^{\prime}=\left( {{f}_{t}}-\underset{t=1}{\overset{n}{\mathop \sum }}\,\frac{{{f}_{t}}}{n} \right)/\sqrt{\frac{\mathop{\sum }_{t=1}^{n}{{\left( {{f}_{t}}-\mathop{\sum }_{t=1}^{n}{}^{{{f}_{t}}}/{}_{n} \right)}^{2}}}{n}}\quad 1\le t\le m,t \in \text{N}$$

In order to adapt to the characteristics of hidden layer input, a data segmentation method is applied to \({F}_{tr}{\prime}\). Set the split window length value to \(L\), and the model input after the split is

$$X= \left\{{X}_{1}, {X}_{2}, \dots ,{X}_{L}\right\},$$
$${X}_{p}=\left\{{f}_{p}{\prime},{f}_{p+1}{\prime}, \dots ,{f}_{m-L+p-1}{\prime}\right\} \quad 1\le p\le L;\, p,L\in \mathrm{N}.$$

The corresponding theoretical output is

$$\mathrm{Y}= \left\{{Y}_{1}, {Y}_{2}, \dots ,{Y}_{L}\right\},$$
$${\mathrm{Y}}_{p}= \left\{{f}_{p+1}^{\prime}, {f}_{p+2}^{\prime}, \dots ,{f}_{m-L+p}^{\prime}\right\}.$$

Next, input the \(X\) to the hidden layers, which contain \(L\) isomorphic LSTM cells connected at the front and back times. The output of \(X\) after passing through the hidden layer is represented as

$$P= \left\{{P}_{1}, {P}_{2}, \dots ,{P}_{L}\right\},$$
$${P}_{p}={\mathrm{LSTM}}_{\mathrm{forward}}\left({X}_{p}, {C}_{p-1},{H}_{p-1}\right),$$

where \({C}_{p-1}\) and \({H}_{p-1}\) represent the state and output of the previous LSTM cell, respectively; \({LSTM}_{\mathrm{forward}}\) represents the forward calculation method of LSTM cells. If the cell state vector size is set to \({S}_{\mathrm{state}}\), the sizes of both \({C}_{p-1}\) and \({H}_{p-1}\) vectors are both \({S}_{\mathrm{state}}\). It can be seen that the hidden layer output \(P\), model input \(X\), and theoretical output \(Y\) are two-dimensional arrays with dimensions of (m-L, L). The mean square error is selected as the error calculation formula, and the loss function of the training process can be expressed as


Application of the trained LSTM model

Set the minimum loss function as the optimization goal, and given the random seed for network initialization, learning efficiency is η, and training steps, apply the Adam optimization algorithm to continuously update the network weight to obtain the final hidden side network.

This section applies the trained LSTM network (\({\mathrm{LSTM}}_{\mathrm{net}}^{*}\)) for prediction, and the prediction process adopts an iterative method. First, the last row of data in theoretical output \(Y\) is

$${Y}_{f}=\left\{\begin{array}{ccc}\begin{array}{cc}{f}_{m-L+1}^{\prime},& {f}_{m-L+2}^{\prime}\end{array},&\ldots & {f}_{m}^{\prime}\end{array}\right\}.$$

Enter \({Y}_{f}\) into \({\mathrm{LSTM}}_{\mathrm{net}}^{*}\), the output result can be expressed as

$${{P}_{f}=\mathrm{LSTM}}_{\mathrm{net}}^{*}\left({Y}_{f}\right)=\left({p}_{m-L+2 ,} {p}_{m-L+3 ,}\dots {p}_{m+1}\right).$$

The predicted value at time \(m+1\) is \({p}_{m+1}\). Then, combine the last \(L-1\) data point of \({Y}_{f}\) and \({P}_{m+1}\) into a new row of data

$${Y}_{f}=\left\{\begin{array}{ccc}\begin{array}{cc}{f}_{m-L+1}^{\prime},& {f}_{m-L+2}^{\prime}\end{array},& \ldots & {f}_{m}^{\prime}\end{array}\right\}.$$

Enter \({Y}_{f+1}\) into \({\mathrm{LSTM}}_{\mathrm{net}}^{*}\); then the predicted value at time \(m+2\) is \({\mathrm{P}}_{m+2}\), and so on. The resulting prediction order is

$${P}_{o}= \left\{{P}_{m+1}, {P}_{m+2}, \dots ,{P}_{n}\right\}.$$

Next, by performing z-score de-normalization on \({P}_{o}\)(represented as de_zscore), and the formula of \({P}_{te}\) is the de-normalization as shown in Eq. (27) (Liu et al. 2019), the final prediction sequence corresponding to the test set \({F}_{te}\) is obtained as.

$${P}_{te= }{P}_{o}*{x}_{\mathrm{std}}^{\prime}+{x}_{\mathrm{mean}}{\prime},$$
$${P}_{te}=d{e}_{zscore\left({P}_{o}\right)}=\left({p}_{m+1,}^{*}{p}_{m+2,}^{*}\dots ,{p}_{m+n}^{*}\right),$$
$${P}_{k}^{*}= \sqrt{\frac{\sum_{t=1}^{n}{({f}_{t}-\sum_{t=1}^{n}\frac{{f}_{t}}{n})}^{2}}{n}}+\sum_{t=1}^{n}\frac{{f}_{t}}{n},m+1\le k \le n, k\in \mathbf{N}.$$

Similarly, using each row of \(X\) as model input can obtain a fitting sequence \({P}_{tr}\) corresponding to the training set \({F}_{\mathrm{tr}}\). The training of LSTM-based PM time series models, prediction algorithms, and parameter optimization of LSTM prediction models are shown in Table

Table 3 Based on LSTM PM time series models training and prediction algorithm

3, and other parameter settings are shown in Table

Table 4 Experiments parameters


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, C., Zhang, P. Improving the accuracy of polar motion prediction using a hybrid least squares and long short-term memory model. Earth Planets Space 75, 153 (2023).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: