
Equatorial spread-F forecasting model with local factors using the long short-term memory network

Abstract

The predictability of nighttime equatorial spread-F (ESF) occurrence is essential for ionospheric disturbance warning systems. In this work, we propose ESF forecasting models using two deep learning techniques: an artificial neural network (ANN) and long short-term memory (LSTM). The ANN and LSTM models are trained with ionogram data from the equinoctial months of 2008 to 2018 at Chumphon station (CPN), Thailand, near the magnetic equator where ESF onset typically occurs, and they are tested with ionogram data from 2019. In particular, the models are trained with new local input parameters, namely the vertical drift velocity of the F-layer height (Vd) and atmospheric gravity waves (AGW) collected at CPN station, together with global parameters of solar and geomagnetic activity. We analyze the ESF forecasting models in terms of monthly probability, daily probability and occurrence, and diurnal predictions. The proposed LSTM model achieves 85.4% accuracy when the local parameters Vd and AGW are used. The LSTM model outperforms the ANN, particularly in February, March, April, and October. The results show that the AGW parameter plays a significant role in improving the LSTM model during post-midnight hours. Compared with the IRI-2016 model, the proposed LSTM model yields lower discrepancies from observational data.


Introduction

The equatorial spread-F (ESF) is a nighttime ionospheric irregularity near the magnetic equator. ESF is observed on ionograms recorded by the Frequency Modulated Continuous Wave (FMCW) ionosonde, Abdu et al. (1981). ESF appears as a spreading of the ionogram trace along the height and frequency axes, indicating irregularities in the bottom side of the F layer. ESF is generated after sunset by plasma instabilities, which are explained through the Rayleigh–Taylor instability, Woodman and La Hoz (1976). ESF generation depends on precursor conditions such as the evening prereversal enhancement of the vertical plasma drift (PRE), the bottom-side F-layer density gradient, seeding perturbations, wave structures in the plasma density, and the initiated polarization electric field, Abdu (2019). Because ESF characteristics are broadly understood and described through numerous parameters, this fundamental knowledge can contribute to the effective development of an ESF forecasting model.

The generation and development of ESF are triggered by large-scale wave structures (LSWS) in F-layer heights, together with the PRE vertical drift from the afternoon until post-sunset hours, Abdu et al. (2015). In some cases, the ESF occurrence rate can approach 100% when the vertical plasma drift velocity exceeds 40 m/s, Abadi et al. (2020). Tulasi et al. (2017) also report that increased drift velocities of about 45–256 m/s after sunset (26–128 m/s after midnight) can cause ionospheric plasma irregularities. Additionally, atmospheric gravity waves (AGWs) play a significant role in the development of seed plasma perturbations through AGW-driven neutral wind perturbations. Tsunoda (2010) emphasizes that seeding perturbations are crucial for the development of ESF. The amplitude of the seed perturbations, together with F-layer height variations, plays a significant role in ESF occurrence or nonoccurrence, Manju et al. (2016). The latitudinal expansion of ESF/equatorial plasma bubble (EPB) occurrence is attributed to changes in the bottom-side F-layer height (Saito and Maruyama 2006; Rungraengwajiake et al. 2013). Moreover, ESF characteristics at longitudinally close stations are not necessarily the same because of their local conditions, Thammavongsy et al. (2022). High ESF occurrence rates are observed during high solar activity and near the magnetic equator. High and low probabilities of ESF occurrence are found in equinoctial and solstice months, respectively, Klinngam et al. (2015). In contrast, high magnetic activity can suppress ESF or delay its commencement by 3–9 h, Li et al. (2009). Although much evidence has been gathered across a wide range of local and global conditions, local conditions remain uniquely important for improving the understanding and predictability of ionospheric irregularities.

The climatological characteristics of ESF occurrence are well understood in terms of the controlling factors and physical mechanisms behind its longitudinal, seasonal, and solar-activity variations. However, the day-to-day and short-term variability of ESF occurrence is still difficult to predict accurately with long-term controlling factors, Li et al. (2021). Several ESF forecasting models have been developed in space weather studies, mostly targeting the long-term variability of ESF occurrence over large longitudinal areas. For example, the monthly probability of ESF occurrence has been successfully modeled using the cubic B-spline method, Abdu et al. (2003); ESF forecasting models have been developed using neural networks over Brazil and Thailand (McKinnell et al. 2010; Thammavongsy et al. 2020); thresholds on hʹF and S4 scintillation have been used to forecast ESF events in the Peruvian and Indian sectors (Anderson and Redmon 2017; Aswathy and Manju 2018); and a post-sunset ESF prediction model has been built using logistic regression in Southeast Asia, Abadi et al. (2022). These studies developed methods for ESF forecasting and discussed the important role of space weather parameters such as diurnal and seasonal variations, solar indices, and magnetic indices. However, the use of local parameters with machine learning has not been considered, and it may be an important key to improving ESF forecasting models.

Recently, artificial intelligence (AI) has been widely applied in space weather forecasting models; in particular, deep learning networks are used to solve complex problems. One of the most powerful deep neural networks for time series data is the long short-term memory (LSTM) network (Hochreiter and Schmidhuber 1997; Liu et al. 2020; Tan et al. 2018). In space weather studies, LSTM models have been successfully applied to global and mid-latitude TEC forecasting, foF2 and hmF2 forecasting under both quiet and geomagnetically disturbed conditions, geomagnetic Kp index forecasting, and SYM-H and ASY-H forecasting (Liu et al. 2020; Ulukavak 2020; Kim et al. 2020; Tan et al. 2018; Collado-Villaverde et al. 2021). The multi-timestep (look-back) learning and other capabilities of the LSTM model are therefore expected to improve ESF forecasting. The relationships between global and local conditions and ESF generation and development are well investigated in the literature. To achieve better accuracy in long-term and short-term ESF prediction, new characteristic inputs based on this prior knowledge still need to be investigated for ESF forecasting models.

In this work, we develop ESF forecasting models for Chumphon (CPN) station, Thailand, using two deep learning techniques: an artificial neural network (ANN) and long short-term memory (LSTM). We consider new local input parameters, including the virtual height of the F layer (hʹF), the drift velocity of hʹF (Vd), and atmospheric gravity waves (AGW). We compare the performance of the ANN and LSTM models, and we also validate the IRI-2016 model against the observations. The predicted outputs are evaluated in three analyses: monthly probability, daily probability and occurrence, and diurnal predictions.

Data and methods

Description of input parameters

The input parameters of the ESF forecasting model in this study include daily solar activity indices (F10.7 and SSN) downloaded from the Space Physics Data Facility (SPDF) OMNIWeb database at https://omniweb.gsfc.nasa.gov/form/dx1.html; 3-hourly and daily averaged magnetic activity indices (ap3 and Ap; kp3 and Kp) from the World Data Center for Geomagnetism, Kyoto University, at https://wdc.kugi.kyoto-u.ac.jp/index.html; and the local parameters hʹF, F-layer drift velocity (Vd), and atmospheric gravity waves (AGW). These three local parameters are obtained by manually scaling the ionograms, differentiating hʹF with respect to time, and applying a wavelet transform, respectively. In addition, the diurnal and seasonal variations are represented by the hour number (Hn) and day number (Dn), which are encoded with sine and cosine functions so that the encoding is continuous across day and year boundaries:

$$\mathrm{Ts}=\mathrm{sin}\left(\frac{2\uppi \times \mathrm{Hn}}{24}\right),$$
(1)
$$\mathrm{Tc}=\mathrm{cos}\left(\frac{2\uppi \times \mathrm{Hn}}{24}\right),$$
(2)
$$\mathrm{Ds}=\mathrm{sin}\left(\frac{2\uppi \times \mathrm{Dn}}{365.25}\right),$$
(3)
$$\mathrm{Dc}=\mathrm{cos}\left(\frac{2\uppi \times \mathrm{Dn}}{365.25}\right),$$
(4)

where 24 is the number of hours in a day and 365.25 accounts for the leap years included in the data set, Watthanasangmechai et al. (2012).
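Equations (1)–(4) can be sketched in a few lines of Python. This sketch assumes that Hn is the local time in decimal hours (17.5 for 17:30 LT, since the data are at 30-min steps) and that Dn is the day of year; the paper does not state how half-hours are encoded, and the function name `encode_time` is hypothetical.

```python
import math

def encode_time(hour_number: float, day_number: int) -> dict:
    """Map Hn and Dn onto the unit circle, following Eqs. (1)-(4)."""
    hour_angle = 2 * math.pi * hour_number / 24.0    # 24 hours per day
    day_angle = 2 * math.pi * day_number / 365.25    # leap-year-aware year length
    return {
        "Ts": math.sin(hour_angle),
        "Tc": math.cos(hour_angle),
        "Ds": math.sin(day_angle),
        "Dc": math.cos(day_angle),
    }

# 23:30 LT and 00:00 LT land close together on the circle,
# whereas the raw hour numbers 23.5 and 0 are far apart.
late = encode_time(23.5, 60)
midnight = encode_time(0.0, 60)
gap = math.dist((late["Ts"], late["Tc"]), (midnight["Ts"], midnight["Tc"]))
```

The small `gap` is the reason for the encoding: a model fed raw hour numbers would see 23:30 LT and 00:00 LT as maximally distant, although they are adjacent samples in the nighttime series.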

ANN and LSTM algorithms

ANNs have been successfully applied to time series data, Zhang (2012). A powerful type of ANN for time series is the LSTM network, which can capture both short- and long-term dependencies. The LSTM model was designed mainly to mitigate the vanishing gradient problem of the recurrent neural network (RNN) and to extend the model's memory (Hochreiter and Schmidhuber 1997; Graves 2012). This increases the ability of the LSTM model to learn both short- and long-term patterns in time series data. The key components of the LSTM structure are the cell state and the input, forget, aggregated, and output gates, as expressed in Eqs. (9)–(13). In this study, the standard LSTM model with multiple inputs and a single output is used. The ANN and LSTM models are briefly expressed as follows:

The final output \({\widehat{y}}_{t}\) of the ANN at time step \(t\) is obtained by

$${\widehat{y}}_{t}=\sigma \left({\mathbf{y}}_{t}^{[l]}\cdot {\mathbf{W}}_{hy}+{\mathbf{b}}_{y}\right),$$
(5)

where \(\sigma\) can be any activation function, such as the hyperbolic tangent, rectified linear unit (ReLU), or softmax; \(l\) denotes the layer number; and \({\mathbf{W}}_{hy}\) and \({\mathbf{b}}_{y}\) are the weight and bias vectors connecting the last hidden layer to the output layer. The output in Eq. (5) depends on the output of the hidden layer, which is computed as:

$${\mathbf{y}}_{t}^{[l]}=\sigma \left({\mathbf{x}}_{t}\cdot {\mathbf{W}}_{xh}+{\mathbf{b}}_{h}\right),$$
(6)

where \({\mathbf{W}}_{xh}\) contains the weights connecting the input and hidden layers, \({\mathbf{b}}_{h}\) is the hidden-layer bias, \({\mathbf{x}}_{t}\in {\mathbb{R}}^{1\times d}\) is the input vector to the network, and \(d\) is the number of input features.
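A minimal NumPy sketch of the ANN forward pass in Eqs. (5) and (6), using the row-vector convention of the text, is shown below. The two hidden layers of 30 neurons and the six base inputs (Ts, Tc, Ds, Dc, SSN, ap3) are taken from the Results section; the choice of tanh in the hidden layers, a sigmoid at the output, and the random parameter values are illustrative assumptions, not the authors' trained network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ann_forward(x_t, hidden_params, W_hy, b_y):
    """Eq. (6) applied layer by layer, then Eq. (5) at the output."""
    y = x_t
    for W, b in hidden_params:
        y = np.tanh(y @ W + b)          # hidden layer, sigma = tanh (assumed)
    return sigmoid(y @ W_hy + b_y)      # output layer, sigma = sigmoid (assumed)

d = 6                        # Ts, Tc, Ds, Dc, SSN, ap3
layer_sizes = [d, 30, 30]    # two hidden layers of 30 neurons
rng = np.random.default_rng(0)
# Hypothetical random parameters, for shape illustration only.
hidden_params = [
    (rng.normal(0.0, 0.1, (m, k)), np.zeros((1, k)))
    for m, k in zip(layer_sizes[:-1], layer_sizes[1:])
]
W_hy = rng.normal(0.0, 0.1, (layer_sizes[-1], 1))
b_y = np.zeros((1, 1))

x_t = rng.normal(size=(1, d))    # one standardized input vector in R^{1 x d}
y_hat = ann_forward(x_t, hidden_params, W_hy, b_y)
```

Each input vector is processed independently here, which is the "single time" limitation of the ANN that the discussion of Fig. 9 contrasts with the LSTM look-back.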

For the LSTM model, the final output \({\widehat{y}}_{t}\) is computed from the hidden state as follows:

$${\widehat{y}}_{t}=\sigma \left({\mathbf{h}}_{t}\cdot {\mathbf{W}}_{hy}+{\mathbf{b}}_{y}\right),$$
(7)
$$\mathbf{h}_{t}=\mathbf{o}_{t}\odot\tanh\left(\mathbf{c}_{t}\right),$$
(8)
$$\mathbf{o}_{t}=\sigma\left(\mathbf{x}_{t}\cdot\mathbf{W}_{xo}+\mathbf{h}_{t-1}\cdot\mathbf{W}_{ho}+\mathbf{b}_{o}\right),$$
(9)
$$\mathbf{c}_{t}=\mathbf{f}_{t}\odot\mathbf{c}_{t-1}+\mathbf{i}_{t}\odot\mathbf{g}_{t},$$
(10)
$$\mathbf{f}_{t}=\sigma\left(\mathbf{x}_{t}\cdot\mathbf{W}_{xf}+\mathbf{h}_{t-1}\cdot\mathbf{W}_{hf}+\mathbf{b}_{f}\right),$$
(11)
$$\mathbf{i}_{t}=\sigma\left(\mathbf{x}_{t}\cdot\mathbf{W}_{xi}+\mathbf{h}_{t-1}\cdot\mathbf{W}_{hi}+\mathbf{b}_{i}\right),$$
(12)
$$\mathbf{g}_{t}=\tanh\left(\mathbf{x}_{t}\cdot\mathbf{W}_{xg}+\mathbf{h}_{t-1}\cdot\mathbf{W}_{hg}+\mathbf{b}_{g}\right),$$
(13)

where \({\mathbf{h}}_{t}\) and \({\mathbf{c}}_{t}\) are the hidden and cell states, respectively, and \({\mathbf{o}}_{t}\), \({\mathbf{f}}_{t}\), \({\mathbf{i}}_{t}\), and \({\mathbf{g}}_{t}\) represent the output, forget, input, and aggregated gates, respectively. In the standard LSTM, \(\sigma\) in the gate equations (9), (11), and (12) is the logistic sigmoid, and \(\odot\) denotes element-wise multiplication. In both models, the final output is compared with the desired output (target label) to measure the error (loss). In this work, the mean squared error (MSE) is used:

$$E\left({\widehat{y}}_{t},{y}_{t}\right)=\frac{1}{N}\sum_{i=1}^{N}{\left({\widehat{y}}_{i}-{y}_{i}\right)}^{2},$$
(14)

where \(N\) is the total number of outputs. To bring the predicted values close to the desired values, the error function is minimized using the gradient descent (GD) method. For simplicity, let all weights and biases of the above models be denoted by \({\varvec{\uptheta}}=\left\{\mathbf{W},\mathbf{b}\right\}\); the updated weights and biases \({{\varvec{\uptheta}}}^{*}\) are obtained by the following delta rule:

$${{\varvec{\uptheta}}}^{\mathbf{*}}={\varvec{\uptheta}}-\upeta \frac{\partial E}{\partial {\varvec{\uptheta}}},$$
(15)

where \(\eta\) is the learning rate and \(\frac{\partial E}{\partial {\varvec{\uptheta}}}\) is the partial derivative of \(E\) with respect to \({\varvec{\uptheta}}\). For the ANN, the gradients are computed for each input–output pair. In the LSTM model, by contrast, the gradients must be computed through time over the network's look-back time steps (backpropagation through time). The resulting gradients are propagated backward through the network to update the weights and biases. This process is repeated for a given number of epochs or until the error goal is reached.
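The LSTM equations (7)–(13) can be sketched as a NumPy forward pass. This is an illustration rather than the authors' implementation: the parameter values are random, the output activation in Eq. (7) is taken to be a sigmoid, a 1-h look-back is assumed to mean two 30-min steps, and training by backpropagation through time is not shown. The helper names `lstm_step` and `lstm_predict` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step, Eqs. (8)-(13), in the row-vector convention of the text."""
    o_t = sigmoid(x_t @ p["W_xo"] + h_prev @ p["W_ho"] + p["b_o"])  # Eq. (9), output gate
    f_t = sigmoid(x_t @ p["W_xf"] + h_prev @ p["W_hf"] + p["b_f"])  # Eq. (11), forget gate
    i_t = sigmoid(x_t @ p["W_xi"] + h_prev @ p["W_hi"] + p["b_i"])  # Eq. (12), input gate
    g_t = np.tanh(x_t @ p["W_xg"] + h_prev @ p["W_hg"] + p["b_g"])  # Eq. (13), aggregated gate
    c_t = f_t * c_prev + i_t * g_t                                  # Eq. (10), cell state
    h_t = o_t * np.tanh(c_t)                                        # Eq. (8), hidden state
    return h_t, c_t

def lstm_predict(window, p):
    """Many-to-one: unroll over a window of shape (steps, d) and return y_hat."""
    n = p["W_ho"].shape[0]
    h, c = np.zeros((1, n)), np.zeros((1, n))
    for x_t in window:                    # the same parameters p are reused at every step
        h, c = lstm_step(x_t[None, :], h, c, p)
    return sigmoid(h @ p["W_hy"] + p["b_y"]).item()  # Eq. (7), sigmoid output assumed

# Hypothetical parameters: d = 6 inputs, n = 35 cells, normal weights, zero biases.
d, n = 6, 35
rng = np.random.default_rng(1)
p = {}
for gate in "ofig":
    p[f"W_x{gate}"] = rng.normal(0.0, 0.1, (d, n))
    p[f"W_h{gate}"] = rng.normal(0.0, 0.1, (n, n))
    p[f"b_{gate}"] = np.zeros((1, n))
p["W_hy"] = rng.normal(0.0, 0.1, (n, 1))
p["b_y"] = np.zeros((1, 1))

window = rng.normal(size=(2, d))   # two 30-min steps, i.e. a 1-h look-back
y_hat = lstm_predict(window, p)
esf_label = int(y_hat >= 0.5)      # 0.5 threshold, as in the text
```

Because `h` and `c` carry information from the earlier step into the later one, the LSTM sees the short history of the inputs, while the ANN in Eqs. (5)–(6) only sees the current input vector.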

Model performance analysis

The ESF forecasting model solves a binary classification problem with ESF labels 0 and 1, so its performance is evaluated using the confusion matrix. Because model performance can appear biased when the data are imbalanced, we consider recall (sensitivity), precision (positive predictive value), and F1 score in addition to accuracy (Fawcett 2006; Sokolova and Lapalme 2009). These metrics are defined as:

$$\mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}},$$
(16)
$$\mathrm{Recall}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}},$$
(17)
$$\mathrm{Precision}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}},$$
(18)
$$F1\,\mathrm{score}=\frac{2}{\left(\frac{1}{\mathrm{Recall}}+\frac{1}{\mathrm{Precision}}\right)},$$
(19)

where \(\mathrm{TP}\), \(\mathrm{TN}\), \(\mathrm{FP}\), and \(\mathrm{FN}\) are the numbers of true positives (correctly predicted ones), true negatives (correctly predicted zeros), false positives (zeros predicted as ones), and false negatives (ones predicted as zeros), respectively, obtained by comparing the model's predictions with the observed values.
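Equations (16)–(19) can be computed directly from 0/1 label sequences, as in the following plain-Python sketch (the function names are hypothetical). Eq. (19) is written as a harmonic mean, which is undefined when recall or precision is zero; the sketch uses the equivalent form 2PR/(P+R) and returns 0 in the degenerate cases, which is an assumed convention.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary ESF labels (1 = ESF present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def scores(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)                 # Eq. (16)
    recall = tp / (tp + fn) if tp + fn else 0.0                # Eq. (17)
    precision = tp / (tp + fp) if tp + fp else 0.0             # Eq. (18)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0      # Eq. (19), harmonic mean
    return accuracy, recall, precision, f1

# Imbalanced toy case: 8 nights without ESF, 2 with ESF, and a model that never predicts ESF.
never_esf = scores([0] * 8 + [1] * 2, [0] * 10)
```

The toy case illustrates why accuracy alone is not enough for imbalanced data: a model that never predicts ESF still reaches 80% accuracy, while its recall, precision, and F1 score are all zero.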

Furthermore, the root mean squared error (RMSE) is used to evaluate the difference between the model predictions and the observed ESF probability:

$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left({\mathrm{ESF}\_\mathrm{mod}}_{i}-{\mathrm{ESF}\_\mathrm{obs}}_{i}\right)}^{2}},$$
(20)

where \({\mathrm{ESF}\_\mathrm{mod}}_{i}\) and \({\mathrm{ESF}\_\mathrm{obs}}_{i}\) are the predicted and observed ESF values and \(N\) is the number of data points in the sequence.
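Eq. (20) is a one-liner in plain Python. The sketch below (function name hypothetical) works on ESF probability percentages, so the result is also in percent.

```python
import math

def rmse(esf_mod, esf_obs):
    """Eq. (20): root mean squared difference between predicted and observed values."""
    if len(esf_mod) != len(esf_obs):
        raise ValueError("prediction and observation sequences must have equal length")
    n = len(esf_obs)
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(esf_mod, esf_obs)) / n)

# Toy probability percentages at four local-time slots.
error = rmse([20.0, 30.0, 60.0, 50.0], [10.0, 40.0, 60.0, 30.0])
```

Here the differences are 10, -10, 0, and 20 percentage points, so the squared errors sum to 600 and the RMSE is the square root of 150, about 12.2%.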

Data preprocessing

The ESF data are manually scaled every 15 min using ionogram scaling software. The ESF labels 0 and 1 indicate the absence and presence of ESF, respectively. In the data selection, we consider only ESF events lasting at least 1 h; that is, one detected ESF event consists of two consecutive 30-min intervals with observed ESF, so an event spanning a full hour contributes two ESF counts. For the input parameters, Vd is retrieved by differentiating hʹF with respect to time, Abadi et al. (2022). The AGW parameter is derived by applying the Morlet wavelet transform to the foF2 signal and retaining wavelet periods of 30–90 min (Manju et al. 2016; Torrence and Compo 1998). Missing foF2 values are filled by linear interpolation between the available sample points. The averaged power spectrum of the AGW is used in this study. Figure 1 shows the procedure for obtaining the AGW coefficients. The foF2 signal \({{\varvec{x}}}_{{\varvec{n}}}\) is first transformed by the discrete Fourier transform (DFT) into \({\widehat{{\varvec{x}}}}_{k}\). The wavelet power spectrum at each time (17:00 to 22:30 LT) and wavelet scale is stored in \({W}_{n}\left(s\right)\). The averaged power spectrum is then obtained by summing \({W}_{n}\left(s\right)\) over all periods from 30 to 90 min at each time.
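Two of the preprocessing steps above can be sketched with NumPy. `filter_short_events` encodes our reading of the 1-h rule (an ESF run must span at least two consecutive 30-min samples), and `drift_velocity` differentiates hʹF with `numpy.gradient` (central differences inside the series, one-sided at the ends) and converts km per time step to m/s. Both helper names, the finite-difference scheme, and the default 15-min step are assumptions; the paper states only that hʹF is differentiated with respect to time.

```python
import numpy as np

def filter_short_events(labels, min_run=2):
    """Keep ESF runs of at least min_run samples (2 x 30 min = 1 h); zero out the rest."""
    labels = np.asarray(labels, dtype=int)
    out = np.zeros_like(labels)
    start = None
    for i, v in enumerate(np.append(labels, 0)):   # trailing 0 closes a final run
        if v == 1 and start is None:
            start = i
        elif v == 0 and start is not None:
            if i - start >= min_run:
                out[start:i] = 1
            start = None
    return out

def drift_velocity(h_prime_f_km, dt_minutes=15.0):
    """Vd = d(h'F)/dt in m/s, from h'F in km sampled every dt_minutes."""
    h_m = np.asarray(h_prime_f_km, dtype=float) * 1000.0
    return np.gradient(h_m, dt_minutes * 60.0)

esf = filter_short_events([0, 1, 0, 1, 1, 0, 1, 1, 1])   # the isolated 1 is dropped
vd = drift_velocity([250.0, 259.0, 277.0])               # h'F rising after sunset
```

For the toy hʹF series, the rises of 9 km and 18 km per 15 min give drift velocities of 10, 15, and 20 m/s at the three samples.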

Fig. 1

Flowchart of the calculation procedure of the averaged power spectrum from the wavelet transformation
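The procedure in Fig. 1 can be sketched with the FFT-based Morlet wavelet of Torrence and Compo (1998): take the DFT of the mean-removed foF2 series, multiply it by the Fourier-domain Morlet wavelet at each scale, invert to obtain \(W_n(s)\), and average \(|W_n(s)|^2\) over the scales whose Fourier periods fall in 30–90 min. The function names, the 15-min sampling step, the Morlet parameter \(\omega_0 = 6\), the scale spacing `dj = 0.125`, and the smallest scale of twice the sampling step are all assumptions; zero padding and the cone of influence are omitted for brevity. The text describes summing over the band while this sketch averages; because every input is standardized with Eq. (21), the constant factor between a sum and a mean does not change the resulting feature.

```python
import numpy as np

def fill_gaps(t, x):
    """Replace NaN samples by linear interpolation between valid neighbours."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    valid = ~np.isnan(x)
    return np.interp(t, t[valid], x[valid])

def morlet_band_power(x, dt, band=(30.0, 90.0), dj=0.125, omega0=6.0):
    """Mean Morlet wavelet power |W_n(s)|^2 over Fourier periods in `band` (units of dt)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    x_hat = np.fft.fft(x)                                    # DFT: x_n -> x_hat_k
    k = np.arange(n)
    omega = 2.0 * np.pi * np.where(k <= n // 2, k, k - n) / (n * dt)
    fourier_factor = 4.0 * np.pi / (omega0 + np.sqrt(2.0 + omega0**2))
    s0 = 2.0 * dt                                            # smallest scale (assumed)
    n_scales = int(np.log2(n * dt / s0) / dj) + 1
    scales = s0 * 2.0 ** (dj * np.arange(n_scales))
    periods = fourier_factor * scales                        # Fourier period of each scale
    power = np.empty((n_scales, n))
    for j, s in enumerate(scales):
        # Normalized Morlet daughter wavelet in Fourier space, zero for omega <= 0
        psi_hat = (np.sqrt(2.0 * np.pi * s / dt) * np.pi**-0.25
                   * np.exp(-0.5 * (s * omega - omega0) ** 2) * (omega > 0))
        power[j] = np.abs(np.fft.ifft(x_hat * psi_hat)) ** 2   # |W_n(s)|^2
    in_band = (periods >= band[0]) & (periods <= band[1])
    return power[in_band].mean(axis=0), periods[in_band]

t = np.arange(0, 331, 15.0)                        # 17:00-22:30 LT, minutes after 17:00
fof2 = 8.0 + 0.5 * np.sin(2 * np.pi * t / 60.0)    # synthetic foF2 with a 60-min wave (MHz)
fof2[[4, 9]] = np.nan                              # two missing ionograms
agw, periods = morlet_band_power(fill_gaps(t, fof2), dt=15.0)
```

`agw` holds one averaged-power value per 15-min sample between 17:00 and 22:30 LT. A synthetic foF2 series with a 300-min oscillation, which lies outside the 30–90 min band, would give much lower values.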

The diurnal and seasonal parameters are represented by the hour and day numbers passed through sine and cosine functions to capture cyclical diurnal and seasonal variations (McKinnell and Poole 2000; Watthanasangmechai et al. 2012). The output of each model is a floating-point value determined by the activation function of the output neuron, and it is classified as class 0 or class 1 using a threshold of 0.5. Before training, all input parameters are scaled by standardization, as shown in Eq. (21):

$${x}^{\mathrm{new}}=\frac{x-{\mu }_{x}}{{\sigma }_{x}},$$
(21)

where \({\mu }_{x}\) and \({\sigma }_{x}\) are the mean and standard deviation of the input parameter \(x\).
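Standardization (Eq. 21) and the 0.5 output threshold can be sketched as follows. The sketch assumes that \(\mu_x\) and \(\sigma_x\) are estimated on the 2008–2018 training set only and reused unchanged for the 2019 test set, so that no test information leaks into training; the paper does not say which data were used. Treating an output of exactly 0.5 as ESF presence (`>=`) is also an assumption. All function names are hypothetical.

```python
import numpy as np

def fit_standardizer(X_train):
    """Column-wise mean and standard deviation of the training inputs."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0          # guard against constant columns
    return mu, sigma

def standardize(X, mu, sigma):
    return (X - mu) / sigma          # Eq. (21), applied to every input column

def to_label(y_prob, threshold=0.5):
    """Convert model outputs to ESF classes 0/1."""
    return (np.asarray(y_prob) >= threshold).astype(int)

X_train = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
mu, sd = fit_standardizer(X_train)
Z = standardize(X_train, mu, sd)
labels = to_label([0.2, 0.5, 0.9])
```

After scaling, each training column has zero mean and, unless it is constant, unit standard deviation, so indices with very different units (for example SSN and hʹF) contribute on a comparable scale.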

Experimental design and input combinations

In this study, the ANN and LSTM models are developed using 30-min interval data with output labels 0 and 1. The best network structure and input parameters are determined by varying the number of neurons and the input features. The number of neurons is varied from 10 to 50. The LSTM look-back window is varied from 30 to 90 min to find the optimal one. These look-back windows are chosen according to plausible time lags between the influencing input parameters and ESF generation: the F-layer height and its drift velocity play a significant role in post-sunset ESF events (Abadi et al. 2020, 2022; Anderson and Redmon 2017; Aswathy and Manju 2018), and seeding perturbations have been shown to appear before ESF generation both post-sunset and post-midnight (Manju et al. 2016; Otsuka 2018). The models are trained with data from 2008 to 2018 and tested with data from 2019.

Input parameters are selected from the direct and indirect influencing factors investigated in previous studies; the correlations between the input parameters and ESF rely mainly on information reported in those studies. The input combinations are designed to identify significant input features and to assess whether the new local parameters improve the ESF model. The full set of input features comprises the hour number (\(\mathrm{Ts\, and\, Tc}\)), day number (\(\mathrm{Ds\, and\, Dc}\)), F10.7, SSN, ap3 and Ap, kp3 and Kp, hʹF, F-layer drift velocity (Vd), and atmospheric gravity waves (AGW). The base input parameters are first defined as input A and used to find the best network structure and look-back window. The best network with the base input parameters is then used to find the best input combination (Table 2).

The optimal network structure is selected based on the confusion matrix metrics. Both the ANN and LSTM models predict 0.5 h (30 min) ahead.
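A many-to-one LSTM that predicts 30 min ahead needs the 30-min series cut into look-back windows, each paired with the label one step after the window. The sketch below assumes that a 1-h look-back means two 30-min samples and that windows are built separately for each night (17:00 to 06:30 LT, 28 samples), so that no window straddles the daytime gap. The function name `make_windows` is hypothetical.

```python
import numpy as np

def make_windows(features, labels, lookback_steps=2, horizon_steps=1):
    """Build many-to-one samples from one night of 30-min data.

    features: (T, d) array; labels: (T,) array of ESF 0/1.
    Sample i uses rows i .. i + lookback_steps - 1 and is paired with the label
    horizon_steps rows after the last of them (1 step = 30 min ahead).
    """
    features = np.asarray(features)
    labels = np.asarray(labels)
    first_target = lookback_steps - 1 + horizon_steps
    n_samples = len(features) - first_target
    if n_samples <= 0:
        raise ValueError("sequence too short for the requested look-back and horizon")
    X = np.stack([features[i:i + lookback_steps] for i in range(n_samples)])
    y = labels[first_target:first_target + n_samples]
    return X, y

night = np.arange(28 * 3, dtype=float).reshape(28, 3)   # 17:00-06:30 LT, 3 toy features
esf = (np.arange(28) >= 10).astype(int)                 # toy labels: ESF from 22:00 LT on
X, y = make_windows(night, esf)
```

Each of the 26 resulting samples has shape (2, 3), the layout expected by the LSTM forward pass in Eqs. (7)–(13). Longer look-backs (for example 90 min, three steps) simply increase `lookback_steps`.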

The proposed LSTM model for ESF forecasting

The LSTM structure of the ESF forecasting model is shown in Fig. 2. The standard LSTM model is used in this study (Hochreiter and Schmidhuber 1997; Graves 2012). The LSTM model learns from multiple time steps (the look-back window) of the time series and produces a single output at the next time step. The same LSTM hidden layer, with identical weights, is applied at every look-back step. Finally, the output of the LSTM model is converted into 0 or 1 using a threshold of 0.5. The LSTM hyperparameters are 150 training epochs and a learning rate of 0.001. The error function is the mean squared error (MSE). Initial weights are drawn from a normal distribution, and biases are initialized to zero. The weights and biases are updated using the gradient descent (GD) method.
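The training settings above (MSE loss, gradient descent with learning rate 0.001, 150 epochs, normal initial weights, zero biases) can be illustrated on the simplest possible model: a single sigmoid output unit trained on synthetic standardized data. This toy is not the LSTM, which additionally requires backpropagation through time; it only shows Eqs. (14) and (15) in action. Per-sample updates are used, following the remark that gradients are computed for each input–output pair; whether the authors used per-sample or mini-batch updates is not stated, and the initial weight standard deviation of 0.1 is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse(y_hat, y):
    return float(np.mean((y_hat - y) ** 2))                  # Eq. (14)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))                                # synthetic standardized inputs
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)              # synthetic 0/1 labels

# Normal initial weights (std 0.1 assumed) and zero bias, as described in the text.
theta = {"W": rng.normal(0.0, 0.1, 3), "b": 0.0}
eta, epochs = 0.001, 150                                     # learning rate and epochs from the text

losses = [mse(sigmoid(X @ theta["W"] + theta["b"]), y)]
for epoch in range(epochs):
    for k in rng.permutation(len(y)):                        # one input-output pair at a time
        y_hat = sigmoid(X[k] @ theta["W"] + theta["b"])
        # dE/dz for E = (y_hat - y)^2 with a sigmoid output: 2 (y_hat - y) y_hat (1 - y_hat)
        dz = 2.0 * (y_hat - y[k]) * y_hat * (1.0 - y_hat)
        theta["W"] = theta["W"] - eta * dz * X[k]            # Eq. (15) for the weights
        theta["b"] = theta["b"] - eta * dz                   # Eq. (15) for the bias
    losses.append(mse(sigmoid(X @ theta["W"] + theta["b"]), y))
```

With a learning rate as small as 0.001, each update moves the parameters only slightly, so many updates (here 150 epochs of 200 samples) are needed before the loss drops appreciably.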

Fig. 2

The proposed LSTM network for the ESF forecasting model

Data preparation and selection

This study uses ionogram data from the Frequency Modulated Continuous Wave (FMCW) ionosonde at CPN station. The dataset covers solar cycle 24, from 2008 to 2019. Only data from the equinoctial months (February, March, April, August, September, and October) are used. The ESF data are manually collected every 15 min and resampled to 30-min intervals in this study. Data are scant in some years, such as 2010, 2012, and 2017, because of missing data; these years are excluded. The period considered is 17:00 LT to 06:30 LT. As mentioned above, the space weather input parameters include diurnal variations, seasonal variations, F10.7 solar flux, sunspot number (SSN), the 3-hourly (ap3) and daily averaged (Ap) ap indices, the 3-hourly (kp3) and daily averaged (Kp) Kp indices, the local ionospheric F-layer height (hʹF), the local vertical drift velocity (Vd), and the averaged power spectrum of atmospheric gravity waves (AGW). The AGW is derived from the wavelet transform of the foF2 signal over wavelet periods of 30–90 min, Manju et al. (2016).
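The selection rules above (equinoctial months, 17:00 to 06:30 LT, scant years removed, 15-min records reduced to a 30-min grid) can be sketched with the standard library. The paper does not say how the 15-min labels were resampled; this sketch assumes the samples at :00 and :30 are kept. The excluded-year set lists only the years named in the text ("such as" suggests it may be incomplete), and the helper names are hypothetical.

```python
from datetime import datetime, time

EQUINOX_MONTHS = {2, 3, 4, 8, 9, 10}
EXCLUDED_YEARS = {2010, 2012, 2017}   # scant-data years named in the text

def in_night_window(ts: datetime) -> bool:
    """True for 17:00-23:59 LT and 00:00-06:30 LT."""
    return ts.time() >= time(17, 0) or ts.time() <= time(6, 30)

def select_samples(records):
    """Filter (local_datetime, esf_label) records given at 15-min cadence."""
    return [
        (ts, label)
        for ts, label in records
        if ts.month in EQUINOX_MONTHS
        and ts.year not in EXCLUDED_YEARS
        and in_night_window(ts)
        and ts.minute in (0, 30)      # keep the 30-min grid (assumed resampling)
    ]

records = [
    (datetime(2015, 3, 1, 17, 0), 1),    # kept
    (datetime(2015, 3, 1, 17, 15), 1),   # off the 30-min grid
    (datetime(2015, 3, 1, 12, 0), 0),    # daytime
    (datetime(2015, 6, 1, 20, 0), 1),    # solstice month
    (datetime(2017, 3, 1, 20, 0), 1),    # excluded year
    (datetime(2015, 3, 2, 6, 30), 0),    # kept, end of the night window
    (datetime(2015, 3, 2, 6, 45), 0),    # after 06:30 LT
]
kept = select_samples(records)
```

Note that post-midnight samples are assigned to the calendar date on which they fall, so a night that starts on the last day of a month contributes its post-midnight samples to the next month.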

The available ESF data at CPN station, Thailand, from 2008 to 2019 are depicted in Fig. 3. The number of available days in each month of the training set (2008 to 2018) is summarized in Fig. 4; March and April contain more data than the other months. Table 1 lists the numbers of ESF-absent and ESF-present samples in the training and testing sets.

Fig. 3

The statistics of the available ESF data at Chumphon station from 2008 to 2019

Fig. 4

The available number of days of the training set from 2008 to 2018 (2017 is excluded)

Table 1 Numbers of training and testing samples in ESF classes 0 and 1

Results and discussions

Selection of the optimal network structure and input parameters for the ESF forecasting model

The optimal network structure and input parameters are determined using the 30-min interval data. We first investigate the optimal input parameters for both the ANN and LSTM models. The diurnal and seasonal parameters are always included. The solar and magnetic indices (F10.7, SSN, ap3, Ap, kp3, and Kp) are considered one at a time to find the optimal ones, each combined with the diurnal and seasonal parameters, as shown in Table 2. The ANN model has 1 to 4 hidden layers, while the LSTM model has only one hidden layer. Tables 3 and 4 list the confusion matrix metrics of the models for each input parameter. These tables show that the SSN and ap3 indices clearly improve the models. Therefore, the base input parameters \(\mathrm{Ts}\), \(\mathrm{Tc}\), \(\mathrm{Ds}\), \(\mathrm{Dc}\), SSN, and ap3 are selected and used to determine other optimal settings, such as the number of neurons in the ANN and LSTM models and the look-back window of the LSTM model.

Table 2 Designs of the input combinations for the ANN and LSTM models
Table 3 The performance of the ANN and LSTM models for each solar index
Table 4 The performance of the ANN and LSTM models for each magnetic index

Figure 5 shows the performance of the ANN model with different numbers of neurons in terms of the four metrics. We obtain the optimal numbers of neurons and hidden layers for the ANN model by considering various network structures. The ANN networks with three and four hidden layers tend to overfit or underfit during training. Thus, the ANN network with two hidden layers is selected because its training and validation are robust against underfitting and overfitting. Only the results of the two-hidden-layer ANN are shown in Fig. 5. The total accuracy differs only slightly among the neuron numbers, but 30 neurons stand out with high recall and F1 score. Therefore, 30 neurons are selected for the ANN model in this work.

Fig. 5

The performance of the ANN model with different numbers of neurons

Similarly, we find the optimal number of cells for the LSTM model by increasing the cell number from 10 to 50 in steps of 5, with the look-back window fixed at 1 h. As shown in Fig. 6, the total accuracy is above 77% and differs only slightly among the cell numbers. The LSTM model with 35 cells yields high recall and F1 score, so 35 cells are selected and used to determine the optimal look-back window. The result is shown in Fig. 7: lengthening the look-back window degrades the LSTM performance, which implies that the LSTM learns most effectively from information very close to the prediction time. A 1-h look-back window is therefore chosen for the LSTM model in this work.

Fig. 6

The performance of the LSTM model with different numbers of cells

Fig. 7

The performance of the LSTM model with different look-back windows

In summary, based on Figs. 5, 6 and 7, we choose two hidden layers of 30 neurons for the ANN model, and one hidden layer of 35 cells with a 1-h look-back window for the LSTM model.

We next investigate the combinations of new local input parameters labeled A to E in Table 2. Figure 8 shows the ANN performance for each input combination. For the ANN model, input D, which contains the local AGW index, yields 83% accuracy, the highest among the input combinations. Using input D with the AGW index improves the precision, total accuracy, and F1 score of the ANN model. In contrast, without the AGW index, the ANN model attains high recall only with input C, which contains the Vd index. Therefore, the local Vd index reduces false predictions of ESF absence (false negatives), while the AGW index reduces false predictions of ESF presence (false positives).

Fig. 8

The performance of the ANN model with each input combination

Similarly, Fig. 9 shows the LSTM performance for each input combination. Input E clearly achieves the highest accuracy, 85%. High precision and accuracy are obtained when the local Vd and AGW indices are used together. Without the AGW index, the LSTM model achieves high recall and F1 score when hʹF is used; with the AGW index, it achieves high accuracy and precision. Using the AGW index therefore improves the prediction of ESF presence, namely by reducing false ESF presence predictions. Hence, input E, which contains both the local Vd and AGW parameters, significantly improves the LSTM model. This improvement is expected because of the indirect relation of AGWs to ESF events: AGW amplitudes and high drift velocities are usually observed before post-sunset ESF onset and ESF development (Manju et al. 2016; Tsunoda 2010; Tulasi et al. 2017; Abadi et al. 2020). Post-midnight ESF generation has also been reported to be indicated by AGWs in solstice months, Otsuka (2018). On the other hand, we expect that learning from a single time step, together with the complexity of ESF characteristics, limits the ANN model, whereas the improvement of the LSTM model is clear. This may reflect an advantage of the LSTM model in recognizing and characterizing complex data features through its look-back window. Importantly, with the AGW index, the LSTM model achieves higher accuracy than the ANN model.

Fig. 9

The performance of the LSTM model with each input combination

As shown in Fig. 9, inputs B and C improve the model over input A in terms of recall, precision, and F1 score. Input A by itself still gives high accuracy, but at the cost of lower values in the other metrics. Furthermore, when the local input parameters Vd and AGW are considered, input E significantly improves the proposed ESF model.

Next, the optimal models are retrained and retested to evaluate their predictive performance. Figure 10 compares the ANN and LSTM models in terms of the four confusion matrix metrics. Overall, the LSTM and ANN models achieve 85% and 83% accuracy, respectively. The LSTM model is more robust against false positives (false ESF presence predictions), as shown by its precision score. On the other hand, the ANN model attains higher recall, i.e., it is more robust against false negatives (false ESF absence predictions).

Fig. 10

Comparison of the best ANN and LSTM models

Therefore, the LSTM model with 35 cells, a 1-h look-back window, and input E is proposed in this work. As shown in Fig. 10, the LSTM model with input E achieves higher accuracy and precision than the ANN model with input D, indicating that the LSTM model benefits more from the local AGW parameter than the ANN model does. In addition, this work demonstrates that established knowledge of ESF events can be used to design fundamental input features and new local parameters that improve the predictability of ESF occurrence. Because input E, with the Vd and AGW indices, improves the LSTM model, we recommend training the LSTM model with input E, as supported by Figs. 8, 9 and 10.

Although the recall, precision, and F1 score are below 0.5, the overall accuracy for ESF presence and absence is about 85% or higher. Low values of these metrics mean that false predictions still need to be reduced; they are possibly low because of imbalanced data and complex input features. Nevertheless, this work shows that the new local Vd and AGW parameters play a significant role in improving model performance.

Prediction of the monthly probability percentage of the ESF events

Figure 11 compares the monthly probability percentage of the observed ESF events with the predictions of the ANN, LSTM, and IRI-2016 models, illustrating the models' predictive ability on unseen data (2019). The vertical axis shows the probability percentage of ESF events, and the horizontal axis shows local time from 17:00 to 06:30 LT. The year 2019 lies on the descending phase of solar cycle 24, near solar minimum.
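The text does not spell out how the monthly probability percentage is computed. A natural reading, assumed in the sketch below, is: for one month, the percentage of nights showing ESF in each 30-min local-time slot, ignoring nights without an ionogram in that slot. The model curves are assumed to be built the same way from thresholded predictions, and the RMSE of Eq. (20) is then taken over the local-time slots of the month, as in Fig. 12. The function names and the toy data are hypothetical.

```python
import numpy as np

def probability_profile(labels_by_night):
    """Percentage of nights with ESF in each local-time slot (NaN = no ionogram)."""
    labels = np.asarray(labels_by_night, dtype=float)
    return 100.0 * np.nanmean(labels, axis=0)

def profile_rmse(model_profile, observed_profile):
    """Eq. (20) over the local-time slots, skipping slots without observations."""
    diff = np.asarray(model_profile) - np.asarray(observed_profile)
    return float(np.sqrt(np.nanmean(diff ** 2)))

nan = np.nan
observed = [[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, nan, 0]]    # 3 nights x 4 slots (toy data)
predicted = [[0, 1, 1, 1], [0, 0, 1, 0], [0, 1, 1, 0]]     # thresholded model labels
obs_profile = probability_profile(observed)
mod_profile = probability_profile(predicted)
err = profile_rmse(mod_profile, obs_profile)
```

In this toy month the observed profile is about 33%, 100%, 50%, and 0% across the four slots; the missing ionogram in the third slot reduces the denominator there to two nights instead of counting as "no ESF".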

Fig. 11

Comparison of the observed values against the predicted values of the ANN, LSTM, and IRI-2016 models

Compared with the observations, the ANN model tends to overestimate the ESF probability percentage in March, April, and October, and to underestimate it in February, August, and September (Fig. 11). The ANN overestimations occur between 20:00 and 03:00 LT in March, April, September, and October, whereas its underestimations occur from 18:00 to 19:30 LT in those months. In February and August, the ANN model underestimates mainly from 18:00 to 06:00 LT, more than in September and October. The RMSE of the ANN model ranges from 10% to 40% (Fig. 12). The LSTM model shows similar tendencies (Fig. 11): it overestimates the ESF probability percentage in April and October and clearly underestimates it in February, August, and September, with RMSE between 10% and 21% (Fig. 12). The IRI-2016 model clearly overestimates the ESF probability percentage in all months (Fig. 11). Its largest overestimations occur from 18:30 to 06:30 LT in February, March, April, September, and October, but not in August, and its RMSE ranges from 19% to 37% in these months. The LSTM model is therefore more suitable than the ANN and IRI-2016 models for forecasting the ESF probability percentage at CPN station.
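The RMSE in Fig. 12 summarizes the size of the discrepancy between observed and predicted curves, but not its sign. The sketch below pairs the RMSE with a mean bias (predicted minus observed), which is positive for overestimation and negative for underestimation; the four-slot curves are hypothetical, and the bias measure is an illustrative addition rather than a metric reported in this study.

```python
import math


def _valid_pairs(observed, predicted):
    # Skip LT slots where either curve is missing (None).
    return [(o, p) for o, p in zip(observed, predicted)
            if o is not None and p is not None]


def rmse(observed, predicted):
    """Root mean squared error between two probability curves (in %)."""
    pairs = _valid_pairs(observed, predicted)
    return math.sqrt(sum((p - o) ** 2 for o, p in pairs) / len(pairs))


def mean_bias(observed, predicted):
    """Mean of (predicted - observed): > 0 overestimation, < 0 underestimation."""
    pairs = _valid_pairs(observed, predicted)
    return sum(p - o for o, p in pairs) / len(pairs)


# Hypothetical observed and predicted ESF probability (%) for four LT slots.
obs = [10.0, 40.0, 60.0, 30.0]
pred = [20.0, 30.0, 60.0, 50.0]
print(rmse(obs, pred))       # sqrt(150), about 12.25
print(mean_bias(obs, pred))  # 5.0, i.e. net overestimation
```

A curve can have a moderate RMSE with almost no net bias when over- and underestimations at different local times cancel, which is why the text describes the sign of the errors separately by LT.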

Fig. 12

Monthly RMSE of the ANN, LSTM, and IRI-2016 model predictions

In addition, this study shows that the IRI-2016 model overestimates ESF occurrence in February, March, April, September, and October of 2019 at CPN station. This is consistent with previous studies covering 2004 to 2014, such as Klinngam et al. (2015) at CPN, Chiangmai (CMU), and Kototabang (KTB) stations; Afolayan et al. (2019) at CPN, Kwajalein (KWJ), and Jicamarca (JIC) stations; Thammavongsy et al. (2020) at CPN station; and Thammavongsy et al. (2022) at CPN and Tirunelveli (TIR) stations. One likely source of the IRI-2016 errors is that its B-spline representation of ESF occurrence does not capture the uniquely localized ESF characteristics at CPN.

Figure 11 also shows that the observed ESF occurrence rates are higher in the March equinox than in the September equinox. High occurrence rates appear during post-sunset hours in the March equinox but during post-midnight hours in the September equinox. This indicates that, during low solar activity, high occurrence rates of post-midnight irregularities can be observed in equinoctial months as well as in solstice months, Otsuka (2018). The peak occurrence rate is about 60% in the March equinox and 40% in the September equinox.

Furthermore, Table 5 compares the RMSE of the LSTM model trained with and without the AGW parameter. Including AGW improves the LSTM model only during post-midnight hours, in March, April, August, September, and October, which agrees with the positive post-midnight AGW relations reported by Otsuka (2018). In particular, the LSTM model improves in September in all cases, implying that AGW plays a significant role in September. In contrast, the AGW index provides no significant information for improving the LSTM model during post-sunset hours.
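The comparison in Table 5 amounts to computing the RMSE separately for post-sunset and post-midnight slots for the two model variants. In the hypothetical sketch below, midnight (00:00 LT) is assumed to be the boundary between the two periods, and the curves are invented so that adding AGW only helps after midnight, mirroring the qualitative pattern described above.

```python
import math


def rmse(observed, predicted):
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))


def period_rmse(lt_labels, observed, predicted, midnight="00:00"):
    """RMSE for post-sunset (before midnight) and post-midnight LT slots."""
    split = lt_labels.index(midnight)  # assumed boundary between the two periods
    return {
        "post-sunset": rmse(observed[:split], predicted[:split]),
        "post-midnight": rmse(observed[split:], predicted[split:]),
    }


# Hypothetical hourly curves (%) for one month.
labels = ["22:00", "23:00", "00:00", "01:00"]
obs = [50, 40, 30, 20]
with_agw = [45, 35, 32, 22]
without_agw = [45, 35, 40, 30]
print(period_rmse(labels, obs, with_agw))     # post-sunset 5.0, post-midnight 2.0
print(period_rmse(labels, obs, without_agw))  # post-sunset 5.0, post-midnight 10.0
```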

Table 5 RMSE of the LSTM model trained with and without the use of the AGW parameter

Prediction of the daily probability percentage of the ESF events

Figure 13 shows the residual errors between the observed and predicted daily ESF percentages. The daily ESF percentage is computed by counting the ESF presences from 17:00 to 06:30 LT and dividing by the total number of ESF presences and absences in that interval. The vertical axis represents the residual errors between the observations and the ESF models. The horizontal axis is the day number over the 110 available days of the March and September equinoctial seasons: February (1–20), March (21–46), April (47–64), August (65–81), September (82–99), and October (100–110). The residual errors of the ANN and LSTM models differ only slightly from day to day. Both models give errors above 20% on days from March to August (days 35–77). In October (days 100–110), the ANN errors are higher than the LSTM errors. Over all days, the RMSE is 21.38% for the LSTM model and 23.19% for the ANN model. The better performance of the LSTM model possibly derives from the new local input features and the advantages of the LSTM neuron design. However, daily prediction of ESF events remains difficult because of the complex relationship between ESF events and the input features and because of the imbalanced data. Nevertheless, this result points to the important role of local input features and the LSTM model.
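The daily percentage, the residuals, and the total RMSE described above can be sketched as follows. The sketch assumes that each night is a list of binary ESF flags (1 = present) for the 17:00 to 06:30 LT slots, with None for a missing ionogram, and that residuals are observation minus prediction; the toy nights and function names are hypothetical.

```python
import math


def daily_esf_percentage(night):
    """Share (%) of available 17:00-06:30 LT slots with ESF present (None = missing)."""
    values = [v for v in night if v is not None]
    return 100.0 * sum(values) / len(values)


def residuals(observed, predicted):
    """Observation minus prediction, one value per day."""
    return [o - p for o, p in zip(observed, predicted)]


def total_rmse(res):
    return math.sqrt(sum(r * r for r in res) / len(res))


# Hypothetical nights, truncated to four slots each.
obs_nights = [[0, 0, 1, 1], [0, 0, 0, 0], [1, 1, None, 0]]
pred_nights = [[0, 1, 1, 1], [0, 0, 0, 0], [0, 1, 0, 0]]
obs_daily = [daily_esf_percentage(n) for n in obs_nights]    # [50.0, 0.0, ~66.7]
pred_daily = [daily_esf_percentage(n) for n in pred_nights]  # [75.0, 0.0, 25.0]
res = residuals(obs_daily, pred_daily)                       # [-25.0, 0.0, ~41.7]
print(total_rmse(res))
```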

Fig. 13

Comparison of the observed ESF daily percentage against the ANN and LSTM models

We also analyze the performance of the ANN and LSTM models as daily ESF classifiers. A day whose daily ESF percentage is greater than zero is defined as an ESF day (ESF-1); otherwise, it is a non-ESF day (ESF-0). The predictive performance of the models is then summarized in the confusion matrices in Fig. 14. The overall accuracy is about 57% (64 correctly classified days) for the ANN model and 61% (67) for the LSTM model. The correct prediction rate for ESF days (47 observed) is about 53% for the ANN model and 68% for the LSTM model, and the rate for non-ESF days (63 observed) is 60% and 56%, respectively. Both models therefore correctly classify more than 50% of the days in each class.
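The ESF-day labelling rule and the per-class correct-prediction rates can be expressed in a few lines. The sketch below assumes that the model's slot-level predictions have already been converted to a daily percentage in the same way as the observations; the function names and the five days of data are hypothetical.

```python
def esf_day_label(daily_percentage):
    """ESF-1 if any ESF occurred during the night, otherwise ESF-0."""
    return 1 if daily_percentage > 0 else 0


def daily_confusion(obs_pct, pred_pct):
    """Counts keyed by (observed label, predicted label)."""
    counts = {(t, p): 0 for t in (0, 1) for p in (0, 1)}
    for o, p in zip(obs_pct, pred_pct):
        counts[(esf_day_label(o), esf_day_label(p))] += 1
    return counts


def rate(correct, wrong):
    return correct / (correct + wrong) if correct + wrong else None


def summarize(counts):
    accuracy = (counts[(1, 1)] + counts[(0, 0)]) / sum(counts.values())
    esf1_rate = rate(counts[(1, 1)], counts[(1, 0)])  # correct share of observed ESF days
    esf0_rate = rate(counts[(0, 0)], counts[(0, 1)])  # correct share of observed non-ESF days
    return accuracy, esf1_rate, esf0_rate


# Hypothetical daily percentages for five days.
obs_pct = [50.0, 0.0, 12.5, 0.0, 0.0]
pred_pct = [25.0, 0.0, 0.0, 10.0, 0.0]
print(summarize(daily_confusion(obs_pct, pred_pct)))  # accuracy 0.6, ESF-1 0.5, ESF-0 ~0.667
```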

Fig. 14

Confusion matrix of the (a) ANN and (b) LSTM models for daily ESF prediction. The top and right panels represent the model predictions and the observations, respectively, for the two classes. In the inner panels, the two green boxes on the left diagonal give the numbers of correct predictions for the ESF-0 and ESF-1 classes, and the two green boxes on the right diagonal give the numbers of false predictions for both classes. The bottom green box is the total number of correct predictions for both classes. The two bottom orange boxes give the total numbers of false and correct predictions of the model, and the gold boxes on the right give the corresponding totals for the observations

Prediction of short-term ESF events 30 min ahead

Both the ANN and LSTM models are designed mainly for one-step-ahead prediction. As shown in Fig. 15, the ANN and LSTM models achieve accuracies of 83.3% (2566) and 85.4% (2672), respectively. Both models predict ESF-0 better than ESF-1 because the two ESF classes are imbalanced (Table 2). Data-balancing techniques are not applied, since they could distort the cyclic diurnal and seasonal components of this time series. The correct prediction rate for ESF-0 is about 90.4% (2682) for the ANN model and 89.5% (2825) for the LSTM model, while for ESF-1 the two models achieve 35.5% (397) and 39.7% (252), respectively. The LSTM model thus still outperforms the ANN model for short-term ESF prediction. Nevertheless, predicting ESF-1 remains difficult. Possible causes include the following: the relationships between ESF and the input characteristics are unclear for short-term variability, Li et al. (2021); the limited available data may omit significant information; and the unequal class proportions can weaken the model's ability to recognize ESF-1 and bias the results. Even so, this study shows the potential of the LSTM model for ESF forecasting, while also making clear that developing ESF forecasting models remains challenging.
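One-step-ahead training samples are typically built by sliding a look-back window over the time series in its original order. The sketch below pairs each window of input vectors with the ESF flag one step after the window ends; the look-back of two steps (one hour at an assumed 30-min cadence) and the function name `make_windows` are illustrative assumptions, not the authors' code. Because the samples are neither resampled nor reordered, the diurnal and seasonal cycles in the inputs are left intact, in line with the choice not to rebalance the classes.

```python
def make_windows(features, labels, lookback=2, horizon=1):
    """Pair each look-back window of inputs with the ESF flag `horizon` steps later.

    features: per-slot input vectors for one night, in time order
    labels:   per-slot ESF flags (1 = present, 0 = absent), same length
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    X, y = [], []
    # The window covers indices [end - lookback, end); the target is `horizon`
    # steps after the last input, i.e. at index end + horizon - 1.
    for end in range(lookback, len(features) - horizon + 1):
        X.append(features[end - lookback:end])
        y.append(labels[end + horizon - 1])
    return X, y


# Hypothetical night of five 30-min slots with one input feature each.
features = [[0.0], [1.0], [2.0], [3.0], [4.0]]
labels = [0, 0, 1, 1, 0]
X, y = make_windows(features, labels)
# X == [[[0.0], [1.0]], [[1.0], [2.0]], [[2.0], [3.0]]], y == [1, 1, 0]
```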

Fig. 15

Confusion matrix of the (a) ANN and (b) LSTM models for 30-min-ahead prediction. Each inner panel is described as in Fig. 14

Abadi et al. (2022) achieved ~80% accuracy in predicting post-sunset ESF occurrence over stations in Southeast Asia, and Anderson and Redmon (2017) also reported ~80% accuracy in predicting post-sunset ESF events. In this study, the ANN and LSTM models achieve 83.3% (2566) and 85.4% (2627) for both post-sunset and post-midnight ESF prediction. This suggests that local information is important and necessary for developing ESF forecasting models. We also suggest using models with a look-back capability for ESF forecasting and designing the coefficient parameters separately for each season.

Conclusions

In this work, we develop ESF forecasting models using ANN and LSTM networks. The new local parameters, the F-layer vertical drift velocity and the power spectrum of atmospheric gravity waves, are shown to improve the ESF forecasting model. The AGW index is found, for the first time, to improve the LSTM model during the post-midnight period rather than the post-sunset period. The proposed LSTM model performs favorably for ESF forecasting, achieving 85.4% accuracy compared with 83.3% for the ANN model. Daily ESF prediction is also studied for the first time in this work, reaching about 55% accuracy for both the ANN and LSTM models. The proposed LSTM model reduces overestimation more effectively than the ANN model. For the monthly probability predictions, the proposed LSTM model yields an RMSE below 20%, whereas the IRI-2016 model overestimates the ESF probability with an RMSE above 20% in all months, higher than that of the proposed LSTM model. Furthermore, the three complementary performance analyses (monthly, daily, and 30-min ahead) show that day-to-day prediction of ESF events remains difficult. The low F1 score of around 0.3 indicates that the LSTM model needs further improvement to predict ESF more accurately. One possible solution is new input features that capture the characteristics of ESF presence based on physical mechanisms. The limited amount of available data is another issue in this study. Therefore, we expect that future development of ESF forecasting models should explore attention-based models and new local input parameters to enhance the input information and model learnability.

Availability of data and materials

The ionogram data are obtained from NICT. Daily solar activity indices (F10.7 and SSN) are obtained from the Space Physics Data Facility (SPDF) OMNIWeb database at https://omniweb.gsfc.nasa.gov/form/dx1.html. The 3-hourly and daily magnetic activity indices (ap3 and Ap; kp3 and Kp) are downloaded from the World Data Center for Geomagnetism, Kyoto University, at https://wdc.kugi.kyoto-u.ac.jp/index.html. The historical spread-F and total electron content (TEC) databases are also available from the Thai GNSS and Space Weather Information Center website http://iono-gnss.kmitl.ac.th/.

Abbreviations

AI: Artificial intelligence
AGW: Atmospheric gravity wave
ANN: Artificial neural network
Ap: Geomagnetic activity index
CPN: Chumphon
Dn: Day number
ESF: Equatorial spread-F
EPB: Equatorial plasma bubble
foF2: Critical frequency of F2-layer
FMCW: Frequency Modulated Continuous Wave
FN: False negative
F10.7: Solar radio flux at 10.7 cm wavelength
GD: Gradient descent
HF: High frequency
hʹF: Virtual height of F layer
Hn: Hour number
RNN: Recurrent Neural Network
SSN: Sunspot number
SPDF: Space Physics Data Facility
Kp: Disturbance indicator of the Earth’s magnetic field
LSTM: Long short-term memory
IRI: International Reference Ionosphere
MSE: Mean squared error
RMSE: Root mean squared error
TP: True positive
FP: False positive
TN: True negative
LSWS: Large-scale wave structure
PRE: Pre-reversal enhancement
Vd: Vertical drift velocity

References

Acknowledgements

This work is supported by King Mongkut’s Institute of Technology Ladkrabang under the Grant KDS2019/016 and funded from the NSRF via the Program Management Unit for the Human Resources and Institutional Development, Research and Innovation (Grant no. B05F640197). The ASEAN IVO (http://www.nict.go.jp/en/asean_ivo/index.html) project, Precise positioning and Artificial Intelligence (AI) for Ionospheric Disturbances in Low-Latitude Region in ASEAN, was involved in the production of the contents of this publication and financially supported by NICT (http://www.nict.go.jp/en/index.html).

Funding

This work received funding from King Mongkut’s Institute of Technology Ladkrabang under the Grant KDS2019/016, the NSRF via the Program Management Unit for the Human Resources and Institutional Development, Research and Innovation (Grant no. B05F640197) and the ASEAN IVO (http://www.nict.go.jp/en/asean_ivo/index.html) project.

Author information

Contributions

PT implemented experiments, analyzed and interpreted results, and wrote the first draft of the paper. PS supported consultation, methodological implementations, comments, and corrections. LMMM contributed comments, corrections, and modifications. KH provided data sharing and corrections. DL participated in reviews and corrections. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Pornchai Supnithi.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Thammavongsy, P., Supnithi, P., Myint, L.M.M. et al. Equatorial spread-F forecasting model with local factors using the long short-term memory network. Earth Planets Space 75, 118 (2023). https://doi.org/10.1186/s40623-023-01868-7

  • DOI: https://doi.org/10.1186/s40623-023-01868-7

Keywords