Equatorial spread-F forecasting model with local factors using the long short-term memory network
Earth, Planets and Space volume 75, Article number: 118 (2023)
Abstract
The predictability of nighttime equatorial spread-F (ESF) occurrences is essential to ionospheric disturbance warning systems. In this work, we propose ESF forecasting models using two deep learning techniques: the artificial neural network (ANN) and the long short-term memory (LSTM) network. The ANN and LSTM models are trained with ionogram data from the equinoctial months of 2008 to 2018 at Chumphon station (CPN), Thailand, near the magnetic equator, where the ESF onset typically occurs, and they are tested with ionogram data from 2019. In particular, the models are trained with new local input parameters, namely the vertical drift velocity of the F-layer height (Vd) and atmospheric gravity waves (AGW) collected at CPN station, together with global parameters of solar and geomagnetic activity. We analyze the ESF forecasting models in terms of monthly probability, daily probability and occurrence, and diurnal predictions. The proposed LSTM model achieves 85.4% accuracy when the local parameters Vd and AGW are used. The LSTM model outperforms the ANN, particularly in February, March, April, and October. The results show that the AGW parameter plays a significant role in improving the LSTM model during the post-midnight period. Compared with the IRI-2016 model, the proposed LSTM model shows lower discrepancies from the observational data.
Introduction
The equatorial spread-F (ESF) is a nighttime ionospheric irregularity near the magnetic equatorial region. ESF is observed on ionogram images from the Frequency Modulated Continuous Wave (FMCW) ionosonde (Abdu et al. 1981). ESF appears as a spreading of the ionogram trace along the height and frequency axes of the ionogram image, indicating irregularities in the F-layer bottomside. ESF is generated after sunset by plasma instabilities, which are explained through the Rayleigh–Taylor instability (Woodman and La Hoz 1976). ESF generation depends on precursor conditions such as the evening pre-reversal enhancement in the vertical plasma drift (PRE), the F-layer bottomside density gradient, seeding perturbations, and wave structures in the plasma density and the initiated polarization electric field (Abdu 2019). The basic ESF characteristics are thus understood and described through numerous parameters, and this fundamental knowledge can contribute to the effective development of an ESF forecasting model.
The generation and development of ESF phenomena are triggered by large-scale wave structures (LSWS) in F-layer heights together with the PRE vertical drift from the afternoon until the post-sunset hours (Abdu et al. 2015). In some cases, the ESF occurrence rate can approach 100% if the vertical plasma drift velocity is higher than 40 m/s (Abadi et al. 2020). The study of Tulasi et al. (2017) also reports that increased post-sunset (post-midnight) drift velocities of around 45–256 m/s (26–128 m/s) can cause ionospheric plasma irregularities. Additionally, atmospheric gravity waves (AGWs) play a significant role in the development of seed plasma perturbations through AGW-driven neutral wind perturbations. The study of Tsunoda (2010) also emphasizes that seeding perturbations are crucial in the development of ESF occurrences. The amplitude of the seed perturbations, together with F-layer height variations, plays a significant role in ESF occurrence or non-occurrence (Manju et al. 2016). The latitudinal expansion of ESF/equatorial plasma bubble (EPB) occurrences is attributed to changes in the F-layer bottomside height (Saito and Maruyama 2006; Rungraengwajiake et al. 2013). Also, the ESF characteristics at longitudinally close stations are not necessarily the same because of their local conditions (Thammavongsy et al. 2022). A high ESF occurrence rate is observed during high solar activity and near the magnetic equatorial region. High and low probabilities of ESF occurrence are found in equinoctial and solstice months, respectively (Klinngam et al. 2015). In contrast, high magnetic activity can suppress or delay (by 3–9 h) the ESF commencement (Li et al. 2009). Much evidence has thus been gathered across the range of possible local and global conditions. However, the local conditions are uniquely crucial for extending the understanding and predictability of the ionospheric irregularity.
The climatological characteristics of the ESF occurrence are well understood in terms of the controlling factors and physical mechanisms behind its longitudinal variations, seasonal variations, and solar activity dependence. However, the day-to-day and short-term variabilities in the ESF occurrence are still difficult to predict accurately with the long-term controlling factors (Li et al. 2021). Several efforts to develop ESF forecasting models have been made in space weather studies. Forecasting models of the long-term variability of the ESF occurrence have been designed over large longitudinal areas. For example, the monthly probability of the ESF occurrence has been successfully modeled using the cubic B-spline method (Abdu et al. 2003); ESF forecasting models have been developed using neural networks over Brazil and Thailand (McKinnell et al. 2010; Thammavongsy et al. 2020); thresholds determined from hʹF and S4 scintillation have been used to forecast ESF events in the Peruvian and Indian sectors (Anderson and Redmon 2017; Aswathy and Manju 2018); and a post-sunset ESF prediction model has been built using logistic regression in Southeast Asia (Abadi et al. 2022). These studies demonstrate the development of methods for ESF forecasting and discuss the important role of parameters such as diurnal and seasonal variations, solar indices, and magnetic indices. In contrast, the use of local parameters with machine learning has not been considered, and this might be an important key to improving the ESF forecasting model.
Recently, artificial intelligence (AI) has been widely applied in space weather forecasting models. In particular, deep learning networks are used to solve complex problems. One of the most powerful deep neural networks for time series data is the long short-term memory (LSTM) network (Hochreiter and Schmidhuber 1997; Liu et al. 2020; Tan et al. 2018). In space weather studies, the LSTM model has been successfully applied to global and mid-latitude TEC forecasting, foF2 and hmF2 forecasting for both quiet and disturbed geomagnetic conditions, geomagnetic Kp index forecasting, and SYM-H and ASY-H forecasting (Liu et al. 2020; Ulukavak 2020; Kim et al. 2020; Tan et al. 2018; Collado-Villaverde et al. 2021). Therefore, the multi-timestep (loopback) learning and advanced functionality of the LSTM model are expected to improve the ESF forecasting model. The relationships between global and local conditions and the ESF generation and development are well investigated in the literature. To achieve better accuracy in long-term and short-term ESF prediction, new characteristic inputs based on this prior knowledge still need to be investigated.
In this work, we develop ESF forecasting models using deep learning techniques, namely the artificial neural network (ANN) and long short-term memory (LSTM), for Chumphon (CPN) station, Thailand. New local input parameters are considered, including the virtual height of the F-layer (hʹF), the F-layer drift velocity derived from hʹF (Vd), and the atmospheric gravity waves (AGW). The performance of the ANN and LSTM models is compared in this study. In addition, the IRI-2016 model is validated against the observations. The predictive outputs are evaluated along three dimensions: monthly probability, daily probability and occurrence, and diurnal predictions.
Data and methods
Description of input parameters
The input parameters of the ESF forecasting model in this study include the daily solar activity indices (F10.7 and SSN) downloaded from the Space Physics Data Facility (SPDF) OMNIWeb database at https://omniweb.gsfc.nasa.gov/form/dx1.html, the 3-hourly and daily averaged magnetic activity indices (ap3 and Ap; kp3 and Kp) from the World Data Center for Geomagnetism, Kyoto University at https://wdc.kugi.kyoto-u.ac.jp/index.html, and the local hʹF parameter, F-layer drift velocity (Vd), and atmospheric gravity waves (AGW). The last three input parameters are obtained by manually scaling the ionograms, differentiating hʹF with respect to time, and applying a wavelet transform analysis, respectively. In addition, the diurnal and seasonal variations are represented by the hour number (Hn) and day number (Dn), which are converted using sine and cosine functions to ensure continuity in the hour and day numbers as follows:
where 24 is the total number of hours in a day and 365.25 is the average number of days in a year, used because the data set includes leap years (Watthanasangmechai et al. 2012).
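As an illustration, the cyclical encoding described above can be sketched in Python. This is a minimal sketch assuming the usual form sin(2πx/P) and cos(2πx/P), with P = 24 for the hour number and P = 365.25 for the day number; the function names are hypothetical and are not taken from the original implementation.

```python
import math

HOURS_PER_DAY = 24.0
DAYS_PER_YEAR = 365.25  # average year length, accounting for leap years


def encode_cyclical(value, period):
    """Map a periodic quantity onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encode_time(hour_number, day_number):
    """Return (Ts, Tc, Ds, Dc) for an hour number Hn and a day number Dn."""
    ts, tc = encode_cyclical(hour_number, HOURS_PER_DAY)
    ds, dc = encode_cyclical(day_number, DAYS_PER_YEAR)
    return ts, tc, ds, dc
```

With this encoding, 23:30 LT and 00:00 LT map to neighbouring points on the circle, so the model does not see an artificial jump at midnight or at the turn of the year.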
ANN and LSTM algorithms
The ANN has been successfully deployed on time series data (Zhang 2012). One powerful type of ANN is the LSTM network, which can capture both short- and long-term dependencies in time series data. The LSTM model is mainly designed to mitigate the vanishing gradient problem of the recurrent neural network (RNN) and to extend the memory capacity of the model (Hochreiter and Schmidhuber 1997; Alex Graves 2012). This improves the ability of the LSTM model to learn both short- and long-term patterns in time series data. The most significant components of the LSTM structure are the cell state, input gate, forget gate, aggregated gate, and output gate, as expressed in Eqs. (9)–(13). In this study, the standard LSTM model with many inputs and a single output is used. The ANN and LSTM models are briefly expressed as follows:
The final output \({\widehat{y}}_{t}\) of the ANN at time step \(t\) is obtained by
where \(\sigma\) can be any activation function such as the hyperbolic tangent, rectified linear unit (ReLU), or softmax, \(l\) represents the layer number, and \({\mathbf{W}}_{\mathrm{hy}}\) and \({\mathbf{b}}_{y}\) are the weights and biases of the connections between the hidden and output layers. The current output signal depends on the output of the previous hidden layer as follows:
where \({\mathbf{W}}_{xh}\) contains the weights of the connections between the input and hidden layers, \(\mathbf{x}\in {R}^{1\times d}\) is the input vector to the network, and \(d\) is the number of input features.
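Under these definitions, a standard feed-forward formulation consistent with the symbols above can be written as follows. This is a reconstruction for illustration only: the layer superscripts, the per-layer weights \(\mathbf{W}^{(l)}\), the hidden-layer biases \(\mathbf{b}_{h}^{(l)}\), and the index \(L\) of the last hidden layer are assumed notation, and the expressions hold up to the row or column convention chosen for \(\mathbf{x}\).

```latex
\hat{y}_{t} = \sigma\!\left(\mathbf{W}_{hy}\,\mathbf{h}_{t}^{(L)} + \mathbf{b}_{y}\right),
\qquad
\mathbf{h}_{t}^{(l)} = \sigma\!\left(\mathbf{W}^{(l)}\,\mathbf{h}_{t}^{(l-1)} + \mathbf{b}_{h}^{(l)}\right),
\qquad
\mathbf{h}_{t}^{(0)} = \mathbf{x}_{t},
\quad
\mathbf{W}^{(1)} = \mathbf{W}_{xh}
```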
For the LSTM model, the final output \({\widehat{y}}_{t}\) is computed from the hidden state as follows:
where \({\mathbf{h}}_{t}\) and \({\mathbf{c}}_{t}\) are the hidden and cell states, respectively, and \({\mathbf{o}}_{t}\), \({\mathbf{f}}_{t}\), \({\mathbf{i}}_{t}\), and \({\mathbf{g}}_{t}\) represent the output, forget, input, and aggregated gates, respectively. The final output is compared with the desired output (target label) to measure the error (loss). In this work, the mean squared error (MSE) is used:
where \(N\) is the total number of outputs. To bring the predicted value close to the desired value, the error function must be minimized. The gradient descent (GD) method is used for this purpose. For simplicity, let all weights and biases of the above models be denoted by \({\varvec{\uptheta}}=\left\{\mathbf{W},\mathbf{b}\right\}\); the new weights and biases \(\left({{\varvec{\uptheta}}}^{*}\right)\) are then obtained by the following delta rule:
where \(\eta\) is the learning rate and \(\frac{\partial E}{\partial {\varvec{\uptheta}}}\) is the partial derivative of \(E\) with respect to \({\varvec{\uptheta}}\). For the ANN, the gradients are computed on a single input-output pair. In the LSTM model, by contrast, the gradients must be calculated through time over the learning timesteps (loopbacks) of the network. Finally, the gradients are propagated backward through the network to update the weights and biases. This process is repeated for the given number of epochs or until the error goal is reached.
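The gate structure described above can be illustrated with a single LSTM time step. The sketch below follows the standard LSTM formulation (Hochreiter and Schmidhuber 1997) in plain Python. The parameter layout (`params` maps each gate name to its input weights, recurrent weights, and biases) and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math


def sigmoid(z):
    """Logistic activation used by the input, forget, and output gates."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h, b):
    """Return W·x + U·h + b for list-based matrices W, U and vectors x, h, b."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def gate(name, x, h_prev, params, activation):
    """Evaluate one gate: activation(W·x + U·h_prev + b)."""
    W, U, b = params[name]
    return [activation(z) for z in affine(W, x, U, h_prev, b)]


def lstm_step(x, h_prev, c_prev, params):
    """Advance the hidden state h and the cell state c by one time step."""
    i = gate("i", x, h_prev, params, sigmoid)    # input gate
    f = gate("f", x, h_prev, params, sigmoid)    # forget gate
    g = gate("g", x, h_prev, params, math.tanh)  # aggregated (candidate) gate
    o = gate("o", x, h_prev, params, sigmoid)    # output gate
    # Cell state: keep part of the old memory and add gated new information.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    # Hidden state: output gate applied to the squashed cell state.
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c
```

Applying `lstm_step` repeatedly over the loopback window and passing the last hidden state through an output layer yields the single prediction for the next time step.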
Model performance analysis
The ESF forecasting model addresses a binary classification problem with ESF labels 0 and 1, so the model performance is evaluated using the confusion matrix. The accuracy alone can be biased when the model is presented with imbalanced data. Hence, besides the accuracy, other confusion matrix factors are considered, including the recall (sensitivity), precision (positive predictive value), and F1 score (Fawcett 2006; Sokolova and Lapalme 2009). These performance metrics are defined as:
where \(\mathrm{TP}\), \(\mathrm{TN}\), \(\mathrm{FP}\), and \(\mathrm{FN}\) are the numbers of true positives (true ones), true negatives (true zeros), false positives (false ones), and false negatives (false zeros), respectively. These counts are obtained by comparing the model’s predicted values with the actual observed values.
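For reference, the four metrics can be computed from the confusion matrix counts as in the sketch below. It uses the standard definitions of accuracy, recall, precision, and F1 score; the function names and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = ESF present, 0 = absent)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, recall, precision, and F1 score as a dictionary."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}
```

Because ESF absence dominates the data, a model can reach high accuracy while recall and precision stay low, which is why all four values are reported together.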
Furthermore, the root mean squared error (RMSE) is used to evaluate the difference between the model predictions and the actual observations of the ESF probability, i.e.,
where \({\mathrm{ESF}\_\mathrm{mod}}_{i}\) and \({\mathrm{ESF}\_\mathrm{obs}}_{i}\) represent the predicted and observed ESF values and \(N\) is the total number of the sequence data.
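A minimal sketch of this error measure, assuming the standard root mean squared error over paired model and observed values, is given below. The function name is hypothetical.

```python
import math


def rmse(esf_mod, esf_obs):
    """Root mean squared error between predicted and observed ESF values."""
    if len(esf_mod) != len(esf_obs) or not esf_mod:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - o) ** 2 for m, o in zip(esf_mod, esf_obs)]
    return math.sqrt(sum(squared) / len(squared))
```

When the inputs are probability percentages, the result is in the same percentage units.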
Data preprocessing
The ESF data are obtained manually every 15 min using the ionogram scaler software. The ESF labels 0 and 1 indicate the absence and presence of ESF events, respectively. In the data selection, we consider only ESF events that last at least 1 h. This means that one detected ESF event consists of two consecutive 30-min intervals with observed ESF; that is, two ESF events are counted within an entire hour. For the input parameters, Vd is retrieved by differentiating hʹF with respect to time (Abadi et al. 2022). The AGW parameter is derived by applying the wavelet transform (Morlet wavelet) to the foF2 signal over wavelet periodicities of 30–90 min (Manju et al. 2016; Torrence and Compo 1998). Note that missing foF2 values are filled by linear interpolation with non-monotonically increasing sample points. The averaged power spectrum of the AGW is used in this study. Figure 1 shows the calculation procedure for obtaining the AGW coefficients. The foF2 signal \({{\varvec{x}}}_{{\varvec{n}}}\) is first passed through the discrete Fourier transform (DFT), producing \({\widehat{{\varvec{x}}}}_{k}\). The power spectrum of the wavelet transform at each time (17:00 to 22:30 LT) and wavelet scale is stored in \({W}_{n}\left(s\right)\). The averaged power spectrum of the wavelet transform is then obtained by summing the wavelet power spectrum \({W}_{n}\left(s\right)\) over all 30- to 90-min periodicities at each time.
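The two derived local parameters can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure of Fig. 1: the Morlet transform is evaluated by direct time-domain convolution instead of the DFT route, the nondimensional frequency ω0 = 6, the 15-min sampling, the set of periods, and the removal of the series mean are assumed choices, and all function names are hypothetical. The relation between Morlet scale and Fourier period follows Torrence and Compo (1998).

```python
import cmath
import math

OMEGA0 = 6.0  # Morlet nondimensional frequency; a common default, assumed here


def drift_velocity(hf_km, dt_min):
    """Finite-difference d(h'F)/dt in m/s (central inside, one-sided at ends)."""
    n = len(hf_km)
    if n < 2:
        raise ValueError("need at least two h'F samples")
    dt_s = dt_min * 60.0
    vd = []
    for k in range(n):
        lo, hi = max(k - 1, 0), min(k + 1, n - 1)
        vd.append((hf_km[hi] - hf_km[lo]) * 1000.0 / ((hi - lo) * dt_s))
    return vd


def morlet_power(x, dt_min, period_min):
    """Wavelet power |W_n(s)|^2 at every time index for one Fourier period."""
    # Morlet scale s that corresponds to the requested Fourier period.
    s = period_min * (OMEGA0 + math.sqrt(2.0 + OMEGA0 ** 2)) / (4.0 * math.pi)
    norm = math.sqrt(dt_min / s) * math.pi ** -0.25
    mean = sum(x) / len(x)  # assumed preprocessing: remove the series mean
    power = []
    for n in range(len(x)):
        w = 0j
        for m, xm in enumerate(x):
            eta = (m - n) * dt_min / s
            # Complex-conjugated Morlet wavelet centred on sample n.
            psi_conj = (norm * cmath.exp(-1j * OMEGA0 * eta)
                        * math.exp(-0.5 * eta * eta))
            w += (xm - mean) * psi_conj
        power.append(abs(w) ** 2)
    return power


def agw_band_power(fof2, dt_min=15.0, periods_min=(30, 45, 60, 75, 90)):
    """Sum the wavelet power over the 30-90 min band at each time step."""
    per_period = [morlet_power(fof2, dt_min, p) for p in periods_min]
    return [sum(column) for column in zip(*per_period)]
```

A foF2 series that oscillates with a period inside the 30–90 min band produces a much larger band power than one that varies on a multi-hour time scale, which is the property the AGW input relies on.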
The diurnal and seasonal parameters are represented by the hour and day numbers passed through sine and cosine functions to obtain cyclical diurnal and seasonal variations (McKinnell and Poole 2000; Watthanasangmechai et al. 2012). Before training the model, all input parameters are scaled using the standardization method shown in Eq. (21). The predicted output of the models is a floating-point number determined by the activation function of the output-layer neuron; it is then assigned to class 0 or class 1 using 0.5 as the threshold value:
where \({\mu }_{x}\) and \({\sigma }_{x}\) are the mean and standard deviation of the input parameter, respectively.
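The scaling and thresholding steps can be sketched as below, assuming the usual standardization z = (x − μ) / σ. Estimating μ and σ from the training data only, using the population standard deviation, and assigning a score of exactly 0.5 to class 1 are assumptions of this sketch; the function names are hypothetical.

```python
import math


def fit_standardizer(train_values):
    """Estimate the mean and (population) standard deviation of one feature."""
    mu = sum(train_values) / len(train_values)
    variance = sum((v - mu) ** 2 for v in train_values) / len(train_values)
    return mu, math.sqrt(variance)


def standardize(values, mu, sigma):
    """Apply z = (x - mu) / sigma to every value of one feature."""
    if sigma == 0.0:
        raise ValueError("constant feature cannot be standardized")
    return [(v - mu) / sigma for v in values]


def to_class(score, threshold=0.5):
    """Convert a floating-point model output into the ESF label 0 or 1."""
    return 1 if score >= threshold else 0
```

Reusing the training-set μ and σ for the 2019 test data keeps the test inputs on the same scale the network saw during training.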
Experimental design and input combinations
In this study, the ANN and LSTM models are developed using the 30-min interval data with output labels 0 and 1. The best network structure and input parameters are determined by varying the number of neurons and the input features. The number of neurons is varied from 10 to 50. The LSTM loopback length is varied from 30 to 90 min to find the optimal value. These loopbacks are chosen according to the possible time scales relating the influencing input parameters to the ESF generation. The F-layer height and its drift velocity play a significant role in post-sunset ESF events (Abadi et al. 2020, 2022; Anderson and Redmon 2017; Aswathy and Manju 2018). Also, seeding perturbations have been shown to appear before ESF generation in both the post-sunset and post-midnight periods (Manju et al. 2016; Otsuka 2018). The models are trained with the data from 2008 to 2018 and tested with the data from 2019.
The input parameters are selected from the direct and indirect influencing parameters investigated in previous studies. The correlations between the input parameters and the ESF rely mainly on information reported in those studies. The input combinations are designed to identify the significant input features and to study the new local parameters for improving the ESF model. The full set of input features comprises the hour number (\(\mathrm{Ts\, and\, Tc}\)), day number (\(\mathrm{Ds\, and\, Dc}\)), F10.7, SSN, ap3 and Ap, kp3 and Kp, hʹF, F-layer drift velocity (Vd), and atmospheric gravity waves (AGW). The base input set is first defined as input A and used to find the best network structure and loopback. The best network with the base input set is then used to find the best input combination as follows.
The optimal network structure is selected by considering the confusion matrix factors. The ANN and LSTM models predict 0.5 h (30 min) ahead.
The proposed LSTM model for ESF forecasting
The LSTM structure for the ESF forecasting model is shown in Fig. 2. The standard LSTM model is used in this study (Hochreiter and Schmidhuber 1997; Alex Graves 2012). The LSTM model learns over multiple timesteps (loopbacks) of the time series data and produces a single output at the next time step. The LSTM hidden layer uses the same neurons at every loopback step. Lastly, the output of the LSTM model is converted into 0 or 1 using a threshold of 0.5. The LSTM hyperparameters are set to 150 training epochs and a learning rate of 0.001. The error function is the mean squared error (MSE). The initial weights are drawn randomly from a normal distribution, and the biases are initialized to zero. The weights and biases are updated using the gradient descent (GD) method.
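The many-to-one arrangement described above can be illustrated by the way training samples are cut from the time series. In the sketch below, each sample holds `loopback_steps` consecutive feature rows and the target is the ESF label at the following step; with 30-min data, two steps correspond to a one-hour loopback. The function name and the list-based data layout are assumptions made for illustration.

```python
def make_windows(features, labels, loopback_steps=2):
    """Build (window, target) pairs for many-to-one sequence learning.

    features: list of feature rows, one row per 30-min interval
    labels:   list of ESF labels (0 or 1) aligned with the feature rows
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    windows, targets = [], []
    for end in range(loopback_steps, len(features)):
        # Rows end-loopback_steps .. end-1 predict the label at index end.
        windows.append(features[end - loopback_steps:end])
        targets.append(labels[end])
    return windows, targets
```

In practice the windows would be built separately for each night, so that no sample spans the daytime gap between 06:30 LT and 17:00 LT.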
Data preparation and selection
This study uses ionogram data from the Frequency Modulated Continuous Wave (FMCW) ionosonde at the CPN station. The data set covers the 24th solar cycle from 2008 to 2019. Only data from the equinoctial months are used, namely February, March, April, August, September, and October. The ESF data are collected manually every 15 min and resampled to 30-min intervals in this study. Some years, such as 2010, 2012, and 2017, have scant data because of missing observations and are excluded from this study. The data period considered is from 17:00 LT to 06:30 LT. As mentioned above, the input parameters include the diurnal variations, seasonal variations, F10.7 solar flux, sunspot number (SSN), 3-hourly averaged magnetic ap index (ap3) and daily averaged magnetic Ap index (Ap), 3-hourly averaged magnetic kp index (kp3) and daily averaged magnetic Kp index (Kp), local ionospheric F-layer height (hʹF), local vertical drift velocity (Vd), and averaged power spectrum of the atmospheric gravity waves (AGW). The AGW is derived from the wavelet transform of the foF2 signal over wavelet periodicities of 30–90 min (Manju et al. 2016).
The available ESF data at the CPN station, Thailand, cover 2008 to 2019, as depicted in Fig. 3. The number of available days in each month of the training set from 2008 to 2018 is summarized in Fig. 4. More data are available for March and April than for the other months. Table 1 shows the quantity of data with ESF absence and presence in the training and testing sets.
Results and discussions
Selection of the optimal network structure and input parameters for the ESF forecasting model
The optimal network structure and input parameters are determined using the 30-min interval data. We first investigate the optimal input parameters for both the ANN and LSTM models. The time and seasonal factors are always used in the models. The solar and magnetic indices, namely F10.7, SSN, ap3, Ap, kp3, and Kp, are considered in turn to find the optimal ones. These parameters are combined with the diurnal and seasonal parameters into the input data sets listed in Table 2. The ANN model includes 1 to 4 hidden layers, while the LSTM model includes only one hidden layer. Tables 3 and 4 show the confusion matrix factors of the models for each input parameter. As these tables show, both the SSN and ap3 indices clearly improve the models. Therefore, the base input parameters are selected as \(\mathrm{Ts}\), \(\mathrm{Tc}\), \(\mathrm{Ds}\), \(\mathrm{Dc}\), SSN, and ap3. They are then used to determine the other optimal settings, namely the number of neurons of the ANN and LSTM models and the learning loopback of the LSTM model.
Figure 5 shows the performance of the ANN model with different numbers of neurons in terms of four factors. We obtain the optimal number of neurons and hidden layers for the ANN model by considering various network structures. The ANN networks with three and four hidden layers tend to suffer from overfitting and underfitting during training. Thus, the ANN network with two hidden layers is selected because its training and validation are robust against both problems. Note that only the result of the ANN network with two hidden layers is shown in Fig. 5. The total accuracy differs only slightly across the numbers of neurons. However, 30 neurons stand out by yielding high recall and F1 score. Therefore, 30 neurons are selected for the ANN model in this work.
Similarly, we find the optimal number of cells (neurons) for the LSTM model by increasing the cell number from 10 to 50 in steps of 5 with the learning loopback fixed at one hour. As shown in Fig. 6, the total accuracy is above 77% and differs only slightly across the cell numbers. The LSTM model with 35 cells yields high performance in terms of recall and F1 score. The 35 cells are therefore selected and used to determine the optimal learning loopback for the LSTM model. The result is shown in Fig. 7. Lengthening the learning loopback degrades the LSTM model performance, which implies that the LSTM model learns mainly from prior information very close to the prediction time. A learning loopback of one hour is therefore chosen for the LSTM model in this work.
In summary, from Figs. 5, 6 and 7, we choose two hidden layers containing 30 neurons for the ANN model, and one hidden layer containing 35 cells with a one-hour loopback for the LSTM model.
In this section, we investigate the combinations of the new local input parameters labeled A to E in Table 2. Figure 8 shows the ANN model performance for each input combination. Importantly, for the ANN model, input D, which contains the local AGW index, yields 83% accuracy, higher than the other input combinations. Using input D with the AGW index improves the precision, total accuracy, and F1 score of the ANN model. In contrast, without the AGW index the ANN model attains high recall only with input C, which contains the Vd index. Therefore, the local Vd index reduces false predictions of ESF absence, while the AGW index reduces false predictions of ESF presence.
Similarly, Fig. 9 shows the LSTM model performance tested on each input combination. Input E clearly achieves 85% accuracy, higher than the other input combinations. High precision and accuracy are obtained when the local Vd and AGW indices are used together. The LSTM model trained without the AGW index produces high recall and F1 score when hʹF is used. On the other hand, the LSTM model yields high accuracy and precision when AGW is used. Consequently, the AGW index improves the prediction of ESF presence, namely by reducing false ESF presence predictions. Hence, input E, which contains both the local Vd and AGW parameters, significantly improves the LSTM model. This improvement is expected because of the non-directional relation between the AGW and the ESF events. Usually, propagating AGW amplitudes and high drift velocities are observed before the post-sunset ESF onset and the developed ESF events (Manju et al. 2016; Tsunoda 2010; Tulasi et al. 2017; Abadi et al. 2020). The post-midnight ESF generation is also reported to be indicated by the AGW in solstice months (Otsuka 2018). On the other hand, we expect that the restriction to learning from a single, independent time step and the complicated ESF characteristics negatively affect the ANN model, whereas the improvement of the LSTM model is clearly seen. This might be one advantage of the LSTM model in recognizing and characterizing complicated data features using the loopbacks. Importantly, the LSTM model gains higher accuracy from the AGW index than the ANN model does.
As shown in Fig. 9, inputs B and C improve the model more than input A in terms of recall, precision, and F1 score. However, input A by itself can still give a high accuracy value, at the cost of lower values for the other metrics. Furthermore, when we consider the local input parameters Vd and AGW, the result shows that input E also significantly improves the proposed ESF model.
Next, the optimal models are retrained and retested to evaluate their predictive performance. Figure 10 compares the ANN and LSTM models in terms of the four confusion matrix factors. Overall, the LSTM and ANN models achieve accuracies of 85% and 83%, respectively. The LSTM model is more robust against false positive predictions (false ESF presence), as shown by its precision score. On the other hand, the ANN model attains a high recall, indicating its robustness against false negative predictions (false ESF absence).
Therefore, the LSTM model with 35 cells, a one-hour loopback, and input E is proposed in this work. The LSTM model with input E achieves higher accuracy and precision than the ANN model with input D, as shown in Fig. 10. This indicates that the LSTM model benefits more from the local AGW parameter than the ANN model does. In addition, this work demonstrates that existing knowledge of ESF events can be used to design fundamental input features and new local parameters that improve the predictability of the ESF occurrence. Input E with the Vd and AGW indices improves the LSTM model. Therefore, we suggest using the LSTM model trained with input E to improve the ESF model, as shown in Figs. 8, 9 and 10.
Although the recall, precision, and F1 score are below 0.5, the overall accuracy for spread-F presence and absence is at the level of 85% or higher. Such low values mean that the false predictions still need to be reduced. These metrics are low possibly because of imbalanced data and complex input features. Nevertheless, this work shows that the new local Vd and AGW parameters can improve the model performance.
Prediction of the monthly probability percentage of the ESF events
Figure 11 shows the monthly probability percentage of the observed ESF events compared with the predictions of the ANN, LSTM, and IRI-2016 models. This demonstrates the model predictability on the unseen data (2019). The vertical axis represents the probability percentage of the ESF events, and the horizontal axis is the local time from 17:00 to 06:30. The year 2019 lies on the descending side of the 24th solar cycle, at minimum solar activity.
Compared with the observed ESF, the ANN model tends to overestimate the ESF probability percentage in March, April, and October, and to underestimate it in February, August, and September, as shown in Fig. 11. The overestimations of the ANN model are seen between 20:00 LT and 03:00 LT in March, April, September, and October, while underestimated values are observed from 18:00 LT to 19:30 LT in those months. The underestimation of the ANN model is mainly observed from 18:00 LT to 06:00 LT and is larger in February and August than in September and October. The prediction errors of the ANN model are between 10 and 40% in terms of RMSE, as shown in Fig. 12. Like the ANN model, the LSTM model also overestimates or underestimates the ESF probability percentage in those months in Fig. 11. The LSTM model overestimates the ESF probability percentage in April and October. Underestimations of the LSTM model are clearly seen in February, August, and September. The errors of the LSTM model are between 10 and 21%, as shown in Fig. 12. The IRI-2016 model clearly overestimates the ESF probability percentage in all months, as shown in Fig. 11. Its high overestimation is observed from 18:30 LT to 06:30 LT in February, March, April, September, and October, but not in August. The RMSE of the IRI-2016 model is between 19 and 37% in these months. The LSTM model is therefore more appropriate than the ANN and IRI-2016 models for forecasting the ESF probability percentage at CPN station.
In addition, this study finds that the IRI-2016 model overestimates the ESF occurrence in February, March, April, September, and October 2019 at CPN station. This is consistent with previous studies covering 2004 to 2014, such as Klinngam et al. (2015) at the CPN, Chiangmai (CMU) and Kototabang (KTB) stations, Afolayan et al. (2019) at the CPN, Kwajalein (KWJ) and Jicamarca (JIC) stations, Thammavongsy et al. (2020) at the CPN station, and Thammavongsy et al. (2022) at the CPN and Tirunelveli (TIR) stations. Therefore, part of the IRI-2016 model’s error is likely due to uniquely localized ESF characteristics that the B-spline method does not represent.
From Fig. 11, the occurrence rates of the observed ESF events in the March equinox period are higher than those in the September equinox period. High occurrence rates are observed during post-sunset hours in the March equinox period and, in contrast, during post-midnight hours in the September equinox period. This indicates that a high occurrence rate of post-midnight irregularities can be observed in equinoctial months as well as in solstice months during low solar activity (Otsuka 2018). The highest occurrence rate is around 60% in the March equinox period and 40% in the September equinox period.
Furthermore, Table 5 shows the RMSE of the LSTM model trained with and without the AGW parameter. The AGW parameter improves the LSTM model only in the post-midnight period for March, April, August, September, and October; this agrees with the positive post-midnight AGW relation reported by Otsuka (2018). In particular, the LSTM model improves in September in all cases, which implies a significant role of the AGW in September. In contrast, the AGW index does not provide significant information for improving the LSTM model during the post-sunset period.
Prediction of the daily probability percentage of the ESF events
Figure 13 shows the residual errors between the observations and predictions. The daily ESF percentage is computed by summing the ESF presences from 17:00 to 06:30 LT and dividing by the total number of ESF presences and absences. The vertical axis represents the residual errors between the observations and the ESF models. The x-axis represents the day number in the March and September equinox months, with 110 available days: February (1–20), March (21–46), April (47–64), August (65–81), September (82–99), and October (100–110). The residual errors of the ANN and LSTM models differ only slightly from day to day. Both models give errors above 20% on days from March to August (35–77). In October (100–110), the ANN errors are higher than the LSTM errors. In total, the LSTM model achieves an RMSE of 21.38%, compared with 23.19% for the ANN model. The better performance of the LSTM model possibly derives from the new local input features and the advantages of the LSTM neuron design. However, daily prediction of ESF events remains difficult because of the complex dependence of ESF events on the input features and because of the imbalanced data. This result points to the important role of local input features and of the more advanced LSTM model.
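The daily percentage and residual described above can be sketched as follows. The handling of missing ionograms (marked `None`), the sign convention of the residual (observation minus prediction), and the toy flags are assumptions made for illustration only.

```python
import math

def daily_esf_percent(flags):
    """Daily ESF percentage from one night's 0/1 flags (17:00 to 06:30 LT).

    None marks a missing ionogram and is excluded, so the divisor is the
    number of ESF presences plus absences.
    """
    valid = [f for f in flags if f is not None]
    if not valid:
        return None
    return 100.0 * sum(valid) / len(valid)

# Hypothetical nights with four time slots each (not real data).
observed_nights = [[0, 1, 1, 0], [0, 0, None, 0], [1, 1, 1, 1]]
predicted_nights = [[0, 1, 0, 0], [0, 1, 0, 0], [1, 1, 1, 0]]

obs = [daily_esf_percent(night) for night in observed_nights]
pred = [daily_esf_percent(night) for night in predicted_nights]

# Residual per day (assumed sign: observation minus prediction).
residuals = [o - p for o, p in zip(obs, pred)]

# Total RMSE over all available days.
total_rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
print(obs, pred, residuals, total_rmse)
```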
On the other hand, we can analyze the performance of the ANN and LSTM models for daily ESF prediction. In this case, a day whose daily ESF percentage is greater than zero is defined as an ESF day (ESF1); otherwise, it is defined as a non-ESF day (ESF0). The predictive performance of the models can then be summarized in the confusion matrix shown in Fig. 14. The total accuracy is about 57% (64) for the ANN model and 61% (67) for the LSTM model. The correct prediction rate of ESF days is about 53% (47) and 68% (47) for the ANN and LSTM models, respectively. The correct prediction rate of non-ESF days is 60% (63) and 56% (63) for the ANN and LSTM models, respectively. Therefore, we note that the precision of both the ANN and LSTM models exceeds 50% for daily ESF prediction.
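The day labelling and the confusion-matrix summary can be sketched as below. The per-class rates are computed here as the fraction of actual ESF days (or non-ESF days) that are predicted correctly, which is an assumption about how the rates in Fig. 14 are defined. The daily percentages are hypothetical.

```python
def label_esf_day(daily_percent):
    """Return 1 (ESF1) if the daily ESF percentage is above zero, else 0 (ESF0)."""
    return 1 if daily_percent > 0 else 0

def confusion_counts(actual, predicted):
    """Count TP, FP, TN, and FN for binary day labels."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts

# Hypothetical daily ESF percentages for six days (not real data).
obs_percent = [50.0, 0.0, 12.5, 0.0, 0.0, 25.0]
pred_percent = [25.0, 0.0, 0.0, 12.5, 0.0, 50.0]

actual = [label_esf_day(v) for v in obs_percent]
predicted = [label_esf_day(v) for v in pred_percent]
c = confusion_counts(actual, predicted)

accuracy = (c["TP"] + c["TN"]) / len(actual)
esf_day_rate = c["TP"] / (c["TP"] + c["FN"])      # correct rate on ESF days
non_esf_day_rate = c["TN"] / (c["TN"] + c["FP"])  # correct rate on non-ESF days
print(c, accuracy, esf_day_rate, non_esf_day_rate)
```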
Prediction of the short ESF events within 30 min ahead
The proposed ESF forecasting model is mainly designed for one-step-ahead prediction with both the ANN and LSTM models. In Fig. 15, the ANN and LSTM models provide accuracies of 83.3% (2566) and 85.4% (2627), respectively. The predictability of both models is higher for ESF0 than for ESF1. This is because the two ESF classes are not equally represented in the data, as shown in Table 2. However, data rebalancing techniques are not appropriate for this ESF time series because they can disturb the cyclic components of the diurnal and seasonal indices. As a result, the correct prediction of ESF0 is about 90.4% (2682) and 89.5% (2825) for the ANN and LSTM models, respectively. For the ESF1 prediction, the ANN and LSTM models achieve 35.5% (397) and 39.7% (252). The LSTM model still outperforms the ANN for this short-term ESF prediction. Nevertheless, predicting ESF1 remains difficult for both models. This might be caused by several factors: the relationships between the ESF and the input characteristics are unclear for short-term variability, Li et al. (2021); the limited available data may lose significant information; and the class proportions can negatively affect the model's ability to recognize ESF1, producing biased results. However, this study demonstrates the potential of the LSTM model for ESF forecasting. It also shows that developing an ESF forecasting model is still challenging work.
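The effect of class imbalance on the reported scores can be illustrated with a short sketch. The counts below are hypothetical and are not those of Fig. 15; they only show how a high overall accuracy can coexist with a low F1 score for the minority class ESF1.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 score for the positive class (ESF1)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical imbalanced confusion counts (not real data).
tp, fp, tn, fn = 30, 20, 900, 50
total = tp + fp + tn + fn

accuracy = (tp + tn) / total
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

# Accuracy of a trivial model that always predicts ESF0.
majority_baseline = (tn + fp) / total
print(accuracy, precision, recall, round(f1, 3), majority_baseline)
```

In this toy case the accuracy is only slightly above the always-ESF0 baseline, so the F1 score of the ESF1 class is the more informative number.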
The previous study of Abadi et al. (2022) achieved ~ 80% accuracy in predicting the postsunset ESF occurrence over stations in Southeast Asia. An accuracy of ~ 80% in predicting the postsunset ESF events is also reported over stations by Anderson and Redmon (2017). In this study, the ANN and LSTM models achieve 83.3% (2566) and 85.4% (2627) for the postsunset and postmidnight ESF predictions. This implies that local information is important and necessary for developing the ESF forecasting model. In addition, we suggest using model learning with loopback capability for the ESF forecasting model and designing the coefficient parameters separately for each season.
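For a model that uses previous time steps, a one-step-ahead data set is commonly prepared by cutting the time series into windows of past steps. The following is a minimal sketch under that assumption; the window length `lookback`, the function name, and the feature layout are hypothetical and are not taken from the paper.

```python
def make_windows(features, labels, lookback):
    """Build (window, target) pairs for one-step-ahead prediction.

    features: list of per-time-step feature vectors
    labels:   list of 0/1 ESF flags aligned with features
    lookback: number of past steps contained in each window
    """
    samples = []
    for t in range(lookback, len(features)):
        window = features[t - lookback:t]    # steps t-lookback .. t-1
        samples.append((window, labels[t]))  # target is the flag at step t
    return samples

# Hypothetical series with one feature per step (not real data).
features = [[0.1], [0.2], [0.3], [0.4], [0.5]]
labels = [0, 0, 1, 1, 0]

samples = make_windows(features, labels, lookback=2)
for window, target in samples:
    print(window, "->", target)
```

In practice one would probably also avoid windows that span the daytime gap between two nights, which this sketch does not handle.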
Conclusions
In this work, we develop ESF forecasting models using ANN and LSTM networks. The new local inputs, the F-layer vertical drift velocity and the power spectrum of the atmospheric gravity waves, are shown to improve the ESF forecasting model. The AGW index is found, for the first time, to improve the LSTM model during postmidnight rather than postsunset. The proposed LSTM model gives a favorable performance for ESF forecasting: it achieves 85.4% accuracy, compared with 83.3% for the ANN. Daily ESF prediction is studied for the first time in this work; it achieves about 55% accuracy for both the ANN and LSTM models. The proposed LSTM model effectively reduces the overestimation compared with the ANN model. For the monthly probability predictions, the proposed LSTM model yields an RMSE below 20%. The IRI2016 model overestimates the ESF probability by more than 20% (RMSE) for all months and gives a higher RMSE than the proposed LSTM model. Furthermore, the three-dimensional performance analyses show that day-to-day prediction of ESF events remains a difficult task. The low F1 score of around 0.3 indicates that the LSTM model needs further improvement for more accurate prediction. One possible solution is to add new input features that reflect the presence of ESF based on physical mechanisms. The limited available data is one issue in this study. Therefore, we expect that near-future development of the ESF forecasting model should turn to attentive model learning and new local input parameters to enhance the input intelligence and model learnability.
Availability of data and materials
The ionogram data are obtained from NICT. The daily solar activity indices (F10.7 and SSN) are provided by the Space Physics Data Facility (SPDF) OMNIWeb database at https://omniweb.gsfc.nasa.gov/form/dx1.html. The 3-hourly and daily averaged magnetic activity indices (ap3 and Ap; kp3 and Kp) are downloaded from the World Data Center for Geomagnetism, Kyoto University at https://wdc.kugi.kyoto-u.ac.jp/index.html. The historical spread-F and total electron content (TEC) databases are also downloadable from the Thai GNSS and Space Weather Information Center website http://iono-gnss.kmitl.ac.th/.
Abbreviations
AI: Artificial intelligence
AGW: Atmospheric gravity wave
ANN: Artificial neural network
Ap: Geomagnetic activity index
CPN: Chumphon
Dn: Day number
ESF: Equatorial spread-F
EPB: Equatorial plasma bubble
foF2: Critical frequency of F2-layer
FMCW: Frequency Modulated Continuous Wave
FN: False negative
F10.7: Solar flux emission with 10.7 cm radio wavelength
GD: Gradient descent
HF: High frequency
hʹF: Virtual height of F layer
Hn: Hour number
RNN: Recurrent Neural Network
SSN: Sunspot number
SPDF: Space Physics Data Facility
Kp: Disturbance indicator of the Earth’s magnetic field
LSTM: Long short-term memory
IRI: International Reference Ionosphere
MSE: Mean squared error
RMSE: Root mean squared error
TP: True positive
FP: False positive
TN: True negative
LSWS: Large-scale wave structure
PRE: Prereversal enhancement
Vd: Vertical drift velocity
References
Abadi P, Otsuka Y, Supriadi S, Olla A (2020) Probability of ionospheric plasma bubble occurrence as a function of prereversal enhancement deduced from ionosondes in Southeast Asia. AIP Conf Proc 2226:050001. https://doi.org/10.1063/5.0002321
Abadi P, Ahmad UA, Otsuka Y, Jamjareegulgarn P, Martiningrum DR, Faturahman A, Perwitasari S, Saputra RE, Septiawan RR (2022) Modeling Postsunset equatorial spreadF occurrence as a function of evening F layer plasma drift using logistic regression, deduced from ionosondes in Southeast Asia. Remote Sens 14(8):1896. https://doi.org/10.3390/rs14081896
Abdu MA (2019) Daytoday and shortterm variabilities in the equatorial plasma bubble/spread F irregularity seeding and development. Prog Earth Planet Sci 6:11. https://doi.org/10.1186/s4064501902581
Abdu MA, Batista IS, Bittencourt JA (1981) Some characteristics of spread F at the magnetic equatorial station Fortaleza. J Geophys Res 86(A8):6836–6842. https://doi.org/10.1029/ja086ia08p06836
Abdu MA, Souza JR, Batista IS, Sobral JHA (2003) Equatorial spread F statistics and empirical representation for IRI: a regional model for the Brazilian Longitude Sector. Adv Space Res 31(3):703–716. https://doi.org/10.1016/S02731177(03)000310
Abdu MA, de Souza JR, Kherani EA, Batista IS, MacDougall JW, Sobral JHA (2015) Wave structure and polarization electric field development in the bottomside F layer leading to postsunset equatorial spread F. J Geophys Res Space Phys 120:6930–6940. https://doi.org/10.1002/2015JA021235
Afolayan AO, Mandeep JS, Abdullah M, Buhari SM (2019) Statistics of spread F characteristics across different sectors and IRI 2016 prediction. Adv Space Res 64(10):2154–2163. https://doi.org/10.1016/j.asr.2019.06.019
Anderson DN, Redmon RJ (2017) Forecasting scintillation activity and equatorial spread F. Space Weather 15:495–502. https://doi.org/10.1002/2016SW001554
Aswathy RP, Manju G (2018) Hindcasting of equatorial spread F using seasonal empirical models. J Geophys Res Space Phys 123:1515–1524. https://doi.org/10.1002/2017JA025036
ColladoVillaverde A, Muñoz P, Cid C (2021) Deep neural networks with convolutional and LSTM layers for SYMH and ASYH forecasting. Space Weather. https://doi.org/10.1029/2021SW002748
Fawcett T (2006) An introduction to ROC analysis. Pattern Recogn Lett 27(8):861–874. https://doi.org/10.1016/j.patrec.2005.10.010
Graves A (2012) Supervised sequence labelling with Recurrent Neural Networks. Springer, Heidelberg. https://doi.org/10.1007/9783642247972
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Kim JH, Kwak YS, Kim YH, Moon SI, Jeong SH, Yun JY (2020) Regional ionospheric parameter estimation by assimilating the LSTM trained results into the SAMI2 Model. Space Weather. https://doi.org/10.1029/2020SW002590
Klinngam S, Supnithi P, Rungraengwajiake S, Tsugawa T, Ishii M, Maruyama T (2015) The occurrence of equatorial spreadF at conjugate stations in Southeast Asia. Adv Space Res 55(8):2139–2147. https://doi.org/10.1016/j.asr.2014.10.003
Li G, Ning B, Liu L, Wan W, Liu JY (2009) Effect of magnetic activity on plasma bubbles over equatorial and lowlatitude regions in East Asia. Ann Geophys 27:303–312. https://doi.org/10.5194/angeo273032009
Li G, Ning B, Otsuka Y, Abdu MA, Abadi P, Liu Z, Spogli L, Wan W (2021) Challenges to equatorial plasma bubble and ionospheric scintillation shortterm forecasting and future aspects in East and Southeast Asia. Surv Geophys 42:201–238. https://doi.org/10.1007/s10712020096135
Licata RJ, Mehta PM, Tobiska WK, Huzurbazar S (2022) Machinelearned HASDM thermospheric mass density model with uncertainty quantification. Space Weather 20:4. https://doi.org/10.1029/2021SW002915
Liu L, Zou S, Yao Y, Wang Z (2020) Forecasting global ionospheric TEC using deep learning approach. Space Weather 18:e2020SW002501. https://doi.org/10.1029/2020SW002501
Manju G, Madhav Haridas MK, Aswathy RP (2016) Role of gravity wave seed perturbations in ESF daytoday variability: a quantitative approach. Adv Space Res 57(4):1021–1028. https://doi.org/10.1016/j.asr.2015.12.019
McKinnell LA, Poole AWV (2000) The development of a neural network based short term foF2 forecast program. Phys Chem Earth Part C 25(4):287–290. https://doi.org/10.1016/S14641917(00)000180
McKinnell LA, Paradza M, Cilliers P, Abdu MA, de Souza J (2010) Predicting the probability of occurrence of spreadF over Brazil using neural networks. Adv Space Res 46(8):1047–1054. https://doi.org/10.1016/J.ASR.2010.06.020
Otsuka Y (2018) Review of the generation mechanism of postmidnight irregularities in the equatorial and lowlatitude ionosphere. Prog Earth Planet Sci 5:57. https://doi.org/10.1186/s4064501802127
Rungraengwajiake S, Supnithi P, Tsugawa T, Maruyama T, Nagatsuma T (2013) The variation of equatorial spreadF occurrences observed by ionosondes at Thailand Longitude Sector. Adv Space Res 52(10):1809–1819. https://doi.org/10.1016/j.asr.2013.07.041
Saito S, Maruyama T (2006) Ionospheric height variations observed by ionosondes along magnetic meridian and plasma bubble onsets. Ann Geophys 24:2991–2996. https://doi.org/10.5194/angeo2429912006
Sokolova M, Lapalme G (2009) A systematic analysis of performance measures for classification tasks. Inf Process Manage 45(4):427–437. https://doi.org/10.1016/j.ipm.2009.03.002
Tan Y, Hu Q, Wang Z, Zhong Q (2018) Geomagnetic index Kp forecasting with LSTM. Space Weather 16:406–416. https://doi.org/10.1002/2017SW001764
Thammavongsy P, Supnithi P, Phakphisut W, Hozumi K, Tsugawa T (2020) SpreadF prediction model for the equatorial Chumphon station, Thailand. Adv Space Res 65(1):152–162. https://doi.org/10.1016/j.asr.2019.09.040
Thammavongsy P, Supnithi P, Myint LMM, Sripathi S, Hozumi K, Lakanchanh D (2022) Comparison of observed equatorial spreadF statistics between two longitudinally separated magnetic equatorial stations and the IRI2016 model during low and high solar activities. Adv Space Res 69(6):2501–2511. https://doi.org/10.1016/j.asr.2021.12.050
Torrence C, Compo G (1998) A practical guide to wavelet analysis. Bull Am Meteorol Soc 79:61–78. https://doi.org/10.1175/15200477(1998)079%3c0061:APGTWA%3e2.0.CO;2
Tsunoda RT (2010) On equatorial spread F: establishing a seeding hypothesis. J Geophys Res. https://doi.org/10.1029/2010JA015564
Tulasi SR, Ajith KK, Yokoyama T, Yamamoto M, Niranjan K (2017) Vertical rise velocity of equatorial plasma bubbles estimated from Equatorial Atmosphere Radar (EAR) observations and HIRB model simulations. J Geophys Res Space Phys 122:6584–6594. https://doi.org/10.1002/2017JA024260
Ulukavak M (2020) Deep learning for ionospheric TEC forecasting at midlatitude stations in Turkey. Acta Geophys 69:589–606. https://doi.org/10.1007/s11600021005688
Watthanasangmechai K, Supnithi P, Lerkvaranyu S, Tsugawa T, Nagatsuma T, Maruyama T (2012) TEC prediction with neural network for equatorial latitude station in Thailand. Earth Planet Space 64:473–483. https://doi.org/10.5047/eps.2011.05.025
Woodman RF, La Hoz C (1976) Radar observations of F region equatorial irregularities. J Geophys Res 81:5447–5466. https://doi.org/10.1029/JA081i031p05447
Zhang GP (2012) Neural networks for timeseries forecasting. In: Handbook of natural computing, Springer, Berlin, Heidelberg, pp 461–477. https://doi.org/10.1007/9783540929109_14
Acknowledgements
This work is supported by King Mongkut’s Institute of Technology Ladkrabang under the Grant KDS2019/016 and funded from the NSRF via the Program Management Unit for the Human Resources and Institutional Development, Research and Innovation (Grant no. B05F640197). The ASEAN IVO (http://www.nict.go.jp/en/asean_ivo/index.html) project, Precise positioning and Artificial Intelligence (AI) for Ionospheric Disturbances in LowLatitude Region in ASEAN, was involved in the production of the contents of this publication and financially supported by NICT (http://www.nict.go.jp/en/index.html).
Funding
This work received funding from King Mongkut’s Institute of Technology Ladkrabang under the Grant KDS2019/016, the NSRF via the Program Management Unit for the Human Resources and Institutional Development, Research and Innovation (Grant no. B05F640197) and the ASEAN IVO (http://www.nict.go.jp/en/asean_ivo/index.html) project.
Author information
Contributions
PT implemented experiments, analyzed and interpreted results, and wrote the first draft of the paper. PS supported consultation, methodological implementations, comments, and corrections. LMMM contributed comments, corrections, and modifications. KH provided data sharing and corrections. DL participated in reviews and corrections. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Thammavongsy, P., Supnithi, P., Myint, L.M.M. et al. Equatorial spreadF forecasting model with local factors using the long shortterm memory network. Earth Planets Space 75, 118 (2023). https://doi.org/10.1186/s40623023018687
DOI: https://doi.org/10.1186/s40623023018687