Classification of the equatorial plasma bubbles using convolutional neural network and support vector machine techniques

Thanakulketsarat, Thananphat; Supnithi, Pornchai; Myint, Lin Min Min; Hozumi, Kornyanat; Nishioka, Michi

doi:10.1186/s40623-023-01903-7

Full paper
Open access
Published: 16 October 2023


Thananphat Thanakulketsarat¹,
Pornchai Supnithi ORCID: orcid.org/0000-0002-7793-7868¹,
Lin Min Min Myint¹,
Kornyanat Hozumi² &
Michi Nishioka²

Earth, Planets and Space volume 75, Article number: 161 (2023)


Abstract

Equatorial plasma bubbles (EPBs) are depletions in ionospheric plasma density that form during post-sunset hours. These ionospheric irregularities can disrupt trans-ionospheric radio systems, navigation systems and satellite communications. Real-time detection and classification of EPBs are crucial for the space weather community. The Prachomklao radar station, a very high frequency (VHF) radar station, was installed at Chumphon station (Geographic: 10.72° N, 99.73° E and Geomagnetic: 1.33° N) in 2020 and has produced radar images ever since. In this work, we propose two real-time plasma bubble detection systems based on support vector machine techniques. The two designs use a convolutional neural network (CNN) and singular value decomposition (SVD), respectively, for feature extraction, and are then connected to a support vector machine (SVM) for EPB classification. The proposed models are trained using quick look (QL) plot images from the VHF radar system at the Chumphon station, Thailand, in 2020. The experimental results show that the combined CNN-SVM model using the RBF kernel achieves the highest accuracy of 93.08%, while the model using the polynomial kernel achieves an accuracy of 92.14%. In contrast, the combined SVD-SVM models yield accuracies of 88.37% and 85.00% for the RBF and polynomial kernels of the SVM, respectively.


Introduction

Equatorial plasma bubbles (EPBs) are regions of depleted plasma in the ionosphere. They typically originate at the bottom-side of the F-layer after sunset, particularly near the magnetic equator, forming bubble-shaped structures, Abadi et al. (2014). The post-sunset EPBs are known to be initiated by the Rayleigh–Taylor instability, Kelly (2009). Global-scale geomagnetic storms can further increase the severity and occurrence of EPBs, Deepak et al. (2023). The scale size of the EPBs ranges from a few kilometers to several hundred kilometers. As the EPBs rise to higher altitudes, they typically extend to higher latitudes along the magnetic flux tubes (Huba 2008). The plasma bubbles can cause signal fading and scintillation, which can degrade communication, positioning and navigation, Alison et al. (2018).

Since the 1960s, researchers have used radio-based remote sensing, among other techniques, to investigate ionospheric irregularities such as EPBs. Ionosondes, all-sky airglow imagers, Global Navigation Satellite System (GNSS) receivers, in situ satellites, very high frequency (VHF) radar stations, the equatorial atmosphere radar (EAR), and incoherent scatter radar (ISR) are some of the instruments that have been widely utilized. For example, the ionosonde system measures the electron density profile of the ionosphere by sending a radio signal upward and receiving the echo signals in return, Wei et al. (2021). GNSS receivers are also utilized to detect ionospheric irregularities through the analysis of pseudorange information, Chendong et al. (2022). The parameters of the plasma in the ionosphere, including the temperature, electron density and drift velocity, can be measured directly by in situ satellites, Wernik et al. (2007).

Very high frequency (VHF) radar is another powerful tool for observing plasma bubbles in the ionosphere. Compared to other observation methods, VHF radar offers several advantages. It provides high spatial resolution (on the order of meters) and high temporal resolution (on the order of seconds), allowing for detailed observations of plasma bubble evolution. VHF radar can also cover a large portion of the ionosphere, making it possible to study the large-scale structure and dynamics of plasma bubbles. Additionally, VHF radar can monitor plasma bubbles in real time, which makes it useful for space weather forecasting and for studying the effects of plasma bubbles on communication and navigation systems. VHF radar has been applied in various fields, including sea surface current detection and wind direction estimation, Cochin et al. (2005). In the ionosphere, VHF radar has been used to detect boundary irregularities in the F-region at night in Kototabang, Indonesia, Otsuka et al. (2009), and to observe plasma bubbles at the bottom-side of the F-layer, Tsunoda et al. (1982).

VHF radars operate in the VHF band between 30 and 300 MHz, and they can observe ionospheric irregularities such as EPBs by transmitting a radar signal upward and receiving the signal after it has been scattered by the irregularities, Nakata et al. (2005). The equatorial atmosphere radar (EAR) is a specialized type of VHF radar that was developed for researching the equatorial ionosphere and has been used extensively in EPB research, Pavan et al. (2017). The incoherent scatter radar at Jicamarca Radio Observatory, among others, is another powerful tool used to study the ionosphere, including EPBs, by transmitting a high-power radio signal and observing the scattered signal, Woodman et al. (2019). VHF radars can generate daily images; however, real-time monitoring of these often noisy or distorted images requires careful data cleaning and an efficient classification system.

Recently, artificial intelligence (AI), particularly machine learning algorithms, has been applied in space weather forecasting and prediction, for example, Atabati et al. (2021); Razin et al. (2021); Tang (2022). In Tang (2022), the authors proposed deep learning techniques for forecasting ionospheric total electron content (TEC). The proposed model combines a convolutional neural network (CNN), which produces feature maps and rotates the data to expand its outstanding features, a long short-term memory (LSTM) neural network, and an attention mechanism. The model uses data from 24 GNSS stations in China and is driven by six parameters, including the TEC time series, the Bz, Kp, Dst and F10.7 indices, and the hour of day (HD). In Atabati et al. (2021), the authors used an artificial neural network (ANN) integrated with the genetic algorithm (GA) to predict ionospheric scintillation for the GUAM station. Extracting features from the data before prediction can further improve the accuracy and processing time of real-time prediction systems. Singular value decomposition (SVD) is a powerful tool for decomposing images into their constituent parts and analyzing the structure of the image. Its ability to reduce the dimensionality of the image and extract important features makes it useful for a wide range of image processing tasks, including compression, denoising, and feature extraction. Recently, Razin et al. (2021) proposed a new method for modeling the spatio-temporal variations in the ionosphere’s TEC during periods of intense solar activity. The approach utilizes a support vector machine (SVM) as the modeling tool to predict TEC values across the ionosphere. By leveraging the SVM’s ability to handle complex and high-dimensional datasets, the method can effectively capture the relationships between TEC values and the various geophysical factors that influence them.

In this work, we propose equatorial plasma bubble classification models using machine learning techniques on the quick look (QL) plot images from the Chumphon VHF radar station. The proposed models classify the presence or absence of plasma bubbles on the images using two different approaches: combined convolutional neural networks (CNNs) and support vector machines (SVM), as well as combined singular value decomposition (SVD) methods and SVM. In the models, SVD and CNNs are used to extract the features from the images before sending them to the SVM for classification.

Data and methodology

In this work, we consider using a support vector machine (SVM) for classification, and then we use singular value decomposition (SVD) and a convolutional neural network (CNN) for feature extraction and size reduction.

Support vector machine

Support vector machine (SVM) is a supervised learning model for classification and regression, introduced by Vapnik et al. (1963). It is capable of handling both linearly separable and non-linearly separable data through the use of a kernel trick. The algorithm creates a decision boundary, called “a hyperplane”, that separates the positive and negative classes by maximizing the margin and reducing classification errors. The margin is the distance between the hyperplane and the closest data points from each class, known as “support vectors”. Figure 1 illustrates a support vector machine and the optimal separating hyperplane for linearly separable data.

As shown in Fig. 1, for two classes of scattered data vectors, the SVM algorithm seeks the weight vector, $\mathbf{w}$, and bias, $b$, that define the optimal hyperplane separating the classes with the maximum margin $m$. Given a set of training samples, ${\mathbf{x}}_{i} \in R^{n}$, and their corresponding classes, ${y}_{i}=\pm 1$, the signed distance between a sample ${\mathbf{x}}_{i}$ and the hyperplane is given by the expression $({\mathbf{w}}^{\mathrm{T}}\cdot {\mathbf{x}}_{i}+b)/\Vert \mathbf{w}\Vert$. In SVM, the margin ($m$) width must be maximized according to

$$\mathrm{max} \frac{2}{\Vert \mathbf{w}\Vert }.$$

(1)
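The step from the margin in Eq. (1) to a minimization problem can be made explicit. The short derivation below is a sketch that assumes the usual canonical scaling of the hyperplane, in which the closest samples satisfy $|{\mathbf{w}}^{\mathrm{T}}\cdot \mathbf{x}+b|=1$; the symbols ${\mathbf{x}}_{+}$ and ${\mathbf{x}}_{-}$ are introduced here only for illustration and denote support vectors of the positive and negative class.

$${\mathbf{w}}^{\mathrm{T}}\cdot {\mathbf{x}}_{+}+b=+1,\qquad {\mathbf{w}}^{\mathrm{T}}\cdot {\mathbf{x}}_{-}+b=-1,$$

$$m=\frac{{\mathbf{w}}^{\mathrm{T}}\cdot \left({\mathbf{x}}_{+}-{\mathbf{x}}_{-}\right)}{\Vert \mathbf{w}\Vert }=\frac{2}{\Vert \mathbf{w}\Vert },\qquad \mathrm{max}\frac{2}{\Vert \mathbf{w}\Vert }\iff \mathrm{min}\frac{1}{2}{\Vert \mathbf{w}\Vert }^{2}.$$

Subtracting the two hyperplane equations gives ${\mathbf{w}}^{\mathrm{T}}\cdot ({\mathbf{x}}_{+}-{\mathbf{x}}_{-})=2$, and projecting onto the unit normal $\mathbf{w}/\Vert \mathbf{w}\Vert$ yields the margin width.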

The soft-margin objective function of the SVM can be written as:

$$\left( {{\mathbf{w}},b} \right) = \mathop{\mathrm{arg\,min}}\limits_{{\mathbf{w}},b} \frac{1}{2}\left\| {\mathbf{w}} \right\|^{2} + C\mathop \sum \limits_{i = 1}^{N} \xi_{i} ,$$

(2)

subject to ${y}_{i}({\mathbf{w}}^{\mathrm{T}}\cdot {\mathbf{x}}_{i}+b)-1+{\xi }_{i}\ge 0$ and ${\xi }_{i}\ge 0$, where $C$ is the penalty parameter and ${\xi }_{i}$ are the slack variables. In the Lagrangian formulation of this problem, each constraint is assigned a Lagrange multiplier ${\alpha }_{i}$.
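To make the role of the penalty parameter concrete, the following minimal sketch evaluates the standard soft-margin objective, $\frac{1}{2}\Vert \mathbf{w}\Vert^{2}+C\sum_i \xi_i$, for a fixed hyperplane. The function name `soft_margin_objective` and the toy data are hypothetical, and each slack variable is assumed to take its smallest feasible value, $\xi_i=\max(0, 1-y_i(\mathbf{w}^{\mathrm{T}}\mathbf{x}_i+b))$.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Return (0.5*||w||^2 + C*sum(xi), xi) for a fixed hyperplane (w, b).

    xi_i = max(0, 1 - y_i * (w.x_i + b)) is the smallest slack that
    satisfies the margin constraint for sample i.
    """
    margins = y * (X @ w + b)
    slack = np.maximum(0.0, 1.0 - margins)
    value = 0.5 * float(w @ w) + C * float(slack.sum())
    return value, slack

# Toy data (hypothetical): the second sample lies inside the margin.
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
w = np.array([1.0, 0.0])
b = 0.0

value, slack = soft_margin_objective(w, b, X, y, C=10.0)
```

A larger $C$ makes each margin violation more expensive, so the optimizer accepts a narrower margin in exchange for fewer violations.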

For non-linearly separable data, nonlinear kernel functions are used to map the data into higher-dimensional feature spaces where the data can be linearly separated, Dhafar et al. (2020). A kernel function $K({\mathbf{x}}_{i}, {\mathbf{y}}_{i})$ is defined as the dot product of the images of its two arguments under a nonlinear mapping $\phi$,

$$K\left({\mathbf{x}}_{i}, {\mathbf{y}}_{i}\right)=\phi {\left({\mathbf{x}}_{i}\right)}^{\mathrm{T}}\phi \left({\mathbf{y}}_{i}\right),$$

(3)

where $\phi$ is a mapping of $X$ to a feature space F.

The SVM is a versatile model that can tackle various machine learning problems by utilizing multiple kernel functions, each with its unique characteristics. Among these kernels, the radial basis function (RBF) kernel is notable for its reliance on the gamma parameter, which determines the shape and complexity of the decision boundary. We use two kernel functions in SVM: the polynomial kernel and radial basis function kernel, Zhang et al. (2012). These kernels offer user-definable parameters, such as gamma ($\gamma$), degree ($d$), and penalty parameter ($C$), which can be adjusted to achieve the best performance for a particular problem. The gamma parameter plays a crucial role in determining the influence of each training example on the decision boundary, while the degree parameter controls the degree of the polynomial kernel. The penalty parameter $C$ balances the tradeoff between margin maximization and classification error minimization. To achieve optimal SVM performance, it is crucial to select an appropriate kernel function and adjust its associated parameters. However, the selection process can be problem-dependent and may require careful experimentation and tuning, often using cross-validation techniques.

Four commonly used kernel functions are:

1. Linear kernel: $K({\mathbf{x}}_{i}, {\mathbf{y}}_{i}) = {\mathbf{x}}_{i}^{\mathrm{T}}{\mathbf{y}}_{i}$,
2. Polynomial kernel: $K({\mathbf{x}}_{i}, {\mathbf{y}}_{i}) = {({\mathbf{x}}_{i}\cdot {\mathbf{y}}_{i}+c)}^{d}$,
3. Radial basis function kernel: $K({\mathbf{x}}_{i}, {\mathbf{y}}_{i}) = \mathrm{exp}\left(-\gamma {\Vert {\mathbf{x}}_{i}-{\mathbf{y}}_{i}\Vert }^{2}\right)$, with $\gamma =1/(2{\sigma }^{2})$,
4. Sigmoid kernel: $K\left({\mathbf{x}}_{i}, {\mathbf{y}}_{i}\right)= \mathrm{tanh}\left(\gamma \left({\mathbf{x}}_{i}\cdot {\mathbf{y}}_{i}\right)+ c\right)$,

where $c$ is a constant offset, which is distinct from the penalty parameter $C$.
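The four kernels can be written down directly in their standard forms. The sketch below is illustrative only: the function names and default parameter values are assumptions, with `coef0` playing the role of the constant offset in the polynomial and sigmoid kernels and `gamma` equal to $1/(2\sigma^{2})$ in the RBF kernel.

```python
import numpy as np

def linear_kernel(u, v):
    """K(u, v) = u^T v."""
    return float(u @ v)

def polynomial_kernel(u, v, degree=2, coef0=1.0):
    """K(u, v) = (u . v + coef0)^degree."""
    return float((u @ v + coef0) ** degree)

def rbf_kernel(u, v, gamma=0.5):
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    diff = u - v
    return float(np.exp(-gamma * (diff @ diff)))

def sigmoid_kernel(u, v, gamma=0.1, coef0=0.0):
    """K(u, v) = tanh(gamma * (u . v) + coef0)."""
    return float(np.tanh(gamma * (u @ v) + coef0))

# Two example vectors (hypothetical values).
u = np.array([1.0, 2.0])
v = np.array([3.0, 0.0])

k_lin = linear_kernel(u, v)
k_poly = polynomial_kernel(u, v)
k_rbf = rbf_kernel(u, v)
k_sig = sigmoid_kernel(u, v)
```

The RBF kernel depends only on the distance between the two vectors, so it equals 1 for identical inputs and decays toward 0 as they move apart; a larger `gamma` makes that decay faster.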

In this work, we consider polynomial and RBF kernel functions. Because the RBF kernel implicitly maps the data into an infinite-dimensional feature space, it can capture more complex nonlinear relationships between the features than a polynomial kernel of fixed degree. Another difference is the sensitivity to the hyperparameters. The polynomial kernel is sensitive to the degree of the polynomial, which determines the complexity of the mapping. In contrast, the RBF kernel is sensitive to the gamma parameter, which determines the width of the Gaussian function used in the mapping.

Convolutional neural network

In LeCun et al. (1990), the authors introduced the concept of convolutional computational methods into neural networks to compute image features. The layer that computes features using convolution is called the convolutional layer. LeCun et al. (1998) presented a convolutional neural network (CNN) called LeNet-5, which introduced the concept of a fully connected layer. This layer acts as a layer of a multi-layer neural network and allows the CNN to perform both feature extraction and classification, which is considered a key advantage of this network architecture.

The convolutional neural network architecture, as shown in Fig. 2, has three important layers:

1. Convolutional layer

The convolution layer is the basic component in convolutional neural networks. It is composed of multiple feature maps, each of which is composed of many neurons. Each neuron is connected through the convolution kernel to a local region of the feature maps in the previous layer. The convolution layer of a CNN extracts different features of the input by the convolution operation, and increasing the depth of the convolution layers allows more advanced features to be extracted. Representing the input image by $I$ and the two-dimensional convolution kernel by $K$, the convolution of the input image is

$$C\left(i,j\right)=\left(I*K\right)\left(i,j\right)=\sum_{m}\sum_{n}I\left(m,n\right)K\left(i-m,j-n\right).$$

(4)

2. Pooling layer

The pooling layer is an important component of a convolutional neural network (CNN) that typically follows a convolutional layer. Its primary purpose is to downsample the feature map obtained from the convolutional layer, reducing its spatial dimensions while preserving the important features. Another important function of the pooling layer is to make the extracted features less sensitive to small translations and distortions of the input, thereby improving the CNN’s predictive capabilities. Common pooling methods include average pooling, min pooling, and max pooling, which divide the feature map into local regions and compute a summary statistic (the mean, minimum, or maximum) of each region.

3. Fully connected layer

A fully connected layer serves as the connection between the feature map and the final output. The feature map is flattened so that every neuron in the last layer acts as an input to the next layer. The flattened input is then multiplied by weights, which are randomly initialized and updated during training, and a bias may be added. The resulting value of each neuron is then passed through a chosen activation function. The hidden layers can be designed as needed, and the output layer is obtained from the last hidden layer through the activation function. The weighted sum ($S$) of each neuron in the output layer is substituted into the activation function to predict the probability of the output, i.e.,

$$S=\sum_{i=1}^{n}{p}_{i}{w}_{i},$$

(5)

where ${p}_{i}$ is the input data and ${w}_{i}$ is the weight.

The CNNs are highly efficient for feature extraction in image classification applications because they combine the capture of specific local patterns, hierarchical feature learning, parameter sharing and nonlinear modeling. Moreover, their training is effective, which makes them suitable for feature extraction from images.
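The convolution of Eq. (4) and the max pooling operation can be illustrated with a few lines of NumPy. This is a minimal sketch rather than the network used in this work: it assumes no padding, a stride of 1, and non-overlapping pooling windows, and the function names and the toy $6\times 6$ input are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """True 2-D convolution (kernel flipped), no padding, stride 1."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

def relu(x):
    """Rectified linear unit applied element-wise."""
    return np.maximum(x, 0.0)

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    h = fmap.shape[0] // size * size
    w = fmap.shape[1] // size * size
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Toy input and an identity kernel (hypothetical values).
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.0]])

features = max_pool2d(relu(conv2d_valid(image, kernel)))
```

With the identity kernel, the convolution returns the $4\times 4$ interior of the input, and the $2\times 2$ pooling reduces it to a $2\times 2$ feature map, which shows how each stage shrinks the spatial dimensions.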

Singular value decomposition

Singular value decomposition (SVD) is a matrix factorization technique that has found wide use in many fields, from engineering and physics to data analysis and machine learning. In the field of image and signal processing, the SVD is often used to reduce the dimensionality of data by identifying the underlying structure and extracting its most important features. This can be particularly useful for compressing large amounts of data without significant loss of information. In image compression, the SVD is used to decompose an image into its singular values and singular vectors. The singular values represent the importance of each vector in the decomposition and can be used to discard less significant information while retaining the essential features of the image. This can lead to a significant reduction in the amount of data required to represent the image, without significant loss of image quality. In this approach, a rectangular matrix with dimensions $m$ by $n$ is decomposed into the product of three matrices. The classification model uses the singular values from the SVD process as inputs for both training and testing of the model. The decomposition takes the form

$$\mathbf{A}=\mathbf{U}{\varvec{\Sigma}}{\mathbf{V}}^{\mathrm{T}},$$

(6)

where $\mathbf{A}$ is the $m\times n$ data matrix to be decomposed, $\mathbf{U}$ is an orthogonal matrix of size $m\times m$, ${\varvec{\Sigma}}$ is a rectangular diagonal matrix of size $m\times n$ whose diagonal entries are the singular values, and ${\mathbf{V}}^{\mathrm{T}}$ is the transpose of an orthogonal matrix of size $n\times n$.

By using the SVD, we can break down the large image matrix into a smaller set of matrices and remove the smaller singular values to reduce memory usage. This approach can greatly benefit the efficiency of computations and the storage requirements of these applications.
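The decomposition in Eq. (6) and the effect of discarding small singular values can be checked numerically. The sketch below uses `numpy.linalg.svd` on a small synthetic matrix; the helper name `truncated_svd` and the rank-3 test matrix are assumptions made for illustration, not data from this work.

```python
import numpy as np

def truncated_svd(A, n_components):
    """Return the singular values of A and its rank-n_components approximation."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the leading singular values and their singular vectors.
    A_approx = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components, :]
    return s, A_approx

# Synthetic 8 x 6 matrix of rank 3 (hypothetical data).
rng = np.random.default_rng(0)
A = rng.random((8, 3)) @ rng.random((3, 6))

s, A3 = truncated_svd(A, 3)   # keeps every non-zero singular value
_, A1 = truncated_svd(A, 1)   # keeps only the largest one

err3 = np.linalg.norm(A - A3)
err1 = np.linalg.norm(A - A1)
```

Because the matrix has rank 3, three components reproduce it up to rounding error, whereas a single component leaves a visible residual; the same trade-off governs how many singular values are retained for an image.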

VHF radar data and images

In order to investigate plasma bubbles near the magnetic equator, the National Institute of Information and Communications Technology (NICT) and King Mongkut's Institute of Technology Ladkrabang (KMITL) collaborated to install the Prachomklao VHF radar station at the KMITL Chumphon campus (Geographic: 10.72° N, 99.73° E and Geomagnetic: 1.33° N) in Thailand on January 17, 2020. This radar station was set up to monitor plasma bubbles that freshly form in the ionosphere. As depicted in Fig. 3, the VHF radar station includes 18 three-element Yagi antennas that are arranged from east to west with a distance of approximately 5 m between each antenna. The radar system transmits at a VHF frequency of 39.65 MHz. The essential parameters of the VHF radar system are described in Table 1.

Table 1 Specification of the VHF radar of Chumphon station, Thailand


Two types of radar images are available: quick look (QL) images and range–altitude–time intensity (RATI).

A quick look plot image is generated by plotting the radar signal intensity (or power) against time and altitude, and it can be used to study plasma bubbles. The distance between the radar antenna and the reflecting structure is commonly referred to as the range, but technically, it indicates the altitude. The time axis represents the time elapsed since the radar signal was transmitted before it was received by the receiving device. The quick look (QL) plot provides a visual representation of the data collected by the radar in real time. Figure 4a–c shows examples of three types of radar images: non-plasma bubble, unsure and plasma bubble.

The unsure images in Fig. 4b are those containing features that cannot be identified as EPBs or that are caused by system errors. To prepare the images for SVM model training, the first step is to crop each image to remove all areas where the data repeat over time. The resulting image is an RGB image with dimensions of 360 × 360 × 3 pixels. In this study, a total of 1000 images are used, with 700 images utilized as the training dataset and 300 images as the testing dataset. The data classes used in the model testing are non-plasma bubbles, unsure, and plasma bubbles. The image data are divided into 350 images for the class without plasma bubbles, 350 images for the unsure class, and 300 images for the class with plasma bubbles.

Proposed methods

In the proposed methods, we use the SVM model for EPB classification with VHF radar image data. To reduce the input size and improve the classification performance, two feature extraction techniques, the SVD and the CNN, are applied before the SVM. This section describes in detail the design of each model used in this research.

Support vector machine model

In this research, SVM models with the RBF and polynomial kernels are designed as illustrated in Fig. 5. For each kernel, two main parameters, $C$ and gamma, need to be defined to achieve high performance. The parameter $C$ is a hyperparameter that governs the tradeoff between maximizing the margin (that is, the distance between the decision boundary and the data points closest to it) and reducing the classification error. The gamma parameter determines the influence of each training example on the decision boundary and the margin. In SVMs that employ a radial basis function (RBF) kernel, gamma regulates the width of the Gaussian kernel, which is used to compute the similarity between training instances. To compare the performance of the kernels, we set each kernel using $C = \{0.01, 1, 10, 100\}$ and $\mathrm{gamma} = \{0.01, 1, 10, 100\}$. The gamma value and parameter $C$ for each kernel are optimized using a grid search. After the values of $C$ and gamma are obtained, we modify the SVM system by adding a feature extraction technique, either the CNN or the SVD.
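A grid search over $C$ and gamma of the kind described above can be sketched with scikit-learn. The choice of library is an assumption, since the software used in this work is not stated, and the two-cluster synthetic data and five-fold cross-validation below are stand-ins chosen only so that the example runs on its own; only the two parameter grids are taken from the text.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for image features: two Gaussian clusters (hypothetical data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (60, 2)),
               rng.normal(4.0, 1.0, (60, 2))])
y = np.array([0] * 60 + [1] * 60)

# Parameter values taken from the grids listed in the text.
param_grid = {"C": [0.01, 1, 10, 100], "gamma": [0.01, 1, 10, 100]}

# Every (C, gamma) pair is scored by 5-fold cross-validation (an assumed setting).
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
```

For the polynomial kernel, the same pattern would apply with `SVC(kernel="poly")` and a grid over `C` and `degree`.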

Proposed combined singular value decomposition and support vector machine (SVD-SVM)

The SVD can be used with the SVM to improve classification performance by reducing the dimensionality of the input data while retaining the most important features. The architecture of the proposed SVD-SVM model is presented in Fig. 6. In the experiment, we evaluate four discrete settings for the number of retained singular values. In both the training and testing processes, we use the SVD to extract features from the images before sending them to the SVM, which classifies the presence or absence of EPBs in the images. The SVD feature extraction first converts the RGB image into a grayscale image by averaging the RGB channels. The grayscale image is a matrix in which each element represents a pixel intensity value, and its size equals the height and width of the input image. In the feature extraction process, the SVD decomposes the input image matrix into its singular values and singular vectors. Only the largest singular values are retained for further processing, because they represent the most important features of the image and can be passed to the classification stage without significant loss of information. After the SVD feature extraction is completed, the acquired features are used by the SVM for classification in both the training and testing processes. In the model, we consider the number of singular values $N = \{5, 100, 200, 360\}$ for the SVD, and we set each kernel using $C = \{0.01, 1, 10, 100\}$ for the SVM.
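The feature extraction steps described above (channel averaging, decomposition, and retention of the $N$ largest singular values) can be sketched as follows. The function name `svd_features` and the random test image are hypothetical; a real QL plot image would replace the random array.

```python
import numpy as np

def svd_features(rgb_image, n_values):
    """Return the n_values largest singular values of the grayscale image.

    rgb_image: array of shape (height, width, 3).
    """
    gray = rgb_image.mean(axis=2)                # average the RGB channels
    s = np.linalg.svd(gray, compute_uv=False)    # singular values, largest first
    return s[:n_values]

# Random stand-in for a cropped 360 x 360 x 3 image (hypothetical data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(360, 360, 3)).astype(float)

features_5 = svd_features(image, 5)
features_360 = svd_features(image, 360)
```

Each image is thus reduced from $360\times 360\times 3$ pixel values to a feature vector of length $N$, which is the input passed to the classifier.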

Proposed combined convolutional neural network and support vector machine (CNN-SVM)

The combination of SVMs and CNNs is a powerful technique that leverages the strengths of both models. The CNN is used to extract the distinctive features of the data, and its design requires choosing the number of convolution layers, the size of the filter, and the activation function. In this research, we experiment with three filter sizes, $3\times 3$, $5\times 5$, and $7\times 7$, and use the ReLU activation function. Max pooling is used in the pooling layer to further reduce the dimensionality of the features extracted by the CNN while preserving the essential image features. The window stride of the filter is set to $1\times 1$ across all convolution layers. The selected design uses a filter size of $3\times 3$. The output of the last convolution layer is then passed to the SVM, which is trained to classify the images. Two kernels are used in this research: the radial basis function (RBF) kernel and the polynomial kernel. A grid search is performed to identify the best values of $C$ and gamma for each kernel, together with the degree of the polynomial kernel, which can lead to significant improvements in the accuracy of the model. Using CNNs for feature extraction helps to reduce the amount of training data required, while also improving the overall performance of the SVM model, because the CNN identifies the most important features of the images, which the SVM then uses to classify the images more accurately. The CNN and SVM network architectures are presented in Fig. 7.
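The size of the feature map handed to the SVM follows from simple arithmetic on the filter size, stride, and pooling window. The sketch below assumes no padding, one $2\times 2$ max pooling step after every convolution layer, and a pooling stride equal to the window size; these are assumptions for illustration, since the exact layer arrangement is given only in Fig. 7, and the function name is hypothetical.

```python
def feature_map_sizes(input_size, n_layers, filter_size=3, stride=1, pool=2):
    """Return the side length of the feature map after each conv + pool stage."""
    sizes = []
    size = input_size
    for _ in range(n_layers):
        size = (size - filter_size) // stride + 1   # convolution without padding
        size = size // pool                         # non-overlapping max pooling
        sizes.append(size)
    return sizes

# A 360-pixel-wide input passed through four stages with 3 x 3 filters.
sizes_3x3 = feature_map_sizes(360, 4, filter_size=3)
# The same input with 7 x 7 filters shrinks slightly faster.
sizes_7x7 = feature_map_sizes(360, 4, filter_size=7)
```

Under these assumptions, a $360\times 360$ input with $3\times 3$ filters shrinks to side lengths 179, 88, 43 and 20 after one to four stages, so deeper networks pass far fewer values per channel to the SVM.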

In this paper, the prediction accuracy of the proposed models is computed using the following equation:

$$\mathrm{Accuracy}= \frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}.$$

(7)

The receiver operating characteristic (ROC) curve illustrates the relationship between correct and incorrect positive predictions and is an additional measure of the efficacy of the predictive model. Its vertical axis is the true positive rate (TPR), or recall, and its horizontal axis is the false positive rate (FPR), which are defined as follows:

$$\mathrm{TPR}= \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}},$$

(8)

$$\mathrm{FPR}= \frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TN}}.$$

(9)

Here, the terms are defined as follows:

1. True negative (TN) represents the situation in which the prediction is negative and the actual outcome is also negative.
2. False negative (FN) represents the situation in which the prediction is negative and the actual outcome is positive.
3. False positive (FP) represents the situation in which the prediction is positive and the actual outcome is negative.
4. True positive (TP) represents the situation in which the prediction is positive and the actual outcome is also positive.

The values of TN, TP, FN, and FP are calculated based on the actual outcome and the prediction as illustrated in the confusion matrix in Fig. 8.
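Equations (7) to (9) can be evaluated directly from a confusion matrix. The sketch below treats one class as positive and all other classes as negative; the $3\times 3$ matrix holds made-up counts for illustration and is not a result of this work, and the function names are hypothetical.

```python
def binary_counts(confusion, positive):
    """Return (TP, TN, FP, FN) for one class of a multi-class confusion matrix.

    confusion[a][p] is the number of samples of actual class a predicted as class p.
    """
    n = len(confusion)
    tp = confusion[positive][positive]
    fn = sum(confusion[positive][p] for p in range(n) if p != positive)
    fp = sum(confusion[a][positive] for a in range(n) if a != positive)
    tn = sum(confusion[a][p]
             for a in range(n) for p in range(n)
             if a != positive and p != positive)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)   # Eq. (7)

def true_positive_rate(tp, fn):
    return tp / (tp + fn)                    # Eq. (8)

def false_positive_rate(fp, tn):
    return fp / (fp + tn)                    # Eq. (9)

# Made-up counts for three classes (rows: actual, columns: predicted).
confusion = [[90, 8, 2],
             [10, 85, 5],
             [3, 7, 90]]

tp, tn, fp, fn = binary_counts(confusion, positive=2)
acc = accuracy(tp, tn, fp, fn)
tpr = true_positive_rate(tp, fn)
fpr = false_positive_rate(fp, tn)
```

Repeating the calculation with each class in turn as the positive class gives one (FPR, TPR) operating point per class.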

Results and discussion

Based on data from the Prachomklao VHF radar station at the KMITL Chumphon campus, Thailand, we select the observation images from October 1, 2020, to October 15, 2020. Out of 1000 images, 700 images (70%) are used as the training dataset and the remaining 300 images (30%) are used for testing the model. Three data classes are labeled: Class 1 (non-plasma bubbles), Class 2 (unsure), and Class 3 (plasma bubbles). Traditionally, the SVM is used for binary classification. To extend its use to three classes, we utilize the one-versus-all (OvA), also called one-versus-rest (OvR), technique, which enables SVMs to manage multi-class classification tasks. This approach divides the multi-class problem into multiple binary classification problems, in each of which one class is distinguished from all the other classes. A separate classifier is trained for each class, and a new sample is assigned to the class whose classifier gives the highest score. Below are the findings from each experimental method. Three types of classification models are studied: the SVM only, the SVD-SVM and the CNN-SVM models.
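The one-versus-rest decomposition can be illustrated independently of the binary learner. In the sketch below, each binary problem is solved by a ridge least-squares linear scorer, which is a deliberately simple stand-in for the SVM and not the classifier used in this work; the function names and the three synthetic clusters are likewise hypothetical.

```python
import numpy as np

def fit_one_vs_rest(X, y, n_classes, ridge=1e-3):
    """Fit one linear scorer per class on +1 (this class) / -1 (rest) targets."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])        # append a bias column
    targets = np.where(y[:, None] == np.arange(n_classes)[None, :], 1.0, -1.0)
    gram = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(gram, Xb.T @ targets)         # one weight column per class

def predict_one_vs_rest(W, X):
    """Assign each sample to the class whose scorer returns the highest value."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xb @ W, axis=1)

# Three well-separated synthetic clusters standing in for the three classes.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.normal(0.0, 0.5, size=(150, 2))

W = fit_one_vs_rest(X, y, n_classes=3)
train_accuracy = float(np.mean(predict_one_vs_rest(W, X) == y))
```

Replacing the least-squares scorer with a binary SVM leaves the structure unchanged: three binary models are trained, and the final label is taken from the model with the largest decision value.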

SVM classification model

In this model, the raw VHF images are fed directly to the SVM model, which serves as a baseline method. We use two kernels, the RBF kernel and the polynomial kernel, for plasma bubble classification. Table 2 shows the classification accuracies of the SVM using the RBF kernel with various sets of $C$ and gamma values. The maximum accuracy of 86.67 percent is achieved when the parameter $C$ is set to 10 or 100 and gamma equals 0.01. The accuracies of the models using $C$ = 10 and $C$ = 100 are the same, but the processing time of the model with $C$ = 10 is shorter than that with $C$ = 100.

Table 2 The accuracy of SVM system with RBF kernel with various gamma values


The degree parameter of the polynomial kernel sets the order of the polynomial function used to map the input features to a higher-dimensional feature space. In this study, we set the degree to $d = \{0.01, 1, 10, 100\}$ and utilize the same set of $C$ parameters for the polynomial kernel, as presented in Table 3. The experimental results show that the model has the highest accuracy of 82.33 percent when the $C$ parameter is 0.01 and the degree is 1, because lower degree values create simpler decision boundaries that are less prone to overfitting.

Table 3 The accuracy of SVM system with polynomial kernel at various degrees


SVD-SVM classification model

Combining singular value decomposition (SVD) with support vector machines (SVMs) can be especially advantageous when dealing with high-dimensional data that possess a large number of features. In such scenarios, traditional SVMs may encounter issues of overfitting and poor generalization performance. However, utilizing SVD to reduce the dimensionality of the data and extract the most significant features can mitigate these issues. In Fig. 9, we can see that as more singular values are included in the image matrix, the clarity of the image improves. The original image has approximately 360 non-zero singular values, but we are able to see a close resemblance to the original image using only 200 singular values.

The experimental results for different numbers of image components are presented in Table 4, where N represents the number of singular values used for classification. In this study, we also use two kernels: the RBF kernel and the polynomial kernel. The RBF kernel is assigned $C$ and gamma values of 10 and 0.01, respectively, while the polynomial kernel uses degree and $C$ values of 1 and 0.01, respectively.

Table 4 The accuracy of the SVD-SVM system with each kernel

Based on the results in Table 4, the processing time required by the models varies with the image decomposition used, while the improvement in accuracy is not significant as the number of components (N) is increased further. The model using the RBF kernel with N = 360 achieves the highest accuracy, as shown in Table 4. The SVD-SVM model can adjust the number of components used for processing by selecting how many of the singular values obtained from SVD are retained.

CNN-SVM classification model

By using the CNN to extract image features and training an SVM on those features, we achieve higher accuracy and improved categorization of plasma bubbles. Our findings indicate that models based on the CNN-SVM combination outperform those based solely on the SVM technique.

In this work, we investigate the impact of filter sizes ($3\times 3$, $5\times 5$, and $7\times 7$) and the number of feature extraction layers on the performance of the combined CNN-SVM model. We employ a $2\times 2$ dimension for the pooling layer, and the SVM is trained on the final layer of image features. The RBF and polynomial kernels are utilized in the SVM model. A convolution layer consists of one to four layers, with a filter stride size of $1\times 1$. Finally, a $2\times 2$ max pooling layer is used to refine the extracted image features.
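The effect of the filter size and the pooling step on the feature-map dimensions can be sketched as below. The sketch assumes no padding, a pooling stride of 2 and a hypothetical 64-pixel input side; none of these are stated in the text, so the numbers are illustrative only.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Side length of a feature map after one convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a nested-list feature map."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


# Output side length for each filter size studied, with stride 1
# and an assumed 64-pixel input side.
sizes = {k: conv_output_size(64, k) for k in (3, 5, 7)}

# Pooling keeps the largest value in each non-overlapping 2x2 window.
pooled = max_pool_2x2([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
])
```

Under these assumptions, larger filters shrink the feature map slightly more per layer, and each pooling step halves both dimensions, which reduces the number of features passed to the SVM.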

Tables 5 and 6 present the experimental results of the CNN-SVM model using the RBF and polynomial kernels, with different filter sizes and numbers of convolution layers. Based on the results, when the filter size is set to $3\times 3$ and the number of convolution layers is set to 7, the model using the RBF kernel achieves the highest accuracy of 93.67%. Similarly, the polynomial kernel model with the same filter size and number of convolution layers achieves its highest accuracy of 92.33%.

Table 5 The accuracy of CNN-SVM system (RBF kernel) with various filter sizes

Table 6 The accuracy of CNN-SVM system (polynomial kernel) with various filter sizes

While Tables 5 and 6 present results based on a maximum of 7 convolution layers in the CNN model, we also explore models with more layers. However, we observe only a marginal improvement in accuracy at the expense of significantly increased latency. Specifically, when the number of layers is increased to 9, the accuracy of the model improves by only approximately 0.02 to 0.05 percent. Therefore, we choose to use 7 convolution layers for our models.

Table 7 Accuracy of each model (after testing) with different kernels

Performance comparison of the proposed EPB classification models

Finally, we analyze the performances of all three models, as shown in Table 7. According to the table, the CNN-SVM model provides the highest accuracy of 93.67% when using the RBF kernel. The accuracies of all models with different kernels are compared in Fig. 10. The parameters of each kernel are similar to those used in the previous section.

Figures 11, 12, 13 display the confusion matrix results obtained from each proposed model. The classes are referred to as follows: class 0 (non-plasma bubbles), class 1 (unsure), and class 2 (plasma bubble). The CNN-SVM model yields a higher accuracy at 28.67% compared to the SVM model and the SVD-SVM model, respectively.
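For readers less familiar with confusion matrices, the following minimal sketch builds one for the three classes listed above and derives the overall accuracy from it. The label lists are invented toy data, not results from this study.

```python
def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy labels: 0 = non-plasma bubble, 1 = unsure, 2 = plasma bubble.
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred)
acc = accuracy(cm)
```

Off-diagonal entries show which classes are confused with each other, for example how often an "unsure" image is labelled as a plasma bubble.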

Figures 14, 15 and 16 display the receiver operating characteristic (ROC) curves for each model. Each point on a curve is a pair of true positive rate (TPR) and false positive rate (FPR) values for a specific decision threshold. The area under the ROC curve (AUC-ROC) therefore summarizes how well a model separates a given class from the others across all thresholds. The results indicate that the CNN-SVM model outperforms the SVM model in terms of AUC-ROC for all three classes. Specifically, the CNN-SVM model achieves a 1% increase in AUC-ROC for class 0, a 14% increase for class 1, and a 12% increase for class 2, compared to the SVM model. AUC-ROC is a commonly used metric for evaluating binary classification models, so it is reported here separately for each class.
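How TPR/FPR pairs and the AUC are obtained for one class can be sketched as follows. This is a minimal one-vs-rest illustration with trapezoidal integration; the labels and scores are invented, ties between scores are assumed not to occur, and the sketch is not the evaluation code used in this study.

```python
def roc_points(labels, scores):
    """ROC curve as (FPR, TPR) pairs.

    labels: 1 for the class of interest, 0 for all other classes.
    scores: classifier scores, higher meaning more likely positive.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the threshold one sample at a time, from the highest score down.
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy one-vs-rest example (hypothetical scores for a single class).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
area = auc(curve)
```

For a three-class problem, repeating this with each class in turn as the positive class gives one curve and one AUC value per class.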

Conclusions

In this work, we propose two classification models using machine learning techniques to identify the presence or absence of EPBs in quick look (QL) plot images from the Chumphon VHF radar station. The models are developed with two separate approaches: a combined CNN and SVM classification technique, and a combined SVD and SVM technique. By using SVD and CNN, the models effectively extract important features, resulting in a reduction in input data size for the SVM. This reduction in data size not only shortens the training and testing time of the model, but also maintains a high level of accuracy in detecting the presence of plasma bubbles in the VHF radar's QL plot images. Our CNN models, incorporating 7 convolutional layers, and SVD with 360 singular values demonstrated substantial performance improvements. The SVM-only model is also considered as a baseline method. The experimental results show that the CNN-SVM models outperform the other two approaches. Specifically, the combined CNN-SVM model using the RBF kernel achieves the highest accuracy of 93.08%, while the model using the polynomial kernel achieves an accuracy of 92.14%. On the other hand, the combined SVD-SVM models yield an accuracy of 88.37% while requiring less processing time. Therefore, the proposed models can be used for real-time detection and classification of EPBs, which is crucial for the space weather community. In future work, we will explore the use of other AI techniques for EPB characterization.

Availability of data and materials

The research materials and VHF radar data are mainly supported by the National Institute of Information and Communications Technology (NICT), Japan.

Abbreviations

AI: Artificial intelligence
CPN: Chumphon
EPB: Equatorial plasma bubble
SVD: Singular value decomposition
SVM: Support vector machine
CNN: Convolutional neural network
NICT: National Institute of Information and Communications Technology
KMITL: King Mongkut’s Institute of Technology Ladkrabang
QL: Quick look
RBF: Radial basis function
ROC: Receiver operating characteristic
TPR: True positive rate
FPR: False positive rate

References

Abadi P, Saito S, Srigutomo W (2014) Low-latitude scintillation occurrences around the equatorial anomaly crest over Indonesia. Ann Geophys 32:7–17. https://doi.org/10.5194/angeo-32-7-2014
Article Google Scholar
Atabati M, Alizadeh HS, Tsai LC (2021) Ionospheric scintillation prediction on S4 and ROTI parameters using artificial neural network and genetic algorithm. Remote Sens 13:2092. https://doi.org/10.3390/rs13112092
Article Google Scholar
Chendong L, Craig MH, Sreeja VV, Dongsheng Z, João F, Galera M, Nicholas ASH (2022) Distinguishing ionospheric scintillation from multipath in GNSS signals using geodetic receivers. GPS Solut. https://doi.org/10.1007/s10291-022-01328-x
Article Google Scholar
Cochin V, Forget P, Seille B, Mercier G (2005) Sea surface currents and wind direction by VHF radar: results and validation. Eur Oceans. https://doi.org/10.1109/OCEANSE.2005.1513183
Article Google Scholar
de Moraes AD, Vani BC, Costa E, Abdu MA, de Paula ER, Sousasantos J, Monico JF, Forte B, de Negreti PMS, Shimabukuro MH (2018) GPS availability and positioning issues when the signal paths are aligned with ionospheric plasma bubbles. GPS Solut. https://doi.org/10.1007/s10291-018-0760-8
Article Google Scholar
Deepak KK, Richard WE, Robert ED, Carlos RM, William EM (2023) GOLD mission’s observation about the geomagnetic storm effects on the nighttime equatorial ionization anomaly (EIA) and equatorial plasma bubbles (EPB) during a solar minimum equinox. Space Weather. https://doi.org/10.1029/2022SW003321
Article Google Scholar
Dhafar HA, Ahmed TS, Ayad RA (2020) Classifying political arabic articles using support vector machine with different feature extraction. ACRIT 1174:79–94. https://doi.org/10.1007/978-3-030-38752-5_7
Article Google Scholar
Huba JD, Joyce G, Krall J (2008) Three-dimensional equatorial spread F modeling. Geophys Res Lett 35:10102. https://doi.org/10.1029/2008GL033509
Article Google Scholar
Kelly MC (2009) The earth’s ionosphere: plasma physics and electrodynamics, 2nd edn. Academic Press, San Diego
Google Scholar
LeCun Y, Bengio Y, Bottou L (1998) Gradient-based learning applied to document recognition. Proc IEEE. https://doi.org/10.1029/2006RS003512
Article Google Scholar
LeCun Y, Matan O, Boser B, Denker JS, Henderson D, Howard RE (1990) Handwritten Zip Code Recognition with multilayer networks. In: International conference on pattern recognition, pp 35–40
Nakata H, Nagashima I, Sakata K, Otsuka Y, Akaike Y, Takano T, Shimakura S, Shiokawa K, Ogawa T (2005) Observations of equatorial plasma bubbles using broadcast VHF radio waves. Geophys Res Lett 32(17). https://doi.org/10.1029/2005GL023243
Otsuka Y, Ogawa T, Effendy. (2009) VHF radar observations of nighttime F-region field-aligned irregularities over Kototabang, Indonesia. Earth, Planets Space 61:431–437. https://doi.org/10.1186/BF03353159
Article Google Scholar
Pavan Chaitanya P, Patra AK, Otsuka Y, Yokoyama T, Yamamoto M, Stoneback RA, Heelis RA (2017) Daytime zonal drifts in the ionospheric 150 km and E regions estimated using EAR observations. J Geophys Res. https://doi.org/10.1002/2017JA024589
Article Google Scholar
Razin MRG, Moradi AR, Inyurt S (2021) Spatio-temporal analysis of TEC during solar activity periods using support vector machine. GPS Solut 25:121. https://doi.org/10.1007/s10291-021-01158-3
Article Google Scholar
Tang J, Li Y, Ding M, Liu H, Yang D, Wu X (2022) An ionospheric TEC forecasting model based on a CNN-LSTM-attention mechanism neural network. Remote Sens 14:2433. https://doi.org/10.3390/rs14102433
Article Google Scholar
Tsunoda RT, Livingston RC, McClure JP, Hanson WB (1982) Equatorial plasma bubbles: Vertically elongated wedges from the bottomside F layer. J Geophys Res 87:9171–9180. https://doi.org/10.1029/JA087iA11p09171
Article Google Scholar
Vapnik V, Lerner A (1963) Pattern recognition using generalized portrait method. Autom Remote Control 24:774–780
Google Scholar
Wei L, Jiang C, Hu Y, Aa E, Huang W, Liu J, Yang G, Zhao Z (2021) Ionosonde observations of spread F and spread Es at low and middle latitudes during the recovery phase of the 7–9 September 2017 geomagnetic storm. Remote Sens. https://doi.org/10.3390/rs13051010
Article Google Scholar
Wernik AW, Alfonsi L, Materassi M (2007) Scintillation modeling using in situ data. Radio Sci. https://doi.org/10.1029/2006RS003512
Article Google Scholar
Woodman RF, Farley DT, Balsley B, Milla M (2019) The early history of the jicamarca radio observatory and the incoherent scatter technique. Hist Geo Space Sci 10(2):245–266. https://doi.org/10.5194/hgss-10-245-2019
Article Google Scholar
Zhang Y, Wu L (2012) Classification of fruits using computer vision and a multiclass support vector machine. Sensors 12:12489–12505. https://doi.org/10.3390/s120912489
Article Google Scholar

Download references

Acknowledgements

The ASEAN IVO (http://www.nict.go.jp/en/asean_ivo/index.html) project, Precise positioning and Artificial Intelligence (AI) for Ionospheric Disturbances in Low-Latitude Region in ASEAN, was involved in the production of the contents of this publication and financially supported by NICT (http://www.nict.go.jp/en/index.html). We thank the National Institute of Information and Communications Technology (NICT), Japan for providing the VHF radar data.

Funding

This work was financially supported by King Mongkut’s Institute of Technology Ladkrabang [2566-02-01-037] and the NSRF via the Program Management Unit for the Human Resources and Institutional Development, Research and Innovation (Grant no. B05F640197 and B39G660029).

Author information

Authors and Affiliations

School of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, 10520, Thailand
Thananphat Thanakulketsarat, Pornchai Supnithi & Lin Min Min Myint
National Institute of Information and Communications Technology, Koganei, Tokyo, 184-8795, Japan
Kornyanat Hozumi & Michi Nishioka

Contributions

Data collection and experimental designs were implemented by TH. The first draft of the manuscript was written by TH and LM. PS completed the manuscript modifications. KH and MN reviewed the manuscript. PS provided the research funding support for this work. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Pornchai Supnithi.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Thanakulketsarat, T., Supnithi, P., Myint, L.M.M. et al. Classification of the equatorial plasma bubbles using convolutional neural network and support vector machine techniques. Earth Planets Space 75, 161 (2023). https://doi.org/10.1186/s40623-023-01903-7

Received: 02 June 2023
Accepted: 12 September 2023
Published: 16 October 2023
DOI: https://doi.org/10.1186/s40623-023-01903-7
