Classification of the equatorial plasma bubbles using convolutional neural network and support vector machine techniques

Abstract

Equatorial plasma bubbles (EPBs) are depletions in ionospheric plasma density that form during post-sunset hours. These ionospheric irregularities can disrupt trans-ionospheric radio systems, navigation systems and satellite communications. Real-time detection and classification of EPBs are therefore crucial for the space weather community. The Prachomklao radar station, a very high frequency (VHF) radar station, was installed at Chumphon station (Geographic: 10.72° N, 99.73° E and Geomagnetic: 1.33° N) in 2020 and has produced radar images ever since. In this work, we propose two real-time plasma bubble detection systems based on support vector machine techniques. The two designs use a convolutional neural network (CNN) and singular value decomposition (SVD), respectively, for feature extraction, and the extracted features are then passed to a support vector machine (SVM) for EPB classification. The proposed models are trained using quick look (QL) plot images from the VHF radar system at the Chumphon station, Thailand, in 2017. The experimental results show that the combined CNN-SVM model using the RBF kernel achieves the highest accuracy of 93.08%, while the model using the polynomial kernel achieves an accuracy of 92.14%. In contrast, the combined SVD-SVM models yield accuracies of 88.37% and 85.00% for the RBF and polynomial kernels, respectively.

Introduction

Equatorial plasma bubbles (EPBs) are regions of depleted plasma in the ionosphere. They typically originate at the bottom-side of the F-layer after sunset, particularly near the magnetic equator, forming bubble-shaped structures, Abadi et al. (2014). The initiation of post-sunset EPBs is known to be caused by the Rayleigh–Taylor instability, Kelly (2009). Global-scale geomagnetic storms can further increase the severity and occurrence of EPBs, Deepak et al. (2023). The scale size of EPBs ranges from a few kilometers to several hundred kilometers. EPBs typically extend to higher latitudes along magnetic flux tubes (Huba 2008), reaching higher altitudes along the magnetic field lines. Plasma bubbles can cause signal fading and scintillation, which degrade communication, positioning and navigation, Alison et al. (2018).

Since the 1960s, researchers have used radio-wave remote sensing to investigate ionospheric irregularities such as EPBs. Ionosondes, all-sky airglow imagers, Global Navigation Satellite System (GNSS) receivers, in situ satellites, very high frequency (VHF) radar stations, the equatorial atmosphere radar (EAR), and incoherent scatter radar (ISR) are among the instruments that have been widely utilized. For example, the ionosonde system measures the electron density profile of the ionosphere by sending a radio signal upward and receiving the echo signals in return, Wei et al. (2021). GNSS receivers are also utilized to detect ionospheric irregularities through the analysis of pseudorange information, Chendong et al. (2022). The parameters of the ionospheric plasma, including the temperature, electron density and drift velocity, can be measured directly by in situ satellites, Wernik et al. (2007).

Very high frequency (VHF) radar is another powerful tool for observing plasma bubbles in the ionosphere. Compared to other observation methods, VHF radar offers several advantages. It provides high spatial resolution (on the order of meters) and high temporal resolution (on the order of seconds), allowing for detailed observations of plasma bubble evolution. VHF radar can also cover a large portion of the ionosphere, making it possible to study the large-scale structure and dynamics of plasma bubbles. Additionally, VHF radar can monitor plasma bubbles in real time, making it useful for space weather forecasting and for studying the structure of plasma bubbles and their effects on communication and navigation systems. VHF radar has been applied in various fields, including sea surface current detection and wind direction estimation, Cochin et al. (2005). In the ionosphere, VHF radar has been used to detect boundary irregularities in the F-region at night in Kototabang, Indonesia, Otsuka et al. (2009), and to observe plasma bubbles at the bottom-side of the F-layer, Tsunoda et al. (1982).

VHF radars operate in the VHF band between 30 and 300 MHz. They observe ionospheric irregularities such as EPBs by transmitting a radar signal upward and receiving the signal after it has been scattered by the irregularities, Nakata et al. (2005). The equatorial atmosphere radar (EAR) is a specialized type of VHF radar that was developed for researching the equatorial ionosphere and has been used extensively to study EPBs, Pavan et al. (2017). The incoherent scatter radar at the Jicamarca Radio Observatory, among others, is another powerful tool used to study the ionosphere, including EPBs, by transmitting a high-power radio signal and observing the scattered signal, Woodman et al. (2019). VHF radars can generate images daily; however, real-time monitoring of these often noisy or distorted images requires careful data cleaning and an efficient classification system.

Recently, artificial intelligence (AI), particularly machine learning algorithms, has been applied in space weather forecasting and prediction, for example, Atabati et al. (2021); Razin et al. (2021); Tang (2022). In Tang (2022), the authors proposed deep learning techniques for forecasting ionospheric total electron content (TEC). The proposed model combines a convolutional neural network (CNN), which produces feature maps and rotates the data to emphasize its salient features, with a long short-term memory (LSTM) neural network and an attention mechanism. The model uses data from 24 GNSS stations in China and is driven by six parameters, including the TEC time series, the Bz, Kp, Dst and F10.7 indices, and the hour of day (HD). In Atabati et al. (2021), the authors used an artificial neural network (ANN) integrated with the genetic algorithm (GA) to predict ionospheric scintillation for the GUAM station. Feature extraction from the data can further improve the accuracy and processing time of real-time prediction systems. Singular value decomposition (SVD) is a powerful tool for decomposing images into their constituent parts and analyzing the structure of the image. Its ability to reduce the dimensionality of the image and extract important features makes it useful for a wide range of image processing tasks, including compression, denoising, and feature extraction. Recently, Razin et al. (2021) proposed a new method for modeling the spatio-temporal variations in the ionosphere’s TEC during periods of intense solar activity. The approach utilizes a support vector machine (SVM) as the modeling tool to predict TEC values across the ionosphere. By leveraging the SVM’s ability to handle complex and high-dimensional datasets, the method can effectively capture the relationships between TEC values and the various geophysical factors that influence them.

In this work, we propose equatorial plasma bubble classification models using machine learning techniques on the quick look (QL) plot images from the Chumphon VHF radar station. The proposed models classify the presence or absence of plasma bubbles on the images using two different approaches: combined convolutional neural networks (CNNs) and support vector machines (SVM), as well as combined singular value decomposition (SVD) methods and SVM. In the models, SVD and CNNs are used to extract the features from the images before sending them to the SVM for classification.

Data and methodology

In this work, we consider using a support vector machine (SVM) for classification, and then we use singular value decomposition (SVD) and a convolutional neural network (CNN) for feature extraction and size reduction.

Support vector machine

Support vector machine (SVM) is a linear model for classification and regression, introduced by Vapnik et al. (1963). It handles both linearly separable and non-linearly separable data through the kernel trick. The algorithm creates a decision boundary, called a “hyperplane”, that separates the positive and negative classes by maximizing the margin and reducing classification errors. The margin is the distance between the hyperplane and the closest data points from each class, known as “support vectors”. Figure 1 illustrates the optimal separating hyperplane for linearly separable data.

Fig. 1
figure 1

Illustration of optimal decision boundary in support vector machine

From Fig. 1, for two classes of data points, the SVM algorithm seeks the weight vector, \(\mathbf{w}\), and bias, \(b\), that define the optimal hyperplane separating the classes with the maximum margin \(m\). Given a set of training samples, \(\mathbf{x}_{i} \in R^{n}\), and their corresponding classes, \(y_{i}=\pm 1\), the signed distance between a sample \(\mathbf{x}_{i}\) and the hyperplane is \((\mathbf{w}^{\mathrm{T}}\mathbf{x}_{i}+b)/\Vert \mathbf{w}\Vert\). In SVM, the margin width \(m\) must be maximized according to

$$\mathrm{max} \frac{2}{\Vert \mathbf{w}\Vert }.$$
(1)

Maximizing the margin is equivalent to minimizing \(\frac{1}{2}\Vert \mathbf{w}\Vert^{2}\). Allowing for misclassified samples, the soft-margin objective function of the SVM can be written as:

$$\min_{\mathbf{w},b,\xi }\ \frac{1}{2}\left\| \mathbf{w} \right\|^{2} + C\sum_{i = 1}^{N} \xi_{i} ,$$
(2)

subject to \(y_{i}(\mathbf{w}^{\mathrm{T}}\mathbf{x}_{i}+b)-1+\xi_{i}\ge 0\) and \(\xi_{i}\ge 0\), where \(\xi_{i}\) are the slack variables, \(C\) is the penalty parameter, and the Lagrange multipliers \(\alpha_{i}\) appear when these constraints are incorporated into the Lagrangian.
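As a hedged illustration of the soft-margin objective, the sketch below evaluates \(\frac{1}{2}\Vert \mathbf{w}\Vert^{2} + C\sum_i \xi_i\) for a fixed hyperplane, taking each slack as \(\xi_i = \max(0, 1 - y_i(\mathbf{w}^{\mathrm{T}}\mathbf{x}_i + b))\), the smallest value that satisfies the constraints. The function name `soft_margin_objective` and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slack) for the soft-margin SVM at a fixed (w, b).

    slack_i = max(0, 1 - y_i * (w . x_i + b)) is the smallest xi_i that
    satisfies y_i * (w . x_i + b) - 1 + xi_i >= 0 and xi_i >= 0.
    """
    margins = y * (X @ w + b)
    slack = np.maximum(0.0, 1.0 - margins)
    objective = 0.5 * float(w @ w) + C * float(slack.sum())
    return objective, slack

# Hypothetical toy data: the last sample lies on the wrong side of the hyperplane.
X = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = np.array([1.0, 1.0])
b = 0.0

value, slack = soft_margin_objective(w, b, X, y, C=10.0)
margin_width = 2.0 / np.linalg.norm(w)  # Eq. (1): 2 / ||w||
print(value, slack, margin_width)
```

Only the misclassified sample contributes a non-zero slack, so a larger \(C\) makes that single violation dominate the objective.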

For non-linearly separable data, nonlinear kernel functions are used to map the data into higher-dimensional feature spaces where the data can be linearly separated, Dhafar et al. (2020). A kernel function \(K({\mathbf{x}}_{i}, {\mathbf{y}}_{i})\) is defined as the dot product of the images of its two arguments under a nonlinear function \(\phi\),

$$K\left({\mathbf{x}}_{i}, {\mathbf{y}}_{i}\right)=\phi {\left({\mathbf{x}}_{i}\right)}^{\mathrm{T}}\phi \left({\mathbf{y}}_{i}\right),$$
(3)

where \(\phi\) is a mapping of \(X\) to a feature space F.

The SVM is a versatile model that can tackle various machine learning problems by utilizing multiple kernel functions, each with its unique characteristics. Among these kernels, the radial basis function (RBF) kernel is notable for its reliance on the gamma parameter, which determines the shape and complexity of the decision boundary. We use two kernel functions in SVM: the polynomial kernel and radial basis function kernel, Zhang et al. (2012). These kernels offer user-definable parameters, such as gamma (\(\gamma\)), degree (\(d\)), and penalty parameter (\(C\)), which can be adjusted to achieve the best performance for a particular problem. The gamma parameter plays a crucial role in determining the influence of each training example on the decision boundary, while the degree parameter controls the degree of the polynomial kernel. The penalty parameter \(C\) balances the tradeoff between margin maximization and classification error minimization. To achieve optimal SVM performance, it is crucial to select an appropriate kernel function and adjust its associated parameters. However, the selection process can be problem-dependent and may require careful experimentation and tuning, often using cross-validation techniques.

There are four types of kernel functions:

  1. Linear kernel: \(K(\mathbf{x}_{i}, \mathbf{y}_{i}) = \mathbf{x}_{i}^{\mathrm{T}}\mathbf{y}_{i}\),

  2. Polynomial kernel: \(K(\mathbf{x}_{i}, \mathbf{y}_{i}) = (\mathbf{x}_{i}\cdot \mathbf{y}_{i}+c)^{d}\),

  3. Radial basis function kernel: \(K(\mathbf{x}_{i}, \mathbf{y}_{i}) = \exp\left(-\gamma \Vert \mathbf{x}_{i}-\mathbf{y}_{i}\Vert^{2}\right)\), where \(\gamma = 1/(2\sigma^{2})\) for a Gaussian of width \(\sigma\),

  4. Sigmoid kernel: \(K(\mathbf{x}_{i}, \mathbf{y}_{i}) = \tanh\left(\gamma (\mathbf{x}_{i}\cdot \mathbf{y}_{i}) + c\right)\),

where \(c\) is a constant offset, distinct from the penalty parameter \(C\).
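The four kernels can be sketched in a few lines of NumPy. This is a minimal illustration using commonly used parameterizations, with the RBF kernel written as \(\exp(-\gamma \Vert \mathbf{x}-\mathbf{y}\Vert^{2})\) and the sigmoid kernel as \(\tanh(\gamma\, \mathbf{x}\cdot \mathbf{y} + c)\). The function names, the `coef0` argument (the constant offset) and the sample vectors are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def linear_kernel(x, y):
    return float(x @ y)

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    # (x . y + coef0) ** degree; coef0 is the constant offset
    return float((x @ y + coef0) ** degree)

def rbf_kernel(x, y, gamma=0.01):
    # exp(-gamma * ||x - y||^2): depends on the distance between x and y
    diff = x - y
    return float(np.exp(-gamma * (diff @ diff)))

def sigmoid_kernel(x, y, gamma=0.01, coef0=0.0):
    return float(np.tanh(gamma * (x @ y) + coef0))

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
print(linear_kernel(x, y))               # dot product
print(polynomial_kernel(x, y, degree=2)) # (11 + 1)^2
print(rbf_kernel(x, y, gamma=0.5))       # exp(-0.5 * 8)
print(sigmoid_kernel(x, y))
```

A larger `gamma` makes the RBF similarity fall off faster with distance, which is why it controls how far the influence of a single training example reaches.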

In this work, we consider the polynomial and RBF kernel functions. The RBF kernel corresponds to a mapping into an infinite-dimensional feature space, so it can capture more complex nonlinear relationships between the features than the polynomial kernel. Another difference is the sensitivity to the hyperparameters. The polynomial kernel is sensitive to the degree of the polynomial, which determines the complexity of the mapping. In contrast, the RBF kernel is sensitive to the gamma parameter, which determines the width of the Gaussian function used in the mapping.

Convolutional neural network

In LeCun et al. (1990), the authors introduced the concept of convolutional computational methods into neural networks to compute image features. The layer that computes features using convolution is called the convolutional layer. LeCun et al. (1998) presented a convolutional neural network (CNN) called LeNet-5, which introduced the concept of a fully connected layer. This layer acts as a layer of a multi-layer neural network and allows the CNN to perform both feature extraction and classification, which is considered a key advantage of this network architecture.

The convolutional neural network architecture, as shown in Fig. 2, has the following important layers:

Fig. 2
figure 2

Architecture of a convolutional neural network

  1. Convolutional layer

The convolution layer is the basic component of a convolutional neural network. It is composed of multiple feature maps, each of which is composed of many neurons. Each neuron is connected through the convolution kernel to a local region of the feature maps in the preceding layer. The convolution layer extracts different features of the input by the convolution operation, and increasing the number of convolution layers allows more advanced features to be extracted. Denoting the input image by \(I\) and the two-dimensional convolution kernel by \(K\), the convolution of the input image is

$$C\left(i,j\right)=\left(I*K\right)\left(i,j\right)=\sum_{m}\sum_{n}I\left(m,n\right)K\left(i-m,j-n\right).$$
(4)
  2. Pooling layer

The pooling layer is an important component of a convolutional neural network (CNN) that typically follows a convolutional layer. Its primary purpose is to downsample the feature map obtained from the convolutional layer, reducing its spatial dimensions while preserving the important features. Pooling also makes the extracted features more robust to small translations, and to a lesser extent to small rotations and scale changes, thereby improving the CNN’s predictive capabilities. Common pooling methods include average pooling, min pooling, and max pooling, which divide the feature map into local regions and compute a summary statistic (the mean, minimum, or maximum) of each region.

  3. Fully connected layer

A fully connected layer serves as the connection between the feature map and the final output. The feature map is flattened so that every neuron in the last layer acts as an input to the next layer. The flattened input is then multiplied by weights, which are typically initialized randomly (for example, between 0 and 1) and adjusted during training, with a bias sometimes added. The resulting value of each neuron is passed through a chosen activation function. The hidden layers can be designed as needed, and the output layer is obtained from the last hidden layer. The weighted sum of each neuron in the output layer, \(S\), is passed through the activation function to predict the output probability, i.e.,

$$S=\sum_{i=1}^{n}{p}_{i}{w}_{i},$$
(5)

where \({p}_{i}\) is the input data and \({w}_{i}\) is the weight.

CNNs are highly efficient for feature extraction in image classification applications because they combine local pattern detection, hierarchical feature learning, parameter sharing, and nonlinear modeling. Moreover, they can be trained effectively, which makes them well suited to extracting features from images.
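The convolution of Eq. (4), followed by a ReLU activation and max pooling, can be sketched directly in NumPy. This is a minimal single-channel illustration that assumes no padding ("valid" output) and a stride of 1; the function names and the toy 6 × 6 input are hypothetical, and a practical CNN would use a deep learning library with learned kernels.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution with stride 1; the kernel is flipped as in Eq. (4)."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    h = x.shape[0] // size * size
    w = x.shape[1] // size * size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)         # toy 6 x 6 "image"
kernel = np.array([[0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.0]])                     # identity kernel
feature_map = relu(conv2d_valid(image, kernel))          # shape (4, 4)
features = max_pool(feature_map, size=2)                 # shape (2, 2)
print(features)
```

With the identity kernel the convolution simply crops the border, so the effect of the 2 × 2 pooling step (keeping the maximum of each block) is easy to verify by hand.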

Singular value decomposition

Singular value decomposition (SVD) is a matrix factorization technique that has found wide use in many fields, from engineering and physics to data analysis and machine learning. In image and signal processing, the SVD is often used to reduce the dimensionality of data by identifying the underlying structure and extracting its most important features. This is particularly useful for compressing large amounts of data without significant loss of information. In image compression, the SVD decomposes an image into its singular values and singular vectors. The singular values represent the importance of each vector in the decomposition and can be used to discard less significant information while retaining the essential features of the image. This can significantly reduce the amount of data required to represent the image without significant loss of image quality. In this approach, a rectangular matrix with dimensions m by n is decomposed into the product of three matrices. The classification model uses the singular values from the SVD process as inputs for both training and testing of the model. The decomposition takes the form

$$\mathbf{A}=\mathbf{U}{\varvec{\Sigma}}{\mathbf{V}}^{\mathrm{T}},$$
(6)

where \(\mathbf{A}\) is the \(m\times n\) data matrix to be decomposed, \(\mathbf{U}\) is an orthogonal matrix of size \(m\times m\), \({\varvec{\Sigma}}\) is a diagonal matrix of size \(m\times n\) containing the singular values, and \({\mathbf{V}}^{\mathrm{T}}\) is the transpose of an orthogonal matrix of size \(n\times n\).

By using the SVD, we can break down the large image matrix into a smaller set of matrices and remove the smaller singular values to reduce memory usage. This approach can greatly benefit the efficiency of computations and the storage requirements of these applications.
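The effect of discarding the smaller singular values can be checked numerically. The sketch below uses `numpy.linalg.svd` to build a rank-\(k\) approximation of a matrix; the helper name `truncated_svd` and the random test matrix are assumptions for illustration only.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k approximation of A and the full list of singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # keep only the k largest singular values
    return A_k, s

rng = np.random.default_rng(0)
A = rng.random((8, 6))                      # stand-in for an image matrix

A_full, s = truncated_svd(A, k=6)           # all singular values: exact reconstruction
A_2, _ = truncated_svd(A, k=2)              # two singular values: approximation
error = np.linalg.norm(A - A_2)             # Frobenius norm of the residual
print(s)
print(error, np.sqrt(np.sum(s[2:] ** 2)))
```

The reconstruction error equals the square root of the sum of the squared discarded singular values, so it stays small whenever the discarded values are small.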

VHF radar data and images

To investigate plasma bubbles near the magnetic equator, the National Institute of Information and Communications Technology (NICT) and King Mongkut's Institute of Technology Ladkrabang (KMITL) collaborated to install the Prachomklao VHF radar station at the KMITL Chumphon campus (Geographic: 10.72° N, 99.73° E and Geomagnetic: 1.33° N) in Thailand on January 17, 2020. This radar station was set up to monitor plasma bubbles that freshly form in the ionosphere. As depicted in Fig. 3, the VHF radar station includes 18 three-element Yagi antennas arranged from east to west with a distance of approximately 5 m between antennas. The radar system transmits at a VHF frequency of 39.65 MHz. The essential parameters of the VHF radar system are described in Table 1.

Fig. 3
figure 3

VHF radar system for monitoring plasma bubbles at Chumphon Station, Thailand

Table 1 Specification of the VHF radar of Chumphon station, Thailand

Two types of radar images are available: quick look (QL) images and range–altitude–time intensity (RATI).

When the radar signal intensity (or power) is plotted against time and altitude, a quick look (QL) plot image is generated, which can be used to study plasma bubbles. The distance between the radar antenna and the reflecting structure is commonly referred to as the range, but technically, it indicates the altitude. The time axis represents the time elapsed between the transmission of the radar signal and its reception by the receiving device. The quick look plot image provides a visual representation of the data collected by the radar in real time. Figure 4a–c shows examples of three types of radar images: non-plasma bubble, unsure and plasma bubble.

Fig. 4
figure 4

Samples of quick look (QL) plot images collected by the VHF radar, with the x-axis representing Doppler frequency (Hz) and the y-axis representing range (kilometers): (a) without plasma bubble, (b) with unsure structure [cannot be identified as EPB or due to system errors], (c) with plasma bubbles

The unsure images in Fig. 4b are those that contain objects which cannot be identified as EPB or due to system errors. To prepare images for the SVM model training, the first step is to crop the image to remove all areas where the data repeat over time. The resulting image is an RGB image with dimensions of 360 × 360 × 3 pixels. In this study, a total of 1000 images are used, with 700 images utilized as training datasets and 300 images as testing datasets. The data classes used in the model testing are non-plasma bubbles, unsure, and plasma bubbles. As for the image data used for each class, they are divided into 350 images for the class without plasma bubbles, 350 images for the unsure class, and 300 images for the class with plasma bubbles.

Proposed methods

In the proposed methods, we consider the SVM model for EPB classification using VHF radar image data. To reduce the input size and improve the classification performance, two feature extraction techniques, the SVD and the CNN, are applied before the SVM. This section describes in detail the design of each model used in this research.

Support vector machine model

In this research, the SVM model with the RBF and polynomial kernels is designed as illustrated in Fig. 5. For each kernel, two main parameters, \(C\) and gamma, need to be defined to achieve high performance. The parameter \(C\) is a hyperparameter that governs the tradeoff between maximizing the margin (that is, the distance between the decision boundary and the data points closest to it) and reducing the classification error. The gamma parameter determines the influence of each training example on the decision boundary and the margin. For SVMs that employ a radial basis function (RBF) kernel, gamma regulates the width of the Gaussian kernel, which is used to compute the similarity between training instances. To compare the performance of the kernels, we evaluate each kernel using \(C = \{0.01, 1, 10, 100\}\) and \(\mathrm{gamma} = \{0.01, 1, 10, 100\}\). The gamma value and parameter \(C\) for each kernel are optimized using a grid search. After the values of \(C\) and gamma are obtained, we extend the SVM system by adding a feature extraction stage: the CNN or the SVD.

Fig. 5
figure 5

The architecture of the SVM model
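A grid search over \(C\) and gamma simply evaluates every combination and keeps the best-scoring one. The sketch below shows that loop in plain Python; `grid_search` and `toy_score` are hypothetical names, and `toy_score` is only a stand-in for "train an SVM with these parameters and return its validation accuracy", which in practice would come from an SVM library.

```python
from itertools import product

def grid_search(score_fn, grid):
    """Evaluate score_fn on every parameter combination and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    n_evaluated = 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        n_evaluated += 1
        if score > best_score:          # ties keep the first combination found
            best_params, best_score = params, score
    return best_params, best_score, n_evaluated

# Parameter grid taken from the text
grid = {"C": [0.01, 1, 10, 100], "gamma": [0.01, 1, 10, 100]}

def toy_score(C, gamma):
    # Hypothetical placeholder for training an SVM and measuring accuracy
    return -abs(C - 10) - abs(gamma - 0.01)

best_params, best_score, n_evaluated = grid_search(toy_score, grid)
print(best_params, best_score, n_evaluated)
```

With four values per parameter the search trains 16 models per kernel, so the cost grows multiplicatively with each added hyperparameter.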

Proposed combined singular value decomposition and support vector machine (SVD-SVM)

The SVD can be used with the SVM to improve classification performance by reducing the dimensionality of the input data while retaining the most important features. The architecture of the proposed SVD-SVM model is presented in Fig. 6. In the experiment, we evaluate four settings for the number of retained singular values, chosen from a discrete set. In both the training and testing processes, we use the SVD to extract features from the images before sending them to the SVM, which classifies the presence or absence of EPBs in the images. For feature extraction, the RGB color image is first converted into a grayscale image by averaging the RGB channels. The grayscale image is a matrix in which each element represents a pixel intensity value, and its size equals the height and width of the input image. The SVD then decomposes this matrix into its singular values and singular vectors. Only the largest singular values are retained, since they represent the most important features of the image, and they are passed as input to the classification stage without significant loss of information. After feature extraction is completed, the acquired features are used by the SVM for classification in both the training and testing processes. In the model, we consider the number of singular values \(N = \{5, 100, 200, 360\}\) for the SVD, and we evaluate each kernel using \(C = \{0.01, 1, 10, 100\}\) for the SVM.

Fig. 6
figure 6

The architecture of the proposed SVD-SVM model
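The feature extraction step described above (grayscale conversion by channel averaging, followed by keeping the \(N\) largest singular values) can be sketched as follows. The function name `svd_features` and the random 360 × 360 × 3 array standing in for a cropped QL image are assumptions for illustration; the resulting vectors would then be passed to an SVM.

```python
import numpy as np

def svd_features(rgb_image, n_values):
    """Average the RGB channels, then return the n_values largest singular values."""
    gray = rgb_image.astype(float).mean(axis=2)       # (H, W, 3) -> (H, W)
    s = np.linalg.svd(gray, compute_uv=False)         # sorted in descending order
    return s[:n_values]

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(360, 360, 3))      # hypothetical stand-in image

# One feature vector per setting of N considered in the text
features = {n: svd_features(image, n) for n in (5, 100, 200, 360)}
for n, vec in features.items():
    print(n, vec.shape)
```

Each image is thus reduced from 360 × 360 × 3 pixel values to at most 360 numbers, which is what shrinks the SVM input.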

Proposed combined convolutional neural network and support vector machine (CNN-SVM)

The combination of SVMs and CNNs is a powerful technique that leverages the strengths of both models. The CNN is used to extract the distinctive features of the data, which requires choosing the number of convolution layers, the size of the filter, and the activation function. In this research, we experiment with three filter sizes, \(3\times 3\), \(5\times 5\), and \(7\times 7\), and use the ReLU activation function. Max pooling is used in the pooling layer to further reduce the dimensionality of the features extracted by the CNN while preserving the essential image features. The window stride for the filter is set to \(1\times 1\) across all convolution layers. The output of the last convolution layer is then passed to the SVM, which is trained to classify the images. Two kernels are used in this research: the radial basis function (RBF) kernel and the polynomial kernel. The parameters \(C\) and gamma are set for each kernel, together with the degree of the polynomial kernel, and a grid search is performed to identify the best values of \(C\) and gamma for each kernel, which can lead to significant improvements in the accuracy of the model. Using CNNs for feature extraction helps to reduce the amount of training data required while also improving the overall performance of the SVM model, because the CNN identifies the most important features of the images, which the SVM then uses to classify the images more accurately. In this study, we use a filter size of \(3\times 3\). The CNN and SVM network architectures are presented in Fig. 7.

Fig. 7
figure 7

The architecture of the proposed CNN-SVM model
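To see how the filter size and the number of layers affect the size of the features passed to the SVM, the spatial dimensions can be traced layer by layer. The sketch below is a rough calculation under two assumptions that the text does not state explicitly: convolutions use no padding, and every convolution layer is followed by a \(2\times 2\) max pooling layer. The function names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1):
    """Output width/height of an unpadded convolution."""
    return (size - kernel) // stride + 1

def pooled_size(size, window=2):
    """Output width/height of non-overlapping pooling."""
    return size // window

def feature_map_size(input_size, kernel, n_layers, pool=2):
    """Spatial size after n_layers of (convolution -> pooling) blocks."""
    size = input_size
    for _ in range(n_layers):
        size = pooled_size(conv_output_size(size, kernel), pool)
    return size

# 360 x 360 input images; filter sizes and layer counts discussed in the text
for kernel in (3, 5, 7):
    sizes = [feature_map_size(360, kernel, n) for n in range(1, 5)]
    print(f"{kernel}x{kernel} filter:", sizes)
```

Under these assumptions, a \(3\times 3\) filter reduces a 360-pixel side to 179, 88, 43 and 20 pixels after one to four blocks, so deeper networks hand the SVM a much smaller feature map per channel.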

In this paper, the prediction accuracy of the proposed models is computed using the following equation:

$$\mathrm{Accuracy}= \frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}.$$
(7)

The receiver operating characteristic (ROC) curve illustrates the tradeoff between correct and incorrect positive predictions and is an additional measure of the efficacy of the predictive model. The vertical axis of the curve is the true positive rate (TPR), or recall, and the horizontal axis is the false positive rate (FPR), which are defined as follows:

$$\mathrm{TPR}= \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}},$$
(8)
$$\mathrm{FPR}= \frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TN}}.$$
(9)

Here, the terms are defined as follows:

  1. True negative (TN) represents the situation in which the prediction is negative, and the actual outcome is also negative.

  2. False negative (FN) represents the situation in which the prediction is negative, and the actual outcome is positive.

  3. False positive (FP) represents the situation in which the prediction is positive, and the actual outcome is negative.

  4. True positive (TP) represents the situation in which the prediction is positive, and the actual outcome is also positive.

The values of TN, TP, FN, and FP are calculated based on the actual outcome and the prediction as illustrated in the confusion matrix in Fig. 8.

Fig. 8
figure 8

An example of a confusion matrix table
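Equations (7) to (9) translate directly into code. The sketch below computes the three quantities from the four confusion-matrix counts; the function name and the example counts are hypothetical and are not results from this study.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy (Eq. 7), true positive rate (Eq. 8) and false positive rate (Eq. 9)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return accuracy, tpr, fpr

# Hypothetical counts for 300 test images
accuracy, tpr, fpr = binary_metrics(tp=90, tn=180, fp=20, fn=10)
print(f"accuracy={accuracy:.3f}, TPR={tpr:.3f}, FPR={fpr:.3f}")
```

For a three-class problem, the same counts can be formed per class by treating that class as positive and the other two as negative.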

Results and discussion

Based on data from the Prachomklao VHF radar station at the KMITL Chumphon campus, Thailand, we select the observation images from October 1, 2020, to October 15, 2020. Out of 1000 images, 700 images (70%) are used as the training dataset and the remaining 300 images (30%) are used for testing the model. Three data classes are labeled: Class 1 (non-plasma bubbles), Class 2 (unsure), and Class 3 (plasma bubbles). Traditionally, the SVM is used for binary classification. To extend its use to three classes, we utilize the one-versus-all (OvA), or one-versus-rest (OvR), technique, which enables SVMs to manage multi-class classification tasks. This approach divides the multi-class problem into multiple binary classification problems, in each of which one class is distinguished from all of the other classes combined. Training a separate classifier for each class is efficient and can enhance accuracy in multi-class scenarios. Below are the findings from each experimental method. Three types of classification models are studied: the SVM only, the SVD-SVM and the CNN-SVM models.
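The one-versus-rest scheme can be illustrated in two steps: relabel the data once per class to train each binary classifier, then assign each sample to the class whose classifier returns the largest decision value. The sketch below shows both steps with NumPy; the function names and the score matrix are hypothetical, with the scores standing in for the outputs of three trained binary SVMs.

```python
import numpy as np

def ovr_targets(y, positive_class):
    """Binary targets for the 'positive_class versus rest' classifier: +1 or -1."""
    return np.where(y == positive_class, 1, -1)

def ovr_predict(scores):
    """scores[i, k] is the decision value of classifier k for sample i.

    The predicted class is the one with the largest decision value;
    classes are labelled 1, 2, 3 as in the text.
    """
    return np.argmax(scores, axis=1) + 1

y = np.array([1, 2, 3, 3])                   # hypothetical class labels
targets_class3 = ovr_targets(y, positive_class=3)

scores = np.array([[ 1.2, -0.3, -0.9],       # hypothetical decision values
                   [-0.5,  0.1, -0.2],
                   [-1.0, -0.4,  0.8]])
predicted = ovr_predict(scores)
print(targets_class3, predicted)
```

Three binary classifiers are enough for the three classes here, one each for non-plasma bubble, unsure and plasma bubble.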

SVM classification model

In this model, the raw VHF images are fed directly to the SVM model, which serves as a baseline method. We use two kernels, the RBF kernel and the polynomial kernel, for plasma bubble classification. Table 2 shows the classification accuracies of the SVM using the RBF kernel with various sets of \(C\) and gamma values. The maximum accuracy of 86.67 percent is achieved when the parameter \(C\) is set to 10 or 100 and gamma equals 0.01. Although the accuracies of the models using \(C\) = 10 and \(C\) = 100 are the same, the processing time of the model with \(C\) = 10 is shorter than that of the model with \(C\) = 100.

Table 2 The accuracy of SVM system with RBF kernel with various gamma values

The degree parameter of the polynomial kernel determines the degree of the polynomial function used to map the input features to a higher-dimensional feature space. In this study, we set the degree to \(d = \{0.01, 1, 10, 100\}\) and utilize the same set of \(C\) parameters for the polynomial kernel, as presented in Table 3. The experimental results show that the model achieves the highest accuracy of 82.33 percent when the \(C\) parameter is 0.01 and the degree is 1, because lower degree values create simpler decision boundaries that are less prone to overfitting.

Table 3 The accuracy of SVM system with polynomial kernel at various degrees

SVD-SVM classification model

Singular value decomposition (SVD) with support vector machines (SVMs) can be especially advantageous when dealing with high-dimensional data that possess a large number of features. In such scenarios, traditional SVMs may encounter issues of overfitting and poor generalization performance. Utilizing the SVD to reduce the dimensionality of the data and extract the most significant features can mitigate these issues. In Fig. 9, we can see that as more singular values are included in the image matrix, the clarity of the image improves. The original image has approximately 360 non-zero singular values, but a close resemblance to the original image is obtained using only 200 singular values.

Fig. 9 The image quality with different numbers of components
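The effect illustrated in Fig. 9 can be reproduced with a rank-N reconstruction. The sketch below assumes NumPy and uses a random 360 x 480 array as a stand-in for a QL plot image; the array size and the helper names are hypothetical and chosen only so that the matrix has 360 singular values.

```python
import numpy as np


def rank_n_approximation(image, n):
    """Rebuild a 2-D image from its n largest singular values."""
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    # Scale the first n left singular vectors by their singular values,
    # then project back with the first n right singular vectors.
    return (U[:, :n] * s[:n]) @ Vt[:n, :]


def relative_error(image, approx):
    """Frobenius-norm error of the approximation relative to the original."""
    return np.linalg.norm(image - approx) / np.linalg.norm(image)


rng = np.random.default_rng(0)
image = rng.random((360, 480))  # stand-in image, not radar data

errors = {n: relative_error(image, rank_n_approximation(image, n)) for n in (50, 200, 360)}
for n, err in errors.items():
    print(f"N = {n:3d}  relative error = {err:.3e}")
```

The error shrinks as N grows and vanishes (up to floating-point rounding) once all singular values are used. Real images usually have rapidly decaying singular values, so they tend to be approximated well with far fewer components than this random example.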

The experimental results for different numbers of image components are presented in Table 4, where N represents the number of singular values used for classification. In this study, we again use two kernels: the RBF kernel and the polynomial kernel. For the RBF kernel, the \(C\) and gamma parameters are set to 10 and 0.1, respectively, while for the polynomial kernel the degree and \(C\) parameter are set to 1 and 0.01, respectively.

Table 4 The accuracy of the SVD-SVM system with each kernel

Based on the results in Table 4, the processing time of the models varies with the number of components (N) used in the image decomposition, while the accuracy does not improve significantly as N is increased further. The model using the RBF kernel with N = 360 achieves the highest accuracy. Because the number of components is chosen from the singular values obtained from SVD, the SVD-SVM model can adjust how many components are used for processing.
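One possible reading of the SVD-SVM pipeline is sketched below. The section does not spell out how the N singular values are turned into SVM inputs, so this example makes a loud assumption: each image is represented by its N largest singular values. The standardization step, the synthetic images and the helper name `svd_features` are likewise hypothetical; only the RBF settings (\(C\) = 10, gamma = 0.1) come from the text.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def svd_features(images, n):
    """Top-n singular values of each image, stacked into an (n_images, n) array."""
    # np.linalg.svd returns singular values sorted in descending order.
    return np.stack([np.linalg.svd(img, compute_uv=False)[:n] for img in images])


rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 20)
# Synthetic stand-in images whose contrast grows with the class label.
images = [rng.random((40, 60)) * (1 + lab) for lab in labels]

N = 20  # number of singular values kept per image
features = svd_features(images, N)

# Assumption: features are standardized before the RBF-kernel SVM.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma=0.1))
model.fit(features, labels)
predictions = model.predict(features)
print("feature matrix shape:", features.shape)
```

Changing `N` changes only the width of the feature matrix, which is how a model of this kind can trade processing time against accuracy.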

CNN-SVM classification model

By using a CNN to extract image features and training an SVM on those features, we achieve higher accuracy and improved categorization of plasma bubbles. Our findings indicate that models based on the CNN-SVM combination outperform those based solely on the SVM technique.

In this work, we investigate the impact of filter sizes (\(3\times 3\), \(5\times 5\), and \(7\times 7\)) and the number of feature extraction layers on the performance of the combined CNN-SVM model. We employ a \(2\times 2\) dimension for the pooling layer, and the SVM is trained on the final layer of image features. The RBF and polynomial kernels are utilized in the SVM model. A convolution layer consists of one to four layers, with a filter stride size of \(1\times 1\). Finally, a \(2\times 2\) max pooling layer is used to downsample the extracted image features.
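The feature-extraction stage described above can be sketched in plain NumPy. This is a simplified, single-channel illustration and not the authors' network: the filters are random rather than trained, and the absence of padding and the ReLU activation are assumptions. All function names are hypothetical.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Stride-1 cross-correlation without padding ('valid' mode)."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    # windows has shape (H - kh + 1, W - kw + 1, kh, kw).
    return np.einsum("ijkl,kl->ij", windows, kernel)


def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling; an odd trailing row or column is dropped."""
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    blocks = feature_map[: h2 * 2, : w2 * 2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))


def extract_features(image, kernels):
    """Stacked convolution + ReLU layers, one final 2x2 max pool, then flatten."""
    x = image
    for kernel in kernels:
        x = np.maximum(conv2d_valid(x, kernel), 0.0)  # assumed ReLU activation
    return max_pool_2x2(x).ravel()


rng = np.random.default_rng(0)
image = rng.random((32, 32))                          # stand-in for a QL plot image
kernels = [rng.normal(size=(3, 3)) for _ in range(2)]  # two untrained 3x3 filters
features = extract_features(image, kernels)
print("feature vector length:", features.size)       # 32 -> 30 -> 28 -> pooled 14 x 14
```

The flattened vector returned by `extract_features` is what would be passed to the SVM in place of the raw image. Each unpadded \(3\times 3\) layer trims two rows and two columns, so deeper stacks and larger filters both shrink the feature vector.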

Tables 5 and 6 present the experimental results of the CNN-SVM model using the RBF and polynomial kernels, respectively, with different filter sizes and numbers of convolution layers. With a filter size of \(3\times 3\) and 7 convolution layers, the model using the RBF kernel achieves the highest accuracy of 93.67%. Similarly, with the same filter size and number of convolution layers, the model using the polynomial kernel achieves its highest accuracy of 92.33%.

Table 5 The accuracy of CNN-SVM system (RBF kernel) with various filter sizes
Table 6 The accuracy of CNN-SVM system (polynomial kernel) with various filter sizes

While Tables 5 and 6 present results based on a maximum of 7 convolution layers in the CNN model, we also explore models with more layers. However, we observe only a marginal improvement in accuracy at the expense of significantly increased latency. Specifically, when the number of layers is increased to 9, the accuracy of the model improves by only approximately 0.02 to 0.05 percent. Therefore, we choose to use 7 convolution layers for our models.

Table 7 Accuracy of each model (after testing) with different kernels

Performance comparison of the proposed EPB classification models

Finally, we analyze the performance of all three models, as shown in Table 7. According to the table, the CNN-SVM model provides the highest accuracy of 93.67% when using the RBF kernel. The accuracies of all models with different kernels are compared in Fig. 10. The parameters of each kernel are similar to those used in the previous section.

Fig. 10 Performance comparison of the proposed models using different kernels

Figures 11, 12 and 13 display the confusion matrices obtained from each proposed model. The classes are referred to as follows: class 0 (non-plasma bubble), class 1 (unsure), and class 2 (plasma bubble). The CNN-SVM model yields a higher accuracy at 28.67% compared to the SVM model and the SVD-SVM model, respectively.

Fig. 11 Confusion matrix of the SVM model (RBF kernel)

Fig. 12 Confusion matrix of the SVD-SVM model (RBF kernel)

Fig. 13 Confusion matrix of the CNN-SVM model
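As a reminder of how a three-class confusion matrix is read, the sketch below computes overall accuracy and per-class recall and precision. The counts are invented for illustration and are not taken from Figs. 11, 12 or 13; the convention that rows are true classes and columns are predicted classes is also an assumption.

```python
import numpy as np


def accuracy(cm):
    """Fraction of all samples on the diagonal (correctly classified)."""
    return np.trace(cm) / cm.sum()


def per_class_recall(cm):
    """Correct predictions divided by the number of true samples in each class (row sums)."""
    return np.diag(cm) / cm.sum(axis=1)


def per_class_precision(cm):
    """Correct predictions divided by the number of predictions for each class (column sums)."""
    return np.diag(cm) / cm.sum(axis=0)


# Hypothetical counts: rows = true class, columns = predicted class.
# Class 0 = non-plasma bubble, class 1 = unsure, class 2 = plasma bubble.
cm = np.array([
    [90,  8,  2],
    [10, 70, 20],
    [ 3, 12, 85],
])

print("accuracy :", round(float(accuracy(cm)), 4))
print("recall   :", np.round(per_class_recall(cm), 4))
print("precision:", np.round(per_class_precision(cm), 4))
```

Per-class recall makes it easy to see which class drives the errors; in this made-up example it is the intermediate "unsure" class.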

Figures 14, 15 and 16 display the receiver operating characteristic (ROC) curves for each model. Each point on a curve is a pair of true positive rate (TPR) and false positive rate (FPR) values at a specific threshold. The area under the ROC curve (AUC-ROC) therefore summarizes how well a model separates each class across all thresholds. The results indicate that the CNN-SVM model outperforms the SVM model in terms of AUC-ROC for all three classes. Specifically, compared to the SVM model, the CNN-SVM model achieves a 1% increase in AUC-ROC for class 0, a 14% increase for class 1, and a 12% increase for class 2. AUC-ROC is a commonly used metric for evaluating binary classification models; here it is reported separately for each of the three classes.

Fig. 14 ROC curve for the SVM model (RBF kernel)

Fig. 15 ROC curve for the SVD-SVM model (RBF kernel)

Fig. 16 ROC curve for the CNN-SVM model (RBF kernel)
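A per-class ROC curve of the kind shown above can be computed by treating one class as positive and the other two as negative. The sketch below is a small NumPy illustration under that one-vs-rest assumption; the scores and labels are invented, the helper names are hypothetical, and the code assumes there are no tied scores.

```python
import numpy as np


def roc_points(scores, is_positive):
    """FPR and TPR after each threshold step, sweeping from the highest score down."""
    order = np.argsort(-scores, kind="stable")
    hits = is_positive[order].astype(float)
    tpr = np.concatenate([[0.0], np.cumsum(hits) / hits.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1.0 - hits) / (1.0 - hits).sum()])
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Invented example: classifier scores for class 2 (plasma bubble) and the true labels.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.2])
true_labels = np.array([2, 2, 0, 2, 1, 0])

fpr, tpr = roc_points(scores, true_labels == 2)  # class 2 versus the rest
area = auc(fpr, tpr)
print("AUC for class 2:", round(area, 4))
```

The area equals the fraction of positive and negative pairs that the scores rank correctly, which here is 8 of 9 pairs. Repeating the call with `true_labels == 0` and `true_labels == 1`, using the scores for those classes, would give the other two curves.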

Conclusions

In this work, we propose two classification models using machine learning techniques to identify the presence or absence of EPBs in quick look (QL) plot images from the Chumphon VHF radar station. The models are developed with two separate approaches: a combined CNN and SVM classification technique, and a combined SVD and SVM technique. By using SVD and CNN, the models effectively extract important features, resulting in a reduction in input data size for the SVM. This reduction in data size not only shortens the training and testing time of the model, but also maintains a high level of accuracy in detecting the presence of plasma bubbles in the VHF radar’s QL plot images. Our CNN models, incorporating 7 convolutional layers, and SVD with 360 singular values, demonstrate substantial performance improvements. The SVM-alone model is also considered as a baseline method. The experimental results show that the CNN-SVM models outperform the other two approaches. Specifically, the combined CNN-SVM model using the RBF kernel achieves the highest accuracy of 93.08%, while the model using the polynomial kernel achieves an accuracy of 92.14%. On the other hand, the combined SVD-SVM models yield an accuracy of 88.37% while requiring less processing time. Therefore, the proposed models can be used for real-time detection and classification of EPBs, which is crucial for the space weather community. In future work, we will explore the use of other AI techniques for EPB characterization.

Availability of data and materials

The research materials and VHF radar data are mainly supported by the National Institute of Information and Communications Technology (NICT), Japan.

Abbreviations

AI: Artificial intelligence

CPN: Chumphon

EPB: Equatorial plasma bubble

SVD: Singular value decomposition

SVM: Support vector machine

CNN: Convolutional neural network

NICT: National Institute of Information and Communications Technology

KMITL: King Mongkut’s Institute of Technology Ladkrabang

QL: Quick look

RBF: Radial basis function

ROC: Receiver operating characteristic

TPR: True positive rate

FPR: False positive rate

Acknowledgements

The ASEAN IVO (http://www.nict.go.jp/en/asean_ivo/index.html) project, Precise positioning and Artificial Intelligence (AI) for Ionospheric Disturbances in Low-Latitude Region in ASEAN, was involved in the production of the contents of this publication and financially supported by NICT (http://www.nict.go.jp/en/index.html). We thank the National Institute of Information and Communications Technology (NICT), Japan for providing the VHF radar data.

Funding

This work was financially supported by King Mongkut’s Institute of Technology Ladkrabang [2566-02-01-037] and the NSRF via the Program Management Unit for the Human Resources and Institutional Development, Research and Innovation (Grant no. B05F640197 and B39G660029).

Author information

Contributions

Data collection and experimental designs were implemented by TH. The first draft of the manuscript was written by TH and LM. PS completed the manuscript modifications. KH and MN reviewed the manuscript. PS provided the research funding support for this work. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Pornchai Supnithi.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Thanakulketsarat, T., Supnithi, P., Myint, L.M.M. et al. Classification of the equatorial plasma bubbles using convolutional neural network and support vector machine techniques. Earth Planets Space 75, 161 (2023). https://doi.org/10.1186/s40623-023-01903-7

  • DOI: https://doi.org/10.1186/s40623-023-01903-7

Keywords