### Entire day spectrograms as basis vectors

Following the approach introduced by Bezděková et al. (2021), the principal component basis vectors were plotted and manually checked to reveal their characteristic intensity patterns. The first three principal basis vectors are shown in Fig. 1. Together they cover almost 60 % of the original information. Note that while the results are plotted as a function of universal time (UT), local time (LT) at Kannuslehto is ahead of UT by about 1.5 h, and magnetic local time (MLT) is ahead of UT by about 2.5 h.

The first principal component mainly expresses the VLF wave intensity measured during the night, while the second principal component instead describes the dayside intensity and corresponds to about 10 % of the original information. The first principal component corresponds well to the average intensity profile obtained for this data set (not shown). Thus, as expected, it reveals the main features of the intensity profile. These are mainly given by nightside measurements, where strong lightning-generated whistlers occur. The first principal component corresponds to more than 40 % of the original information. The third principal component seems to express intensity variations on the dawn side rather than on the dusk side. Its physical interpretation needs to be explored more carefully (see below). It covers about 6 % of the original information.

We note that the sudden intensity variations observed in the frequency spectra at about 1.5 kHz are likely related to the first cutoff frequency of the Earth–ionosphere waveguide (Budden 1961). While waves above the cutoff frequency can propagate considerable distances in the waveguide, waves at lower frequencies are essentially detectable only close to the ionospheric exit point.
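The fractions of original information quoted above are interpreted here, as is usual in PCA, as the shares of the total variance carried by each component. The following minimal sketch shows how basis vectors, coefficients, and such fractions can be obtained from a singular value decomposition. It is not the processing behind Fig. 1: the function name `pca_basis`, the flattening of each daily spectrogram into one row, the mean subtraction, and the synthetic night/day data are all assumptions made for illustration.

```python
import numpy as np

def pca_basis(X, n_components=3):
    """Return mean, leading basis vectors, coefficients, and variance fractions.

    X has one row per day; each row is a flattened frequency-time spectrogram
    (an assumed layout, not taken from the paper).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # centre each pixel on its mean over all days
    # Rows of Vt are orthonormal basis vectors ordered by decreasing variance.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # share of total variance per component
    components = Vt[:n_components]
    coeffs = Xc @ components.T  # PC coefficients of every day
    return mean, components, coeffs, explained[:n_components]

# Synthetic stand-in data: a strongly varying "night" pattern, a weaker
# "day" pattern, and small noise. All numbers are hypothetical.
rng = np.random.default_rng(0)
n_days, n_pix = 200, 48
t = np.linspace(0, 24, n_pix, endpoint=False)  # UT in hours
night = ((t < 5) | (t >= 15)).astype(float)
day = 1.0 - night
X = (rng.normal(0, 3.0, (n_days, 1)) * night
     + rng.normal(0, 1.5, (n_days, 1)) * day
     + rng.normal(0, 0.3, (n_days, n_pix)))

mean, components, coeffs, explained = pca_basis(X)
print(np.round(explained, 3))
```

With this synthetic input the leading basis vector is concentrated on the night pixels and the second on the day pixels, which mirrors the qualitative split described for Fig. 1.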

To better interpret the first two principal components, which carry most of the original information, it is useful to draw a scatter plot showing their mutual dependence and to investigate how a change of individual PC coefficients is related to frequency–time spectrogram features. This is done in Fig. 2.

Figure 2a shows a scatter plot of the individual PC1 and PC2 values for each frequency–time spectrogram from the original data set. The points are color-coded according to the individual measuring campaigns, and the same color coding is used in all further plots: blue points correspond to the 2016/2017 campaign, brown to the 2017/2018 campaign, green to the 2018/2019 campaign, and red to the 2019/2020 campaign. The points corresponding to different campaigns are distributed similarly over the whole range of obtained coefficient values. There is thus no “extraordinary” campaign with a preferred interval of PC1 or PC2 values. From this point of view, the individual campaigns are equivalent and can be compared with one another, allowing us to assume that possible differences between them are of physical origin rather than an effect of the data processing.

The evolution of the wave intensity with changing PC1 and PC2 is shown in Fig. 2b–e. The coordinates (in terms of PC1 and PC2) chosen for these panels are marked in Fig. 2a by orange crosses, along with the letter of the corresponding panel. Positive values of the PC1 coefficients correspond to a significant increase of the nighttime wave intensity (about 0–5 UT and 15–24 UT), while positive values of PC2 correspond to an increase of the daytime wave intensity (about 5–15 UT). The most intense spectrogram is hence obtained for large positive values of both PC1 and PC2 (Fig. 2c). This supports the idea, already suggested by the principal component profiles shown in Fig. 1, that the first principal component corresponds to the nighttime VLF measurements, while the second principal component instead describes the daytime VLF measurements. Note again that the physical information related to the third principal component is less straightforward and is discussed in more detail below.
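Panels such as Fig. 2b–e can be thought of as spectrograms rebuilt from a chosen pair of coefficients: the mean spectrogram plus each basis vector weighted by its coefficient. The sketch below illustrates this under stated assumptions. The helper `reconstruct`, the idealized night and day basis vectors, the zero mean, and the coefficient values of ±50 are hypothetical and are not taken from the paper.

```python
import numpy as np

def reconstruct(mean, components, coeffs, shape):
    """Mean spectrogram plus a weighted sum of principal basis vectors."""
    flat = mean + np.asarray(coeffs, dtype=float) @ components
    return flat.reshape(shape)

# Idealized basis (assumption): PC1 is nonzero only at night (0-5 and 15-24 UT),
# PC2 only during the day (5-15 UT), both independent of frequency.
n_freq, n_time = 4, 24
ut = np.arange(n_time)
night = (ut < 5) | (ut >= 15)
v1 = np.tile(night.astype(float), (n_freq, 1)).ravel()
v2 = np.tile((~night).astype(float), (n_freq, 1)).ravel()
components = np.vstack([v1 / np.linalg.norm(v1), v2 / np.linalg.norm(v2)])
mean = np.zeros(n_freq * n_time)  # hypothetical mean spectrogram

# Both coefficients positive: intensity raised at all times of day.
both = reconstruct(mean, components, [50.0, 50.0], (n_freq, n_time))
# PC1 positive, PC2 negative: raised at night, lowered during the day.
night_only = reconstruct(mean, components, [50.0, -50.0], (n_freq, n_time))
```

In this toy setting the summed intensity is largest when both coefficients are positive, in line with the behavior described for Fig. 2c.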

Having obtained an idea of the possible physical interpretation of at least the first two principal components, we further investigate how the individual PC coefficients vary with season. Since the seasonal dependence evidently has a significant effect on ground-based measurements, it has to be reflected in the principal components. Figure 3a–c shows the mean values of the PC1, PC2, and PC3 coefficients as a function of the month of the campaign. In addition, monthly average values of the Kp index are shown in Fig. 3d, giving an idea of the variations of the overall geomagnetic activity during these months. The dependences are shown for each campaign separately and are distinguished by different colors, following the color coding introduced with Fig. 2a.

Figure 3a, b shows that the PC1 and PC2 coefficients evolve in completely different ways. While the PC1 coefficients reach their highest values during autumn and spring months, the largest values of PC2 are reached in November or December, i.e., in months corresponding to or very close to the winter solstice. However, the maximal values of both PC1 and PC2 in individual months are typically reached for either the 2018/2019 or the 2019/2020 campaign. The trends obtained for both coefficients bear no resemblance to the Kp index variations shown in Fig. 3d.

The seasonal dependence of the PC3 coefficients shown in Fig. 3c differs considerably from that of the previous PC coefficients. There is no pronounced maximum or minimum as in the previous cases, and the maximal average PC3 coefficients in individual months are mainly reached for the 2016/2017 campaign. From this point of view, the seasonal variations of the PC3 coefficients agree better with the Kp index dependence than those of the other two PC coefficients. Although the correspondence between the average PC3 coefficients and the Kp indices is only approximate, the results shown in Fig. 3 indicate that if any principal component (out of the first three) can be related to the overall geomagnetic activity (in terms of the Kp index), it is PC3.

The time scales on which the individual PC coefficients evolve are analyzed in Fig. 4. It shows the autocorrelation functions of the first three principal component coefficients for time lags from 1 to 70 days, separately for the individual Kannuslehto campaigns. Only the days of the year for which data from all four campaigns are available are used in this analysis.
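An autocorrelation function of this kind can be sketched as follows. The estimator actually used for Fig. 4 is not specified here, so the choice below (the Pearson correlation between the series and a copy of itself shifted by the lag) is an assumption, as are the function name `autocorrelation`, the gap-free daily sampling, and the synthetic series with a 120-day period.

```python
import numpy as np

def autocorrelation(x, max_lag=70):
    """Pearson correlation between a daily series and itself shifted by 1..max_lag days.

    Assumes one sample per day with no gaps; real campaign data with
    missing days would need explicit alignment on the day of year.
    """
    x = np.asarray(x, dtype=float)
    lags = np.arange(1, max_lag + 1)
    acf = np.array([np.corrcoef(x[:-lag], x[lag:])[0, 1] for lag in lags])
    return lags, acf

# Synthetic stand-in for a daily PC coefficient (hypothetical values):
# a slow oscillation with a 120-day period plus weak noise.
days = np.arange(200)
rng = np.random.default_rng(1)
pc = np.cos(2 * np.pi * days / 120) + 0.1 * rng.normal(size=days.size)

lags, acf = autocorrelation(pc)
print(acf[0], acf[59])  # short lag versus a lag of 60 days
```

A series dominated by a slow seasonal oscillation yields positive autocorrelations at short lags and negative ones at lags near half the oscillation period, which is the kind of sign change discussed for PC1.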

The autocorrelation functions obtained for PC1, PC2, and PC3 differ significantly. For the PC1 coefficients (Fig. 4a), the autocorrelations become negative after around 30 days; for the 2016/2017 campaign, this happens already after about 20 days. After becoming negative, the correlation coefficients keep more or less the same sign for the rest of the investigated lag interval. The most significant change occurs for the 2019/2020 campaign, where the difference between the positive (for short time lags) and negative (for long time lags) correlation coefficients is the largest. The autocorrelations obtained for the PC2 coefficients, shown in Fig. 4b, gradually decrease with increasing time lag but remain positive over basically the entire lag interval, except for the 2017/2018 campaign, whose values become negative after about 55 days. The autocorrelations of the 2016/2017 campaign are also negative at a time lag of about 60 days, but since they subsequently become positive again, this seems to be a random effect.

A completely different picture of the autocorrelation functions is obtained for PC3, as shown in Fig. 4c. The autocorrelations obtained for the 2018/2019 campaign decrease only slowly towards zero and remain positive over the whole analyzed interval of time lags. The autocorrelation values obtained for the other campaigns are lower and tend to fluctuate around zero. Similar behavior of the autocorrelation function is obtained for the Kp index (not shown).

Although a proper physical interpretation of PC3 has not yet been established, the previous results indicate that it could carry, at least partially, information about wave intensity variations related to geomagnetic activity. To test this hypothesis, it is necessary to examine other relevant parameters which also provide information about, or are affected by, geomagnetic activity. This is investigated in Fig. 5, which shows the dependence of PC3 on the Kp index (Fig. 5a), the AE index (Fig. 5b), and the standard deviation of the magnetic field magnitude measured by the Sodankylä magnetometer (Fig. 5c).

All three dependences exhibit basically the same behavior: the PC3 coefficients gradually increase with the given parameter. Given that all three parameters are connected with geomagnetic activity, it is indeed reasonable to conclude that the PC3 coefficients increase along with geomagnetic activity.
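One common way to examine such a dependence is to average the coefficient in bins of the activity parameter. How Fig. 5 was actually constructed is not stated here, so the sketch below is only an illustration: the helper `binned_mean`, the bin width of 1, and the sample Kp and PC3 values are hypothetical.

```python
from collections import defaultdict

def binned_mean(x, y, width=1.0):
    """Mean of y in bins of x, keyed by the lower edge of each bin."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for xi, yi in zip(x, y):
        key = (xi // width) * width  # lower edge of the bin containing xi
        sums[key] += yi
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}

# Hypothetical sample: Kp values and PC3 coefficients of six spectrograms.
kp = [0.3, 0.7, 1.0, 1.3, 2.7, 2.0]
pc3 = [-10.0, -6.0, 0.0, 4.0, 20.0, 12.0]

result = binned_mean(kp, pc3)
print(result)  # {0.0: -8.0, 1.0: 2.0, 2.0: 16.0}
```

The same helper could be applied unchanged to another activity parameter, with `width` adjusted to the range of that parameter.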

Among the global effects which could influence the VLF wave intensity at Kannuslehto, we have already mentioned geomagnetic activity, predominantly described by the Kp index. In this regard, it is important to note that the four analyzed campaigns took place during different phases of the solar cycle. The evolution of both the Kp index and the sunspot number during the years of the investigated Kannuslehto campaigns is shown in Fig. 6. The campaign intervals used in the previous plots are drawn in the corresponding colors introduced above. To better visualize the evolution of the parameters during the individual campaigns, the mean values of the parameters over the campaign intervals are drawn as horizontal lines.

While the Kp indices during the first two campaigns (2016/2017, 2017/2018) were quite similar, their values increased for the latter two campaigns (2018/2019, 2019/2020). As with the first two campaigns, the geomagnetic activity during these latter two campaigns was mutually comparable in terms of the mean values. Figure 6b shows that the highest solar activity occurred during the 2016/2017 campaign; it then dropped significantly and eventually reached its minimum during the 2019/2020 campaign (solar minimum was observed in December 2019).

### Frequency spectra as basis vectors

As discussed above, to better characterize the wave intensity evolution on shorter time scales, we use PCA with individual frequency spectra, at a time resolution of 1 min, as basis vectors. The first three principal components obtained are shown in Fig. 7. In this case, the physical interpretation of the principal components is more complicated, so for now we only describe their profiles. Together, these three principal components carry almost 95 % of the original information. Most of it is contained in the first principal component (Fig. 7a), which carries about 81 %. This component is almost constant at higher frequencies, but it is significantly lower in the frequency range up to 2 kHz, where it decreases and drops close to zero at about 1.5 kHz. This is because the wave power around 1.5 kHz is usually substantially higher than anywhere else, yet it remains more or less the same across Kannuslehto spectrograms. Figure 7b shows the second principal component. At higher frequencies (above about 6 kHz) its sign becomes negative. The second principal component carries about 9 % of the original information and contributes significantly to the wave power in the frequency range between about 2 and 5 kHz, where it is significantly increased. The third principal component, shown in Fig. 7c, reaches negative values in the frequency range between about 2 and 8 kHz and also increases substantially at frequencies around 2 kHz. Out of the three components shown, it reaches the largest values of the wave power, and it carries about 4 % of the original information.

To better understand the physical meaning of the obtained principal components, it is again useful to draw a scatter plot and check how the frequency spectra vary with respect to the given PC coefficients. A scatter plot of PC1 and PC2 is depicted in Fig. 8 along with four reconstructed spectra corresponding to selected combinations of PC1 and PC2 coefficient values.

Due to the high number of original frequency spectra (1,247,580), the scatter plot in Fig. 8a uses a slightly different format to make it more comprehensible. It shows the number of original vectors whose PC coefficients fall in given PC1–PC2 bins. The width of each bin is set to 10 in both dimensions. The joint PC1, PC2 distribution is centered around zero, but it is not symmetric. Moreover, it seems that most of the frequency spectra are associated with negative or small positive PC2. This means that the increase of wave power seen in the second principal component (Fig. 7b) in the frequency range between about 2 and 5 kHz is not usual, and the wave power in this range is typically decreased instead. A visual inspection of an arbitrary Kannuslehto frequency–time spectrogram confirms this interpretation. However, it remains unclear what this principal component in fact describes and whether it can indeed be considered a general feature of the original data set. The distribution of PC1 is roughly symmetric around zero, suggesting that the contribution of PC1 to the wave intensity can be both positive and negative. Considering that PC1 is almost constant, this is not a surprising result.
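The binned representation described for Fig. 8a amounts to a two-dimensional histogram of the coefficient pairs. A minimal sketch is given below. Only the bin width of 10 comes from the text; the function name `bin_counts`, the alignment of bin edges to multiples of the width, and the sample coefficient values are assumptions.

```python
import math
from collections import Counter

def bin_counts(pc1, pc2, width=10.0):
    """Count spectra per (PC1, PC2) bin.

    Bin index k covers the half-open interval [k*width, (k+1)*width);
    aligning the edges to multiples of the width is an assumption.
    """
    counts = Counter()
    for a, b in zip(pc1, pc2):
        counts[(math.floor(a / width), math.floor(b / width))] += 1
    return counts

# Hypothetical PC1 and PC2 coefficients of seven spectra.
pc1 = [-12.0, -3.0, -4.0, 4.0, 9.9, 10.0, 25.0]
pc2 = [-7.0, -1.0, -8.0, -2.0, 3.0, 3.0, -15.0]

counts = bin_counts(pc1, pc2)
print(counts[(-1, -1)])  # spectra with PC1 and PC2 both in [-10, 0)
```

Storing only the occupied bins keeps memory use independent of the coefficient range, which is convenient when the number of spectra exceeds a million.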

The effect of the PC1 and PC2 coefficient values on the frequency spectra is shown in Fig. 8b–e. The values of the PC1 and PC2 coefficients are chosen to correspond to extreme values. These are marked in the scatter plot (Fig. 8a) by green crosses, along with the letter of the corresponding panel in Fig. 8. It is worth mentioning that an arbitrary combination of PC1 and PC2 leads to a maximum of the wave power in the frequency range up to about 1 kHz; only the specific patterns of these maxima vary. The increase in the frequency range between about 2 and 5 kHz observed in the spectrum of the second principal component is pronounced in the wave intensity only if both PC1 and PC2 are positive. In other cases, PC1 makes this increase basically negligible. Furthermore, a positive PC1 coefficient makes the decrease of the wave power at about 1.5 kHz more obvious (Fig. 8c, e), and the wave power at higher frequencies (above about 4 kHz) tends to be anticorrelated with the PC2 coefficients.

Figure 9 aims to identify possible controlling factors for the first three principal components. It shows the dependences of PC1, PC2, and PC3 on the month of the campaign (Fig. 9a–c) and on UT (Fig. 9d–f). The dependences on month are drawn for each Kannuslehto campaign separately, as they vary noticeably between campaigns, whereas the dependences on UT are averaged over the campaigns, because they turned out to be almost identical for all of them.

As Fig. 9a–c shows, PC1 and PC3 exhibit similar seasonal variations, while PC2 exhibits quite the opposite trend. PC1 and PC3 are increased during autumn and spring months, while their values are minimal during winter months. The minimal values of PC1 for individual campaigns are reached in either December or February, and the PC3 values are minimal in either December or January. In contrast, the PC2 coefficients peak in either December or January, and for most of the campaigns they are minimal in September.

The PC coefficient dependences on UT differ considerably between the individual components. Recall that the local time at Kannuslehto is shifted with respect to UT by about 1.5 h, i.e., local noon corresponds to about 10:30 UT. The PC1 coefficient dependence exhibits two global extrema: a minimum between about 9 and 11 UT and a maximum between 20 and 21 UT. Considering the time shift, the global minimum of the PC1 coefficients corresponds well to noon at Kannuslehto. Moreover, the extrema are quite symmetric, as their absolute values are almost identical. The dependences obtained for PC2 and PC3 are different. The PC2 coefficient values are typically positive or slightly negative during night and morning hours; they become negative after 11 UT, reaching a minimum at about 15 UT, and turn positive again after 17 UT. Maximum values are reached between 9 and 10 UT and between 18 and 20 UT. Positive values of PC3 (Fig. 9f) are reached between 4 and 15 UT, peaking at about 8 and 11 UT, while minimal values occur between 17 and 18 UT.

Exploiting the fine time resolution of the original data set, it is possible to investigate how the PC coefficients are affected by substorm occurrence. The substorm list used in the present study was provided by the SuperMAG network (Gjerloev 2012; Newell and Gjerloev 2011a, b). The results of this analysis are shown in Fig. 10, which displays the average time dependence of the first three PC coefficients for the case when no substorms occurred between 6 h before and 6 h after the time of the measurement (black curves) and for the case when at least 16 substorms were detected in the same time interval (red curves). The trend obtained for the PC1 coefficients (Fig. 10a) is very similar in both cases. Apart from a large increase of PC1 for the large number of substorms between 2 and 6 UT, the PC1 coefficients during large substorm numbers are somewhat lower than in the no-substorm case. Note that the profile obtained in Fig. 10a is very similar to the overall UT dependence of PC1 shown in Fig. 9d. The number of substorms thus does not significantly affect PC1.

This conclusion essentially also holds for the PC2 coefficients depicted in Fig. 10b. Again, the PC2 coefficients are somewhat lower at times of significant substorm activity than in the case of no substorms. An exception is again the interval between about 5 and 9 UT, where the PC2 values for large substorm numbers increase more than for no substorms. As for PC1, the variations of PC2 for no substorms in particular correspond to the overall PC2 dependence on UT shown in Fig. 9e, as the no-substorm subset covers more than 36 % of the original data set.

The situation for the PC3 coefficients is significantly different (Fig. 10c). The PC3 coefficients obtained at times of a substantial number of substorms are mostly higher than those at times of no substorms. Moreover, while the PC3 coefficients tend to be negative in the case of no substorms, the maximal average PC3 coefficients for a large number of substorms reach almost 80. These maximal average values are reached between 6 and 7 UT and from 11 to 12 UT.
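The selection of spectra by substorm occurrence used for Fig. 10 can be sketched as a count of substorm onsets in a window of 6 h on either side of each measurement. The window length and the threshold of 16 substorms follow the text. The function names, the inclusive window boundaries, the time unit (hours since an arbitrary epoch), and the onset list are assumptions, and no particular SuperMAG file format is implied.

```python
from bisect import bisect_left, bisect_right

def substorm_count(onsets, t, half_window=6.0):
    """Number of onsets within [t - half_window, t + half_window].

    `onsets` must be sorted; treating both boundaries as inclusive
    is an assumption.
    """
    return bisect_right(onsets, t + half_window) - bisect_left(onsets, t - half_window)

def classify(onsets, t, many=16):
    """Label a measurement time as 'none', 'many', or 'other'."""
    n = substorm_count(onsets, t)
    if n == 0:
        return "none"
    if n >= many:
        return "many"
    return "other"

# Hypothetical onset times in hours: 20 onsets from 30.0 to 39.5.
onsets = sorted(30.0 + 0.5 * k for k in range(20))

print(classify(onsets, 10.0))  # quiet interval, no onsets within 6 h
print(classify(onsets, 35.0))  # all 20 onsets fall inside the window
```

Because the onset list is sorted once and each query uses binary search, the classification scales well to a data set with more than a million one-minute spectra.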