On data reduction methods for volcanic tremor characterization: the 2012 eruption of Copahue volcano, Southern Andes

Improving the ability to detect and characterize long-duration volcanic tremor is crucial to understand the long-term dynamics and unrest of volcanic systems. We have applied data reduction methods (permutation entropy and polarization degree, among others) to characterize the seismic wave field near Copahue volcano (Southern Andes) between June 2012 and January 2013, when phreatomagmatic episodes occurred. During the selected period, a total of 52 long-duration events with energy above the background occurred. Among them, 32 were classified as volcanic tremors and the remaining as noise bursts. Characterizing each event by averaging its reduced parameters, allowed us to study the range of variability of the different events types. We found that, compared to noise burst, tremors have lower permutation entropies and higher dominant polarization degrees. This characterization is a suitable tool for detecting long-duration volcanic tremors in the ambient seismic wave field, even if the SNR is low.


Introduction
Active volcanic environments are prone to generate long-period sustained seismic signals, known as volcanic tremors. Due to their link with hydrothermal-magmatic systems (e.g., Ripepe and Gordeev 1999;Fujita 2008;Girona et al. 2019;Jolly et al. 2020), their detection and characterization is a key practice for volcano monitoring (Ogiso et al. 2015) and their analysis can, for example, provide insights into the dynamics of phreatic activity (Yukutake et al. 2017).
The most common procedure for detecting volcanic tremors consists of a visual inspection of the seismogram and spectrogram of the component showing the highest signal-to-noise ratio which, usually, is the vertical. Then, volcanic tremor is generally classified according to the spectral signature between 0.5 and 10 Hz (Konstantinou and Schlindwein 2003) as monochromatic if it shows a narrow peak, harmonic if, besides, it shows overtones, and broadband-type, if it shows multiple uncorrelated peaks. The duration of the mechanism that generates the tremor source, which can last from minutes to days or years (McNutt and Nishimura 2008), and the lack of clear onset and end, has led volcanic tremor to be interpreted as a seismic signal generated by a continuous process that can be identified in the seismograms only when it exceeds the background seismic energy level (Carniel 2010).
Another approach to detect volcanic tremor is through data reduction methods, which characterize the ambient wave field by reducing large numbers of samples without losing information about its dynamics (Carniel 2014). In these terms, any event is well-characterized if all the parameters used to describe it contain the complete information about the process involved in its generation, Open Access *Correspondence: ifmelchor@unrn.edu.ar i.e., source and propagation effects. For example, Tárraga et al. (2014) showed that time evolution of spectral-based reduced parameters is capable of identifying transitions between volcanic processes. However, when only one seismic station is available, such as in Deception Island (Jiménez- Morales et al. 2017) or Nyiragongo (Barrière et al. 2018), volcanic tremors may be confused with a non-volcanic ambient source, especially when they do not correlate with the ongoing volcanic activity. Therefore, a more complete characterization of the wave field of the tremors may overcome this issue, especially in limited monitoring systems. A suitable choice of few, information-rich parameters could also be the starting point for the definition of efficient feature vectors for Self-Organized Maps (Carniel et al. 2013) or other Machine Learning algorithms (Ren et al. 2020).
In this work, we use reduced parameters to characterize long-duration periods of sustained seismic energy above the background level near Copahue volcano from June 2012 to January 2013. Section 2 describes the Copahue and the main volcanic activity that occurred during the selected period. In Section 3, we define and justify the parameters used to characterize the ambient seismic wave field. The temporal evolution of reduced parameters are introduced in Section 4, whereas the criteria used to select and classify long-duration events are described in Section 5. Finally, in Section 6, we compare the reduced parameters of the different events to investigate if they are suitable to separate volcanic tremor from noise.

Copahue volcano during 2012
Copahue is an active basaltic-andesitic stratovolcano in the Southern Andes Volcanic Zone at the border between Argentina and Chile (Cembrano and Lara 2009), which is being monitored by the Southern Andean Volcano Observatory (OVDAS) of Chile since 2012 Morales et al. 2015). The volcanic cone is emplaced on the western margin of the Caldera del Agrio, both constituting the Caviahue-Copahue Volcanic Complex (Melnick et al. 2006) (Fig. 1). The active crater hosts a hyper-acidic lake of 300 m diameter Tassi et al. 2017). Towards the northeast, several fumaroles, hot springs and soil carbon dioxide flux define the geothermal area of Copahue (Chiodini et al. 2015). The fluids from the geothermal system are heated meteoric waters, whereas Copahue-summit waters receive fluids from a magmatic-hydrothermal hyperacid reservoir, in contact with magmatic gases (Varekamp et al. 2009;Agusto et al. 2013). Moreover, Varekamp et al. (2009) estimated a minimum depth of about 1500 m for a singlephase hydrothermal system under Copahue vent, which coincides with the location of the shallower volcanic conduit proposed by Lundgren et al. (2017).
The structural system of Copahue is controlled by the Callaqui-Copahue-Mandolgüe N60 • E-striking lineament (Melnick et al. 2006). In 2010, the M w 8.8 Maule earthquake (Fig. 1), in Chile, caused a normal static stress reduction that favored the volcanic reactivation of Copahue (Bonali 2013). Then, after a quiescence period of 12 years, a new eruptive cycle began on 17 July 2012 when several phreatic explosions occurred in the crater lake. Two days later, a phreatomagmatic eruption emitted a small plume along with pyroclastic sulfur-rich material (Caselli et al. 2016;Agusto et al. 2017). From August to November, the same authors linked a crater lake decrease of ∼ 20 m to intense bubbling and fumarolic activity. A month later, on December 22, phreatic explosions evolved to a Strombolian eruption, ejecting a 2000-m-high plume and volcanic bombs (Petrinovic et al. 2014). During the eruption, a small volume of lava reached the top of the crater evaporating the lake (Caselli et al. 2016). Despite the incandescent emissions observed in the crater during the eruption, there were no lava flows. This event was classified with a VEI of 2 (Petrinovic et al. 2014). In January, the surface activity continued with a 100-m-high white plume without phreatomagmatic eruptions (Petrinovic et al. 2014).

Data and methods
The seismic data were recorded by a three-component, short-period seismometer composed of three 1-Hz Teledyne S13 sensors and a 23-bit A/D converter sampling at 40 Hz. The station, which operated since 2010, is located 9.5 km ESE from the Copahue vent ( Fig. 1) as a part of the seismic network of National Seismic Prevention Institute (INPRES) of Argentina.
We selected the period from June 2012 to January 2013, corresponding to the beginning of the new eruptive cycle of Copahue and analyze this seismic dataset using data reduction methods. The process consisted of splitting three-component seismograms into non-overlapping segments of 20 min and computing several parameters as described below.

Averaged PSD
Spectral parameters were derived from the average power density spectrum (PSD) of the vertical-component seismograms. This average is a PSD smoothed by applying a 1-min moving average window with no overlap (for details, see Additional file 1). Besides, each PSD was calculated using the adaptive weighted multitaper spectrum algorithm, which reduces the spectral leakage compared to, for example, the periodogram (Prieto et al. 2009). Energy e in the frequency range 0.5-10 Hz was estimated by applying Parseval's theorem (Båth 1974). At the same frequency range, dominant and centroid frequency were extracted from the PSD: f d as the frequency value associated with the most energetic spectral bin, and f c as the weighted mean.

Permutation entropy
For each 20-min segment, the permutation entropy for the vertical component, filtered in the frequency range 0.5-10 Hz, was calculated. The use of this parameter to detect volcanic tremor is supported by numerous studies that associate its source mechanism to a lowdimensional nature (e.g., Konstantinou et al. 2013). The dimension of a time series is a measure of its complexity, which is generally extracted in the reconstructed phase space by computing the fractal dimension, Lyapunov exponents, or Kolmogorov-Sinai entropy, among others (see Abarbanel et al. 1993, for a review). However, the calculation of the dimensionality is computationally expensive since several fine-tuning tests are required (Konstantinou and Lin 2004). Thus, permutation entropy was introduced by Bandt and Pompe (2002) as a fast and robust complexity measure of the time series. Glynn and Konstantinou (2016) utilized permutation entropy as a tool to monitor the state of randomness in seismic noise and found that it had decreased significantly prior to the 1996 Gjalp eruption in Iceland.
Permutation entropy comes from the field of symbolic dynamics, where each number of the time series is treated as a symbol, and a sequence of symbols is called a pattern. Two parameters are involved in its computation: the length of the pattern n and the delay between neighboring equidistant symbols τ . The choice of n determines the total number n! of different patterns, whereas τ is used to take into account low-digitization data (more details in Additional file 2). Therefore, permutation entropy h is the Shannon entropy of the relative frequency of patterns occurrence:  Melnick et al. (2006) where p k is the frequency of occurrence of the k-pattern, also called permutation. Low and high h represent deterministic and stochastic behaviors, respectively (Staniek and Lehnertz 2007). Equation (1) is normalized by the logarithm of the total number of patterns to bound h between 0 and 1, which is simpler to interpret and compare.
After several tests, we found that different values of n and τ modify the magnitude of the h, but not its temporal evolution (see Additional file 3). We set n = 5 and τ = 1 , as suggested also by Glynn and Konstantinou (2016), because this combination provides high variability between stochastic and deterministic behaviors, being suitable for tracking permutation entropy trends.

Polarization degree
Finally, we compute the polarization degree of the threecomponent seismogram in the frequency range 0.5-10 Hz. This measure quantifies in the frequency domain to what extent the wave field is polarized (Samson and Olson 1980). The use of this parameter to identify volcanic tremor in continuous data is supported by several studies that describe the volcanic tremor as a signal with a frequency-dependent polarization content (e.g., Almendros et al. 1997;Acernese et al. 2004).
One of the approaches to derive polarization attributes consists in performing a singular value decomposition (SVD) of the cross-spectral density matrix S . Then, polarization degree P is computed as (Samson and Olson 1980): where Tr denotes the trace function, and W is the diagonal matrix produced by the SVD of S , whose elements σ i (f ) are proportional to the spectrum amplitude of its principal complex vectors (Jackson et al. 1991). In Eq. (2), P is defined in the range [0, 1]. When σ 1 (f ) >> σ 2 (f ), σ 3 (f ) , P(f ) ≈ 1 and the polarization vector, i.e., the singular vector associated with the main singular value, is well determined; instead, if σ 1 (f ) ≈ σ 2 (f ) >> σ 3 (f ) , P(f ) ≈ 0.5 and two separate polarization states may exist at f. When the singular values are approximately equal, i.e., P(f ) ≈ 0 , an unpolarized state occurs (Park et al. 1987).
Following the same criteria used for the averaged PSD, each cross-spectrum was computed by applying a 1-min moving average window with no overlap. Thus, (2) P(f ) = 3 Tr(W 2 (f )) − Tr 2 (W(f )) 2 Tr 2 (W(f )) , the dominant polarization degree p m (f ) was computed as the maximum value of P(f ).

Temporal evolution of reduced parameters
The temporal evolution of the computed parameters from June 2012 to January 2013, month by month, are available in Additional file 4. As an example, we describe the variability of these parameters during December 2012 in Fig. 2, when the last major Strombolian phase of Copahue occurred. Red dashed horizontal lines on each plot show the average value for the entire period analyzed. Average values of e and h are assumed to be the background levels during the 8 months.
The eruptive stage, between December 22-26 (black rectangle), was composed of three energetic pulses described by a sudden energy rise followed by a gradual drop (Fig. 2b). December was characterized by numerous high-energy periods with energy e above their background, being the first pulse of the eruption the highest energy period of ∼ 70 dB, which almost double the background level. The narrow peaks of more than 75 dB observed around the 27th are associated with strong regional tectonic events that occurred ∼200 km away. Other energetic but short-duration events, such as VT earthquakes or long-period events, are also observed as narrow peaks in the time series.
Parameters f d , and f c are represented in Fig. 2c and d, whereas Fig. 2a depicts all averaged PSDs over time, i.e., the spectrogram. f d is generally the least sensitive parameter with values centered at 1 Hz, even during the first pulse of the eruption. However, f d shows periods of high values of 4-6 Hz mostly before (on the 18th and the 21st) and during the eruptive stage. In comparison to f d , the variability of f c is centered at its average, dropping to lower values during the main pulse of the eruption.
Permutation entropy h (Fig. 2e) is less influenced by transient signals than e, as it shows a smoother variation with fewer narrow peaks. During December, h was characterized by values above the background of 0.68. This trend was interrupted during the main eruptive pulse, decreasing to ∼ 0.5 . It is interesting to note that energetic peaks may correlate both with relative h lows (e.g., on the 18th and 27th) and relative h highs (e.g., on the 14th and 16th).
The polarogram in Fig. 2f shows the evolution of the frequency-dependent degree of polarization over time, just as the spectrogram shows the average PSD. Remarkably, the polarogram depicts a sustained period of high values of polarization degree around 0.85, as can be observed in p m (Fig. 2g), between mid-18th and 26th. Figure 2 is a clear example of how reduced parameters can reveal long-duration events in the ambient seismic noise. Sustained energy above the background level is a clear sign of the occurrence of a long-duration event, as shown during the main eruptive pulse. Hence, we identify 52 long-duration events during the selected period, satisfying two conditions: (i) the average seismic energy is greater than the background (red dashed line); and (ii) the duration of the period is at least 6 h. It is reasonable to expect that, as the station is isolated from cultural noise sources, all events are either seismic episodes associated with the volcanic or geothermal activity, or bursts of incoherent noise linked to, e.g., weather conditions. Figure 3 shows the occurrence of all long-duration events in the same temporal axis. Each event is represented as a horizontal bar, whose length expresses its duration, and the vertical position its average value. The longest event occurred between the end of September and the beginning of November ( ∼ 863 h). The second longest event ( ∼ 135 h) occurred in the 2nd week of July, 5 days before the phreatic and phreatomagmatic explosions (highlighted in orange). However, these events are exceptions since the average duration of the remaining events is around 15 h. On the other hand, October, entirely covered by an event, was the most active month with a cumulative energy of 4 · 10 4 dB h. It is followed by June, July, and December with ∼ 10 4 dB h each. In contrast, August and November were relatively quiet, with values of 10 3 and 2 ·10 3 dB h.

Long-duration events
After a visual inspection of the spectrogram corresponding to the vertical-component seismogram of each event, we classified 32 events as episodes of volcanic tremor, and 20 events as noise bursts (N). Based on their spectral content, tremors were separated in broadband (BT, 17), mid-band (MT, 8), or narrow-band (NT, 7) types. BT episodes have a spectral content spread in the range of 2-6 Hz, whereas NT episodes show one or multiple narrow peaks, with a fundamental period mainly around 1 s. In between BT and NT, we find MT episodes showing a spectral content in the range 1-3 Hz. In contrast, N show energy in a wider range, without any clear dominant frequencies. Additional file 5 shows, for each of these types of events, the vertical component of the seismogram and its spectrogram.
The behavior of reduced parameters for each event can be easily recognized in Additional file 4, and is summarized in Fig. 3. Each type of event has a color, and each event has a code (see Additional file 6). This code is a combination of a number (indicating the chronological order of the event) and a letter (associated to the event type). For example, 3b refers to the 3rd broadband tremor found in the dataset.
It can be seen that episodes 1m and 4m were associated with the eruptive activity of July and December, respectively, which means that these episodes could also be eruptive tremors (McNutt and Nishimura 2008). The waveform of the 5 remaining MTs detected in September, November, and January are more similar to 1m than to 4m. This suggests that similar, weak phreatic activity may have occurred before and after 5n, the longest NT, and after the Strombolian eruption of December (highlighted in red in Fig. 3). 5n occurred during the decrease of the crater lake level. After November, no more NTs were detected.
BT is the most common tremor type recorded, with a total of 17 episodes. The first 5 occurred in June, 1 month before the phreatomagmatic eruption. Many others occurred in December 18-26, 4 days before and during the eruption. The correlation between BTs and the eruptive activity, especially in December, may suggest an association with the migration of magma to shallower depths.
N events were observed almost every month analyzed, without any correlation with volcanic activity. Rather, the comparison of their time evolution with the daily average wind speed, provided by the AIC (http://www.aic.gov.ar/

Characterization of long-duration events
We have characterized the 52 events by average values of energy e , dominant frequency f d , centroid frequency f c , permutation entropy h , and dominant polarization degree p m (see Fig. 3 and Additional file 6). The capability of this characterization to separate the different types of recorded events can be evaluated by analyzing the range of variability of the characteristics of each event type (see Additional file 8). In Figure 4, we compare the permutation entropy and dominant polarization degree with the energy and centroid frequency. We have chosen the centroid rather than the dominant frequency since it is more sensitive to changes in the seismic wave field. However, in situations with shorter source-to-sensor distance, the dominant frequency may be the most suitable parameter to analyze this variability.
On average, noise bursts have higher values of permutation entropy than tremor episodes ( Fig. 4a and b), meaning that tremors have a higher concentration of permutations than noise. In other words, tremors are deterministic signals tending to reduce the permutation entropy. Among the 32 tremors, there are 15 with a h greater or equal than the background level, and most of them were classified as BT. This suggests that a BT episode may be indistinguishable from ambient wave field based on the permutation entropy alone. However, there is a strong probability that values of h below the a b c d background are associated with an MT or NT tremor. On the other hand, an inverse relationship is observed between h and e for BT and MT tremors in Fig. 4a, suggesting that the more energetic the tremor, the lower the permutation entropy. Therefore, if the tremor is energetic enough, the separation from noise bursts becomes clearer. In Fig. 4b and d, MT and NT tremors are separated from BT and N events. The average f c value for MT and NT tremors is 1.6 Hz, whereas for BT and N events is of around 3.5 Hz, denoting that centroid frequency value can distinguish BT from MT and NT tremors, but not BT tremors from noise bursts. This occurs because most of the spectral energy of the BT and N wave field is distributed in a wider band reaching higher frequencies, while MT and NT episodes, characterized by dominant spectral content in the band 1-3 Hz, have the lowest values.
Additionally, Fig. 4b shows that, in general, event types with lower f c display lower h as well, while broadband types are characterized by higher values of h . This suggests a relationship between the permutation entropy and the spectral content. Such behavior was also evidenced in Glynn and Konstantinou (2016) when the authors linked low values of permutation entropy with the lack of energy at high frequencies. Figure 4c and d are related to the average dominant polarization degree of the events. When this value is close to one, the singular vector associated with the greatest singular value expresses the largest fraction of seismic energy in a unitary complex space (Samson 1983), which means that this direction coincides with the direction of polarization. Thus, all the events with high values of p m are characterized by well-defined elliptical and/or linear polarization states.
N events show values of p m below 0.5, meaning that noise bursts are closer to an unpolarized state. On the contrary, all tremor episodes are characterized by dominant polarization degrees above 0.6, and 78% of them are above 0.7. This means that the wave field of most tremors has a well-defined polarization. Our results are consistent with the recent observations of volcanic tremors, which showed a wave field composed mainly of P waves at regional distances (Haney et al. 2020).
The polarization degree is sensitive to path and topographic effects of the traveling waves (Neuberg and Pointer 2000). These effects (attenuation, interference, scattering, etc.) contribute to the loss of the polarized state of the wave field associated with tremor episodes. This can explain the wide range of p m observed. On the other hand, Fig. 4c shows that the degree of polarization is independent of the energy of the events, which makes the polarization degree a robust parameter to detect tremor episodes in the ambient seismic noise. The comparison of p m with f c in Fig. 4d improves the distinction of BT, MT-NT, and N, separating different groups in the graph.

Conclusions
Copahue volcano was seismically active between June 2012 and January 2013. Data reduction methods applied to continuous seismic wave field has revealed the occurrence of 52 long-duration periods (> 6 h) with sustained energy above the background level. We have characterized the 52 events by averaging values of energy, dominant and centroid frequency, permutation entropy, and dominant polarization degree, and classify them by visually inspecting the spectrogram of the vertical seismogram as 32 volcanic tremor and 20 noise bursts. Our results suggest that both permutation entropy and polarization degree can be used to identify tremors in the temporal evolution of reduced parameters (e.g., Fig. 2). The best results were obtained when we combine the information from both parameters simultaneously. Therefore, values of h < 0.7 and p m > 0.6 are a clear sign of the occurrence of volcanic tremors when the energy is above the background. Moreover, if the centroid frequency is greater than 3 Hz, the tremor is a broadband type. These ranges of values can be used in Copahue volcano to detect long-duration volcanic tremors, even for signals with low signal-to-noise ratios that may be undetected with a routine approach.
Our work opens several lines of research, such as investigating optimal parameters for computing the permutation entropy in other volcanic areas, analyzing the behavior of the ambient noise at multiple stations-which can provide information about the nature of different events-correlating reduced parameters with different volcanic activities, or exploring the occurrence of longduration tremors at different volcanic sites. Furthermore, if the volcanic activity associated with the long-duration source is well-known, the characterization proposed here can represent a step forward in the development of automatic methods for real-time monitoring systems and for analyzing large periods of continuous data.
Additional file 1. Schematic illustration of the procedure to compute average PSD. (a) An example of a 20-minute segment of data. (b) The 1-min multitaper PSDbetween 4 and 5 minutes. (c) The average PSD multitaper among the 20 1-min multitaper PSD. The number of tapers we applied was 6. The number of points for the FFT algorithm was 4096,the upper power-of-2 length of the 1-min window. Thus, the number of frequency bins between 0.5 and 10 Hz is 974.