New application of wavelets in magnetotelluric data processing: reducing impedance bias

Magnetotelluric (MT) data consist of the sum of several types of natural sources including transient and quasiperiodic signals and noise sources (instrumental, anthropogenic) whose nature has to be taken into account in MT data processing. Most processing techniques are based on a Fourier transform of MT time series, and robust statistics at a fixed frequency are used to compute the MT response functions, but only a few take into account the nature of the sources. Moreover, to reduce the influence of noise in the inversion of the response functions, one often sets up another MT station called a remote station. However, even careful setup of this remote station cannot prevent its failure in some cases. Here, we propose the use of the continuous wavelet transform on magnetotelluric time series to reduce the influence of noise even for single site processing. We use two different types of wavelets, Cauchy and Morlet, according to the shape of observed geomagnetic events. We show that by using wavelet coefficients at clearly identified geomagnetic events, we are able to recover the unbiased response function obtained through robust remote processing algorithms. This makes it possible to process even single station sites and increase the confidence in data interpretation.


Introduction
The magnetotelluric (MT) method is based on the induction of natural electromagnetic (EM) fields in the ground. These natural fields can be categorized into two main classes. High frequency (HF) signals (with frequencies over 1 Hz) are mainly due to lightning activity, and the associated EM waves conveyed in the waveguide made of the ionosphere and the conductive earth. The other main class of EM waves is due to the interaction of the solar wind with the Earth's magnetic field, and this produces magnetohydrodynamic waves that are transmitted in the atmosphere through the ionosphere from the magnetosphere [1,2].
The MT method is thus based on the quasi-uniform source assumption whereby sources are supposed to be far from the measurement point [3,Chapter 2]. In this approximation, the horizontal electric field e = (e x , e y ) is linked to the horizontal magnetic field h = (h x , h y ) by convolution products * (in the time domain) with impulse response functions (z xx , z xy , z yx , and z yy ), which are components of the impedance tensor z: e x (t) = z xx (t) * h x (t) + z xy (t) * h y (t), e y (t) = z yx (t) * h x (t) + z yy (t) * h y (t). (1) According to properties of the Fourier transform, this gives us the following relation: where E and H are the Fourier transform of e and h, respectively, and Z ij is a 2 × 2 complex tensor. Z can be transformed into apparent resistivity ρ a and phase φ with: ρ a = |Z| 2 /(µω), φ = arctan( (Z)/ (Z)), µ = 4π10 −7 H/m being the magnetic permeability, and ω = 2πf with f the signal frequency. These quantities are then analyzed or inverted to infer geoelectrical properties of the subsurface. Because of the nature of MT time series (e.g., noise heteroscedasticity) and the complexity of the source field (transient nature and diversity of polarizations), Z computation is not straightforward. These issues have been the focus of several methods published since the 1970s. Addressing these issues is the objective of this paper.
The first important issue to address is the effect of noise in MT time series. Indeed, Z estimations can be severely down weighted by noise on the magnetic field [4]. The usual way to get rid of this bias is to set up a second MT station called a remote station [5]. This station is used in the processing while assuming that sources of noise at both stations are not correlated so that the bias is drastically reduced.
The second issue we would like to address comes from the complexity of MT time series that we mentioned before. The first MT processing techniques (e.g. [4]) were based on least-squares analysis which fails in the presence of outliers. Outliers may originate from several issues including non stationarity, the source effect of MT signals and brief overwhelming noise. Robust statistics were then applied to MT processing to accommodate these outliers [6,7]. Another approach was the use of wavelet techniques to look for MT sources or remove noise in the time series. Zhang & Paulson [8] applied the continuous wavelet transform (CWT) on high frequency data to select sferics events. They thus increased the signal-to-noise ratio in the response function determination in a straightforward way by the selection of useful signal parts. Trad and Travassos [9] used wavelets specifically to filter MT data in the time-frequency domain before processing with robust algorithms. Escalas et al. [10] used wavelet analysis to study the polarization properties of cultural noise sources in MT time series.
We took into account both the issues of noise and of the transient nature of some MT sources by applying the CWT on two types of signals, namely, geomagnetic pulsations and sferics. We demonstrate that the CWT may also be a powerful tool to reduce the bias in the computation of the impedance tensor, even in the case of single station processing.

Continuous wavelet transform
The CWT is a mathematical technique used to decompose a signal on a time-frequency representation through a special class of functions called wavelets. A wavelet can be defined as a physical event (e.g., Green function) and has to fulfill the following two conditions: it has to be localized both in the time and frequency domains and has to be a zero-mean function (the so-called admissibility criteria) [11]. The wavelet transform allows one to compute coefficients depending on two factors, the dilatation a > 0 (corresponding to frequency) and translation τ (time). Each coefficient is computed following equation (3), where s is the analyzed signal (e.g., electric or magnetic field), ψ is the mother wavelet, and * represents the complex conjugate: By using the properties of the fast Fourier transform (FFT) with the convolution, one is able to compute wavelet coefficients very quickly.

Wavelet's choice
In this work, we have chosen two different wavelets, the Cauchy and Morlet wavelets, and each is adapted to the shape of the considered geomagnetic event. Both wavelets are progressive ones with real valued Fourier transform. The Cauchy wavelet is given in the frequency domain by: where m is the wavelet's order (corresponding to the derivative order if an integer), H is the Heaviside step function, Γ is the gamma function and a is the dilatation factor. The Morlet wavelet is given in the frequency domain by: where ω 0 is the central pulsation of the wavelet, H is the Heaviside function and a is the dilatation factor. Equation (5) does not represent an admissible wavelet if ω 0 is not high enough to fulfill the zeromean condition. To make it admissible, ω 0 must be above π 2/ log(2). A high ω 0 implies a high number of oscillations in the wavelet functions.
The following sections illustrate the wavelet choice according to the observed geomagnetic events.

Pulsations
Geomagnetic pulsations are the result of the interactions between the solar wind and the magnetosphere, and these events will appear on EM time series as short oscillations (Figure 1). The pulsations can be divided into two main classes, continuous (Pc pulsations) or irregular (Pi pulsations). Continuous pulsations are quasi-periodic oscillations whose occurrences vary during the day. For example, Pc1 pulsations are predominant during day-hours (in local time) at high latitudes and during night-hours at low latitudes [1,12]. Moreover, even during periods of occurence, they often appear as pearls on magnetograms with alternating periods of high and low signal-to-noise ratios [13]. magnetic pulsation signals contain several oscillations so they are very localized in the time and frequency domains simultaneously. Thus like previous authors (e.g., [8,14] for atmospherics signals) we have chosen the Morlet wavelet, which represents the best compromise between the time and frequency resolution. Use of the Cauchy wavelet would be hazardous because of its low frequency resolution. Both Morlet and Cauchy wavelet coefficients are shown Figure 2. As illustrated, the highest coefficients corresponding to the geomagnetic pulsations are much more scattered in terms of the frequency when using then Cauchy wavelet instead of the Morlet wavelet.

Sferics
Sferics are transient events that occur in the frequency band range of 1 Hz to more than 10 kHz [15]. These signals are the most energetic part of the MT signal in these frequency bands (even if other phenomena occur such as Schumann resonances between the waveguide made by the ionosphere and Earth's surface).
These sferics are characterized by a sudden impulse in the time domain and can be divided into two main frequency bands, namely, those above 1 kHz and those below it. In the low frequency band of extremely low frequency (ELF) waves, sferics consist of a high amplitude impulse (Figure 3) called the slow tail [16], whereas in the high frequency band of very low frequency (VLF) waves, sferics are mainly shaped as oscillations. To analyze ELF waves using the CWT, we have chosen the Cauchy wavelet. Most of the previous work on MT methods with wavelets was based on the Morlet wavelet stemming from its properties of good resolution both in terms of the time and frequency [8,14]. However, the shape of the ELF wave ( Figure 4 (a)) is very impulsive and therefore wide in frequency, so it is closer to Cauchy's shape than to Morlet's one. Indeed, the ELF wave is very short in time and has very few oscillations, so we need a wavelet that has similar properties. As illustrated in Figure 4 (b,c), the highest coefficients obtained with the Morlet wavelet span a wide time length, i.e., longer than the actual length of the ELF wave. Because of their impulsive nature, sferics have a wide frequency content [16]. In the time-frequency domain, sferics appear as a series of high valued coefficients along the frequency axis (very localized in time and spread out in frequency).

Selection of wavelet coefficients
We manually picked geomagnetic events from our MT time series and selected wavelet coefficients from the time-frequency plane by using the following methodology. In a way similar to wavelet based denoising techniques, amplitudes of the wavelet coefficients are used to define the selection scheme. The MT system (1) can be solved in a univariate way for e x and e y separately. In both cases, one has to obtain good signal-to-noise ratios for both h x and h y . At each scale, for every channel s (output channel (e x or e y ) and input channel (h x , h y )): • We compute the median value α of the distribution of |W ψ [s](a, τ )| over a length T a = N t a around t 0 , which is the time position of the previ-ously picked event. N in this work is set to 30 and t a is the time step corresponding to the analyzed scale. • From this value, we define a threshold set as β times the median value α. • All coefficients below this threshold level are discarded for the magnetotelluric response function computation. • Coefficients are kept for the final computation when the threshold is reached on (e x , h x , and h y ) or (e y , h x , and h y ). This criteria is sufficient for geomagnetic pulsations (below the Hz). However, for ELF waves, we add one more criteria to fulfill, that is, selected coefficients must span a large frequency range (basically the whole frequency range of ELF waves without MT and audiomagnetotelluric (AMT) dead bands). In practice, we check the Fourier spectrum of the EM time series, which is a good indicator of dead band boundaries. Depending on the signal-to-noise ratio or the time of the day, this limit can be highly variable.
When a remote station is available, we add another criteria to discriminate noise from geomagnetic events. Indeed, the horizontal magnetic coefficients at the remote station have to fulfill the same properties. At the end of this stage, we obtain a group of wavelet coefficients for each scale and each selected geomagnetic event.

Inversion of wavelet coefficients
Zhang & Paulson [8] demonstrated that equation (2) can be written as: This system of linear equations in the time-frequency domain can be described by the following equation: where d contains the electric field wavelet coefficients (W ψ [e x ](a, τ ), W ψ [e y ](a, τ )), G contains the magnetic field coefficients (W ψ [h x ](a, τ ), W ψ [h y ](a, τ )), and m contains the MT response functions Z. The classical least-squares solution to this equation is: This solution is inevitably downward-biased [4]. One solution is to include a remote station in the processing scheme [5]. The solution then becomes: where G r contains the remote station magnetic field coefficients W ψ [r x ](a, τ ) and W ψ [r y ](a, τ ). Another way to reduce the bias is to actually make the predictor variable (here, the h field) as noise-free as possible. In this alternate procedure our goal is to use the wavelet coefficients of high signal-to-noise ratio geomagnetic events in the computation. By doing so, we will reduce the bias introduced by noise on the magnetic field in the solution (8).
To reduce the influence of noise in the electric field, we also use robust statistics instead of classical leastsquares. As explained in Chave & Jones [3, Chapter 5], robust statistics were introduced for MT because of the properties of natural-source electromagnetic data. Among these reasons, they state "finite duration of many geomagnetic or cultural events" and "marked non stationarity." Moreover, the heteroscedasticity of the noise can hide natural events along the time series. The main argument for using wavelet transform is precisely to take into account these issues by getting rid of the non-useful part of the time series (e.g., no apparent geomagnetic signal) before computing the MT response function. In practice, in many time series the non-useful part is the largest component. Consequently, Z computation is only based on parts of the time series where there is a significant induction occurring in the subsurface. Indeed, a significant part of the records cannot be used because of the noise level of current state-of-the art magnetic sensors [3,Chapter 9]. For example, in the AMT dead band, the signal does not rise above the noise level of the induction coils during daytime [15]. Robust statistics are still neces-sary, but we considerably increase the signal-to-noise ratio on all selected events.
Zhang & Paulson, [8] worked in a single station configuration and used conventional least-squares analysis to resolve equation (7). Instead of using all selected wavelet coefficients to recover the MT response functions, we estimate the response function on pairs of geomagnetic events. Doing so allows us to build a distribution for each component of the impedance tensor (e.g., Figure 5).
At each scale for each pair of geomagnetic events, the response function is estimated by using the Huber M-estimator described by Chave et al. [6] in a MT context. From the distribution of response functions at each scale estimated from all available pairs of geomagnetic events, we represent the final estimation by taking the median value (less sensitive to outliers). The confidence interval is represented by using the interquartile range (IQR) L-estimator of the distribution [17]. The IQR has a breakdown point of 50 %, which makes it robust to long tails in the distribution while still being a good measure of the dispersion or skewness in the distribution.

Real data application
Datasets We applied the previously described technique on two real datasets to illustrate the potential of using wavelet coefficients on both geomagnetic pulsations and sferics.
The first dataset consists of MT data that were acquired with a sampling frequency of 512 Hz in the northeastern part of France near the town of Rittershoffen as part of a geothermal monitoring experiment [18]. The electric measurements were made with EPF06 electrodes, and the magnetic measurements with MFS07e coils from Metronix SA. Data in this area are badly contaminated by 50/3 Hz and 50 Hz noise and their harmonics. We filtered this noise by using a notch filter with 2048 coefficients. The remote used with these measurements is located on the site of the Welschbruch geophysical station in the Vosges mountains at about 60 km away from the measurement site.
The second dataset consists of electromagnetic horizontal measurements that were acquired on the La Fournaise volcano before, during and after the 1998 eruption [19]. The data were acquired with a sampling frequency of 0.05 Hz with induction coils similar to those made by Metronix or Phoenix Geophysics [20], and P b-P bCl 2 electrodes. The remote station used here is also located on the volcano, a little less than 10 km away from the measurement station.
To assess the validity of our method, we have compared our results with the ones obtained with robust processing codes from the work of Alan D. Chave (BIRRP, Bounded Influence Remote Reference Processing, [6,21]), and Gary Egbert (EMTF, [22]). For simplicity, we consider error bars as they were calculated by the robust processing codes, even if they can be discussed in more detail [23,24,25].
The threshold factor value β was set to 1 for the sferic application and 4 for the pulsation application, but other values can be chosen depending on the signalto-noise ratio of the processed time series.

Remote processing
We compared the MT response functions obtained by using the wavelet coefficients with those from the application of other processing codes for both datasets.
In both frequency bands (Figures 6 and 7), the response functions obtained with the wavelet coeffi- cients were comparable to those obtained with both processing codes. For response functions in the frequency band covered by geomagnetic pulsations (low frequency band), wavelet inversion was also in agreement with robust processing even if the source was not as impulsive as sferics. Figure 5 also demonstrates that the use of robust statistics on the wavelet coefficients of only two events allowed for the recovery of the response functions with good accuracy.

Single site processing
For single site processing, we processed single site MT stations by removing the remote station for all processing codes. For visual comparison, only BIRRP results are shown, but conclusions remain the same for EMTF.
For high frequency results (Figure 8), ρ yx was down- ward biased by noise over a large frequency bandwidth and ρ xy was also biased around 0.08 Hz. Because of the position of the station near housing, it was difficult to assess where the noise source was (it could have been from electric fences or pipelines). In this case, the result given by robust processing in the single site configuration would lead to a wrong interpretation of the MT response. Wavelet processing allows one to drastically reduce the noise bias and recover the impedance tensor obtained by using remote processing even near the 50 Hz frequency.
For low frequency processing (Figure 9), the single site ρ xy was slightly downward biased by noise where the wavelet-based result remained within the confidence bounds of the remote processing. Particularly, the ρ xy component was very sensitive to the coherence parameter. Some examples of interpretable MT responses for this component are illustrated Figure 10. If the coherence parameter is too low (below 0.5), ρ xy is distincly biased by noise (up to a factor of three) for periods below 200 s.

Other transient applications
So far, we have illustrated only two types of transient signals, namely, sferics in their lowest frequency range (e.g., slow tails) and geomagnetic pulsations. This approach can also be applied on wider MT bands than the ones presented in this paper.
For higher frequency applications, the slow tails studied in this paper already contain frequency content above 256 Hz (up to 1 kHz). Other impulsive waves such as VLF waves have a distinctive signature in the time-frequency plane and frequency content from 1 kHz up to more than 10 kHz [26]. In Figure 11, we BIRRP with remote station data Single site results: Wavelet based results BIRRP using a 0.5 coherence threshold BIRRP using a 0.7 coherence threshold BIRRP using a 0.9 coherence threshold show an example of such a VLF wave and its timefrequency analysis. These waves were already used in an MT processing study by Zhang & Paulson [8]. The assortment of low frequency geomagnetic sources is large. Among transient events at periods lower than 200 s, there are Pc5 geomagnetic pulsations that have periods up to 500-600 s [1]. Geomagnetic storms are also low frequency sources of induction. They have already been studied with wavelet analysis (e.g., [27]). We show, for example, in Figure 12 a signal of interest at low frequency that was recorded on the Piton de La Fournaise volcano on 16 July 1997, and the signal contains significant frequency content up to more than 1000 seconds.

Conclusion
We have shown through these experiments that the continuous wavelet transform (CWT) is an easy and efficient way to characterize the magnetotelluric (MT) response function for transient geomagnetic events. Using this technique, we have shown that most of the information contained in the source wavelet coefficients is sufficient in datasets to enable the characterization of the MT impedance tensor. Two types of geomagnetic events were studied, geomagnetic pulsations and sferics, and thus this work increases the frequency band studied in previous publications involving CWT application during MT processing. The mother wavelet has to be adapted to each transient event to accurately recover the source information. Even irregular geomagnetic pulsations that are very localized in the time and frequency domains simultaneously can be used in MT processing. The other transient sources of induction are currently being investigated and will be the subject of future papers on wavelet analysis of magnetotelluric time series.
We have also shown that we were able to drastically reduce the noise bias of the considered datasets on the MT response function in the case of a single station configuration. This was achieved by using the high signal-to-noise ratio transient events in the MT time series. By using robust statistics with only two geomagnetic events in the wavelet domain, we were able to recover accurate response functions.
Yet, we still have to develop a way to automatically detect geomagnetic events to analyze their properties (e.g., polarization) and their effect on MT impedance computation.