Geomagnetic Virtual Observatories: monitoring geomagnetic secular variation with the Swarm satellites

We present geomagnetic main field and secular variation time series, at 300 equal-area distributed locations and at 490 km altitude, derived from magnetic field measurements collected by the three Swarm satellites. These Geomagnetic Virtual Observatory (GVO) series provide a convenient means to globally monitor and analyze long-term variations of the geomagnetic field from low-Earth orbit. The series are obtained by robust fits of local Cartesian potential field models to along-track and East–West sums and differences of Swarm satellite data collected within a radius of 700 km of the GVO locations during either 1-monthly or 4-monthly time windows. We describe two GVO data products: (1) ‘Observed Field’ GVO time series, where all observed sources contribute to the estimated values, without any data selection or correction, and (2) ‘Core Field’ GVO time series, where additional data selection is carried out, then de-noising schemes and epoch-by-epoch spherical harmonic analysis are applied to reduce contamination by magnetospheric and ionospheric signals. Secular variation series are provided as annual differences of the Core Field GVOs. We present examples of the resulting Swarm GVO series, assessing their quality through comparisons with ground observatories and geomagnetic field models. In benchmark comparisons with six high-quality mid-to-low latitude ground observatories we find the secular variation of the Core Field GVO field intensities, calculated using annual differences, agrees to an rms of 1.8 nT/yr and 1.2 nT/yr for the 1-monthly and 4-monthly versions, respectively. Regular sampling in space and time, and the availability of data error estimates, makes the GVO series well suited for users wishing to perform data assimilation studies of core dynamics, or to study long-period magnetospheric and ionospheric signals and their induced counterparts. The Swarm GVO time series will be regularly updated, approximately every four months, allowing ready access to the latest secular variation data from the Swarm satellites.


Introduction
The geomagnetic field undergoes gradual change, evolving year by year in a process known as geomagnetic secular variation. These changes are thought to result primarily from motions of liquid metal in the Earth's outer core but this process is not yet well enough understood to allow accurate predictions of future behavior, even a few years ahead (e.g., Alken et al. 2020a). In this situation we are forced to rely on carefully monitoring the geomagnetic field and its changes in order to provide the information necessary for navigation and orientation applications, and for descriptions of near-Earth radiation belts and current systems.
To make progress beyond field monitoring, detailed information on the geomagnetic variations as a function of space and time must be combined with knowledge of the underlying physical processes. With the rapid development of numerical geodynamo models over the past decade (e.g., Aubert and Finlay 2019), there is now the prospect of assimilating such information into realistic models, such that the processes underlying secular variation can be better understood.
For both monitoring long-term geomagnetic variations, and for data assimilation applications, it is an advantage to have processed satellite magnetic field data available on a well organized grid, with a regular sampling rate in space and time. The Geomagnetic Virtual Observatory (GVO) method is one approach to obtain such a dataset.
The Geomagnetic Virtual Observatory method was first proposed by Mandea and Olsen (2006) as a tool for making satellite magnetic field measurements easily accessible as time series of the vector geomagnetic field at pre-specified locations. The GVO method involves fitting a scalar magnetic potential to satellite magnetic field observations from a chosen time window and within a local region, defined by a cylinder centered on a GVO target point. The potential is then used to compute the magnetic field at the GVO target point such that a mean magnetic field over a chosen time window at satellite altitude is determined; see Fig. 1. The GVO time series thus mimics the time series produced by ground-based magnetic observatories on timescales of months and longer. The main advantage of the GVO time series is that they can be produced at any sites of interest that are covered by satellite data, and in particular, can provide a global grid of time series derived from measurements made by similar instruments onboard satellites such as the Swarm trio.
Applications of the GVO time series include geomagnetic jerk studies (Olsen and Mandea 2007), comparisons with spherical harmonic (SH) based geomagnetic field models (Olsen et al. 2009, core flow studies (Kloss and Finlay 2019;Rogers et al. 2019) and data assimilation studies (Barrois et al. 2018). The GVO method can also be used to derive estimates of the magnetic field gradient tensor (Hammer 2018).
Focusing on the core magnetic field, initial studies showed that the original GVO series were contaminated by ionospheric and magnetospheric sources (Beggan et al. 2009;Domingos et al. 2019;Olsen and Mandea 2007;Shore 2013). Recommendations for improving the original GVO concept and better removing such contamination have been proposed (Hammer 2018;Shore 2013). Some of these improvements were implemented in more recent GVO series that have been used for core flow studies by Barrois et al. (2018); Kloss and Finlay (2019); Rogers et al. (2019); and Whaler (2017).
Here, we present details of an updated processing GVO scheme that has been developed during a Swarm DISC (Data, Innovation and Science Cluster) project and is now being used to produce regularly updated Swarm GVO time series as an official ESA Level 2 product. The primary purpose of this paper serves as a reference describing the Swarm GVO series and presenting example validation comparisons with ground observatories. In addition to taking account of the most important recommendations from earlier GVO studies, the series presented here also take advantage of principal component analysis (PCA) (Cox et al. 2018) and spherical harmonic analysis (SHA) in an effort to better isolate the core field signal.
In the "Data" section we describe the input data from the Swarm satellite mission, and the adopted data selection strategies. In the Sect. "Methodology" we describe in detail how the GVO series are calculated. The Sect. "Results" presents examples of GVO time series, derived using Swarm measurements from December 2013 to March 2020 and describes comparisons with ground observatory magnetic field series and global field model predictions. In the Sect. "Discussion and conclusions" we reflect on what can be learned from these comparisons, describe possible applications for the GVO series and mention ideas for extending and improving the present GVO approach.

Data
This section describes the satellite magnetic field measurements used to derive the Swarm GVO time series. The GVO products take as input vector magnetic field measurements in the form of the Swarm Level 1b (L1b) product MAGX_LR_1B, which contains quality-screened, calibrated and corrected measurements given in physical SI units (nT) in a North, East, Center, hereafter NEC, reference frame. For the results presented here, we use Swarm data versions 0506 from the 1-Dec-2013 to 30-Mar-2020.
From the Swarm L1b 1Hz magnetic field data, two separate data chains are produced. Data chain (a) simply extracts all available measurements using a sub-sampling of 15s.
Data chain (b) extracts, again using a sub-sampling of 15s, only those measurements that satisfy the following dark, geomagnetic quiet-time selection criteria: • Gross measurement outliers for which the vector field components deviate more than 500 nT from the predictions of the latest CHAOS field model (here version CHAOS-7.2 (Finlay et al. 2020)) are rejected • The Sun is at least 10 • below horizon • Geomagnetic activity index K p < 3 0 • Time rate of change of Ring Current (RC) index |dRC/dt| < 3nT/hr −1 (Olsen et al. 2014) • Merging electric field at the magnetopause E m < 0.8mVm −1 , (Olsen et al. 2014) • Constraints on IMF requiring B z > 0 nT and |B y | < 10 nT.
2-hourly means are computed from 1-min values of the solar wind and IMF from the OMNI data-base (http:// omniw eb.gsfc.nasa.gov). The Swarm GVO method described in the next section makes use of sums and differences of the satellite magnetic field measurements. Denoting the input magnetic data at a given position r by d l (r) =l · B(r) , where B(r) is the vector magnetic field and l is a unit vector in the component direction, then d l and d l denote sums and differences of the vector magnetic field components, respectively. Both along-track (AT) and East-West (EW) data sums and differences are considered such that �d l = (�d AT l , �d EW l ) and �d l = (�d AT l , �d EW l ) . Along-track data differences are calculated using the 15-s differences �d AT l = [B l (r, t) − B l (r + δr, t + 15s)] , where δr = (δr, δθ, δφ) is the change in position. A 15-s along-track difference with a satellite speed of ≈ 7.7 km/s corresponds to a distance of 115 km (Olsen et al. 2015). Along-track sums are similarly calculated as �d AT l = [B l (r, t) + B l (r + δr, t + 15 s)]/2 . The East-West differences are calculated as having an East-West orbit separation between the Swarm Alpha (SWA) and Charlie (SWC) satellites of ≈ 1.4 • , corresponding to 155 km at the equator (Olsen et al. 2015). The East-West sums were calculated as (r 2 , t 2 )]/2 . For a particular orbit of SWA, the corresponding SWC data were selected to be that closest in latitude with the condition that |�t| = |t 1 − t 2 | < 50 s.

Methodology
This section describes in detail the algorithms used to derive the following Swarm GVO products: 1) 1-and 4-monthly time series of the 'Observed Field' 2) 1-and 4-monthly time series of the 'Core Field' and its secular variation.
Each product involves time series of spherical polar components of the vector magnetic field on an approximately equal-area global grid of 300 locations at an altitude of 490 km above the mean Earth radius. An overview of the algorithm is presented in Fig. 2.

GVO locations and timestamps
For a given GVO target point, and considering a specified time window of either 1 or 4 months, input data that fall within a cylinder of horizontal radius r cyl = 700 km around the target point, and which also satisfy the relevant selection criteria (see Sect. "Data"), are extracted. The GVO locations are specified in spherical polar coordinates r GVO = (r, θ , φ) , at fixed radius r = r a + h GVO where h GVO is the height above the Earth's mean spherical radius r a = 6371.2 km. For the Swarm data described h GVO = 490 km, so the GVOs are located at approximately the mean orbital height of the Swarm satellites during 2013-2020, considering each of the lower pair to contribute with half weighting. The GVO time series are provided in a global approximately equal area grid based on the sphere partitioning algorithm of Leopardi (2006). Selecting a number of GVO grid points, and an associated target cylinder search radius r cyl that avoids overlap of the target cylinders to ensure independent data, involves a trade-off; decreasing the number of target points and increasing the search radius allows for more data within each GVO cylinder but at the same time lowers the spatial resolution. Preliminary tests with Swarm data suggested that 300 GVO grid locations provided a suitable balance (Hammer 2018). If higher spatial resolution is required, longer time windows than used here are necessary in order to obtain stable GVO estimates. The surface area dS covered by each GVO target cylinder is the total surface area A divided by the number of GVOs, N GVO = 300 , i.e., dS ∼ A/N GVO = 4π(r a + h GVO ) 2 /N GVO , where r a is the Earth's mean radius 6371.2 km and h GVO is altitude of GVOs, here 490 km. Equating this area with the area of a Fig. 1 Illustration of the Geomagnetic Virtual Observatory concept; satellite magnetic measurements from within a target cylinder are used to infer field time series at the GVO location given by a red dot. Note the cylinder radius is not to scale, the actual cylinder footprints are shown in Fig. 3 circle surrounding the GVO, π r 2 cyl , gives a target cylinder search radius of r cyl = 4(r a + h GVO ) 2 /N GVO ≈ 700km , where we have rounded down to the nearest hundred kilometers to ensure no overlap. The distance between any two GVOs is thus ≈ 1400km. This corresponds roughly to SH degree n = 14 , since the SH degree n is associated with a horizontal wavelength at satellite altitude is n ∼ 2π(r a + h GVO )/ √ n(n + 1) (Backus et al. 1996). With a target cylinder search radius of 700km, approximately 80% of the data are used; the combined area of the cylinder footprints thus does not span the entire area of the spherical surface, but the independence of each GVO estimate is ensured. The top panel in Fig. 3 illustrates the locations of the 300 globally distributed GVOs and the footprint of the data target cylinders for each GVO. The grid also contains GVOs at the North and South Poles. At these positions the (r, θ, φ) frame is defined by letting θ be aligned along the Greenwich meridian, r point upwards and φ completes the right-handed coordinate system. When computing the main field at the North/South Pole from field models, the average of the main field values evaluated 0.1 • in latitude from the North/South Pole at longitudes 0 • and 180 • was used.
For the 1-monthly series each GVO estimate has a timestamp corresponding to the 15th day of the considered calendar month. For the 4-monthly series, which are constructed using data from within the intervals January-April, May-August, and September-December, the GVO estimates have been allocated timestamps of the 1st of March, the 1st of July and the 1st of November. The secular variation series are computed from annual differences of the 1-monthly and 4-monthly series, so their timestamps are shifted by 6 months compared with the field series. GVO epoch times are for formal reasons given in GVO product files as milliseconds since 01-Jan-0000 00:00:00.000, following the convention for Swarm data products.

Data pre-processing
In order to derive the Observed Field GVO time series, we start from the geocentric spherical polar components of the vector magnetic field, B obs = (B r , B θ , B φ ) , and subtract predictions, B MF , from the IGRF main field model (Alken et al. 2020b) for spherical harmonic degrees n ∈ [1,13] . This results in the following Observed Field residuals These residuals are used to derive the GVO estimates as described in Sect. "GVO model parameterization and estimation" below. Note that IGRF predictions at the GVO target points and times are added back during this procedure.
In order to derive 1-monthly Core Field GVO time series from data chain (a), predictions from a lithospheric field model B lith are also removed: Here, we calculate B lith using SH degrees n ∈ [14, 185] from the LCS-1 model ).
(2) δB core,1month = B obs − B MF − B lith . To derive 4-monthly core field GVO time series from data chain (b), models of the magnetospheric and ionospheric fields are also removed during the pre-processing: where B mag is the predicted large-scale magnetospheric field and its Earth-induced counterpart field, as given by the CHAOS model (Finlay et al. 2020), and B iono is the predicted ionospheric field and its Earth-induced field as given by the CI model (Sabaka et al. 2018).
Estimates of the main field are in all cases removed from the data before carrying out the GVO estimation, and then afterwards added back at the target location and time; this step is necessary in order to pre-whiten the data before carrying out the GVOestimation. Previous studies have shown that such pre-whitening by removing a main field model is necessary in order to avoid noisy GVO estimates. Hammer (2018, p.74, Fig 4.6) presents examples of GVO series computed with and without pre-whitening applied. Without pre-whitening robust estimation schemes, based on iteratively reweighted least squares, are unable to correctly identify and downweight disturbed data resulting in noisier estimates. The specific main field model used for the pre-whitening is however not crucial. For example, comparing GVO estimates constructed using IGRF-13 and the CHAOS-7.2 main field model, we found rms differences compared to the benchmark ground observatories described in Sect. "Validation tests" that agreed to within 0.05 nT across all three components, for both 4-monthly and 1-monthly GVO estimates. We therefore chose to use IGRF-13 for the pre-whitening step in for producing the Swarm GVO series in order to emphasize that the results obtained are not simply a result of biasing towards the CHAOS model.
Note that when considering time windows of 1 or 4 months for the GVO estimates, any information on time variations with periods shorter than these intervals is, of course, lost.

GVO model parameterization and estimation A Cartesian potential forward model
We assume that magnetic field measurements are made in a source-free region such that the residual magnetic field is a Laplacian potential field, which fulfills the quasistationary approximation (Backus et al. 1996). In the following, we will use the general notation δB for the residual fields of Eqs. (1, 2, 3) and refer to the position of the Geomagnetic Virtual Observatory as the target location. (3) The residual magnetic field and the associated locations within a specific target cylinder are transformed from the spherical coordinate system to a right-handed local topocentric Cartesian frame centered on (and constant within) each target cylinder, where at the GVO location x points towards geographic south, y points towards East and z points upwards. The bottom panel in Fig. 3 illustrates the geocentric spherical and local topocentric frames. Note that the unit vectors of the local Cartesian frame, (ê x ,ê y ,ê z ) , coincide with the spherical unit vectors, (ê θ ,ê φ ,ê r ) , at the target location but not elsewhere.
The magnetic scalar potential, V, associated with the residual magnetic field in a source-free region must satisfy Laplace's equation ∇ 2 V = 0 and the potential is related to the residual field by δB = −∇V . The solution to Laplace's equation in Cartesian coordinates can be written as a sum of harmonic polynomials (e.g., Backus et al. 1996)  where l = a + b + c , and C abc are the expansion coefficients, a, b, c are non-negative integers, and L is the expansion order. In tests we found that for a GVO cylinder radius of 700 km, it was sufficient to expand the magnetic scalar potential to cubic order L = 3 ; this involves the 19 parameters in the above expansion.
To be a valid potential field requires irrotational ( ∇ × δB = 0 ) and solenoidal ( ∇ · δB = 0 ) conditions be satisfied. First we consider the solenoidal divergence-free criteria. This requires ∇ 2 V = 0 which on inserting for the potential from Eq. (4) implies Each of the terms in parenthesis must equal zero for this to be satisfied. This means that The cubic potential series is thereby reduced by 4 parameters to a total of 15 parameters The potential Eq. (6) also fulfills the curl-free criteria.
With an expansion for the magnetic potential established, we can now write a linear forward problem relating a vector m , consisting of the model coefficients C abc , to a vector d vec of predictions for the residual magnetic field components in the local Cartesian system, (δB x , δB y , δB z ) , at the positions of satellite observations that fall with the GVO target cylinder for the time window under consideration, Here G vec is then a design matrix constructed from appropriate spatial derivatives of the potential. Olsen et al. (2015) and Sabaka et al. (2018) have demonstrated that using East-West differences (between Swarm A and C) and along-track differences (for all three Swarm satellites) improves the resolution of both the lithospheric field and the core field. They argued that correlated errors due to incompletely modeled largescale magnetospheric fields are reduced when using such field differences. In addition to field differences, East-West and along-track sums of the measurements also need to be included in order to adequately constrain the largest wavelength parts of the field (Hammer 2018; Sabaka et al. 2013). Based on this, we have chosen to use sums and differences of the vector components of the residual magnetic field, constructed along satellite tracks and considering East-West pairs of data between Swarm A and C. This results in a data vector and denote the differences and sums of the residual field described in Sect. "Data". The design matrix linking the sums and differences to the coefficients of the potential is constructed as

Robust least squares estimation
Based on the above definitions of d and G for sums and differences of the residual magnetic field, the coefficients of the GVO model can be estimated using the following robust least-squares inversion scheme Here W is a diagonal weight matrix, consisting of robust (Huber) weights for each entry in the data vector (e.g., Constable 1988), and an additional down-weighting factor of 1/2 for data from satellites Alpha and Charlie which takes into account that these two satellites fly sideby-side and therefore provide similar measurements.
Having determined the potential, estimates of the residual magnetic field components, in local cartesian coordinates, at GVO target location (i.e., x = 0, y = 0, z = 0 ) are computed as follows: At the GVO target location, the local Cartesian field components are directly related to spherical polar field components (see Fig. 3) with δB GVO,r = δB GVO,z , δB GVO,θ = δB GVO,x and δB GVO,φ = δB GVO,y . Each estimate is for a specific target GVO location r and epoch t, which is the center of the considered time window. The above procedure is repeated for each time window at each target location to obtain time series of estimates of the residual vector field at all GVO target locations.
A final step is needed to obtain the GVO estimates for the field. This is to add back the prediction of the main field model B MF GVO (r, t) , at each target point and epoch, using the same model (here IGRF-13) that was removed from each satellite measurement during the pre-processing. This step is carried out separately for each component at each GVO location and each epoch, such that we finally obtain the GVO vector field time series The estimated GVO magnetic field is provided in spherical polar (r, θ , φ) vector components.

Observed Field GVOs
We define 'Observed Field' GVOs as field estimates computed from satellite observations while retaining all observed geomagnetic field sources. Observed Field GVO time series are derived from the sums and differences of the residual field computed using Eq. (1) and then applying the GVO method described by Eqs. (8-10). One-monthly observed field GVOs are computed from data chain a) while 4-monthly observed field GVOs are computed from data chain b).
Error estimates, σ obs , for the Observed Field GVOs are assumed to be time-independent and spatially uncorrelated. They are calculated separately for each GVO times series (i.e., for each field component at each GVO location) based on a robust version of the total mean square error (e.g., Bendat and Piersol 2010), that includes both mean square residual and the mean residual squared, between the input data d i and the GVO estimates d i for a given series. With e i = d i −d i this is calculated as where the index i runs over all data contributing to a given series, µ w = i w i e i / i w i is the robust mean residual and the robust weights w i are calculated iteratively assuming a long-tailed Huber distribution (Constable 1988): with ǫ i = abs(e i )/std(e) and c w = 1.5 is the chosen breakpoint for the Huber distribution.

Core Field GVOs and Secular Variation
We define 'Core Field' GVOs as field estimates computed from satellite observations with non-core fields removed (as far as possible). The core field and associated secular variation (SV) GVO time series are produced as follows.
First, 1-and 4-monthly GVO data files are produced, after which the 1-monthly GVOs are de-noised by a principal component analysis. Next, an epoch-by-epoch spherical harmonic analysis is carried out and the resulting external and toroidal magnetic fields (i.e., non-internal parts) are removed. Finally, annual differences of each series are computed in order to obtain the GVO core field SV time series.
For the 1-monthly Core Field GVOs, GVO estimates are computed from sums and differences of the field residuals using Eq. (2) based on data chain (a) (i.e., without data selection criteria). For the 4-monthly GVOs Core Field GVOs, GVO estimates are computed from sums and differences of the field residuals using Eq. (3) based on data chain (b) (i.e., with dark geomagnetically quiet-time criteria applied).

Principal component analysis
Since the 1-monthly Core Field GVOs were derived without data selection and having no model estimates of the ionosphere nor magnetosphere removed, external magnetic field signals remain. Such signals are considered as contamination ('noise') in the current context because our goal is to produce GVO estimates of the core field only. The monthly sampling rate means that a local time sampling bias also contaminates the GVO estimates, as it takes approximately 4 months for each satellite to revisit the same local time on Earth's surface when considering both ascending and descending orbit tracks (see Shore 2013). To produce 1-monthly Core Field GVOs we therefore employ the principal component analysis (PCA) method and Python package (MagPySV) described in Cox et al. (2018), to separate out and remove the various contaminating signals from the 1-monthly GVO estimates. This procedure is based on earlier work by Wardinski and Holme (2011) and Brown et al. (2013), who used the PCA method to de-noise ground observatory data one observatory at a time, rather than de-noising time series from several locations simultaneously, as in Cox et al. (2018) and this work. A brief summary of this method is provided here; the reader is referred to Cox et al. (2018), Cox et al. (2020) and the companion to this paper Brown et al., in preparation for further details. Domingos et al. (2019) applied PCA to an earlier version of the 4-monthly GVO data series, considering both CHAMP and Swarm measurements. They performed PCA directly on GVO data series, rather than on annual differences of GVO series after subtracting predictions from a core field model as we do. Hence, our PCA analysis looks for coherent signals that remain once features like the large-scale internal variations identified by Domingos et al. (2019) have been removed. Whilst their focus was on modes associated with the variance of internal field, they also identified an interesting mode associated with annual variations of the external field in their Swarm GVO series. Our analysis is not well suited to studying annual variations since we apply PCA to annual difference estimates of SV. After carrying out tests with our PCA procedure we decided there was not much advantage in applying it to the 4-monthly GVOs. Our 4-monthly GVO SV series contain fewer identifiable coherent external signals on which we would apply the PCA. This is due to the dark quiet-time data selection criteria, the applied corrections for magnetospheric and ionospheric fields, and the absence (by design) of the 4-month local time sampling bias that remains in the 1-month series where PCA de-noising is applied.
The key premise to our approach is that the SV residuals (the difference between observed GVO SV and that predicted by an internal magnetic field model) provide information about contaminating signals that are present in the GVO data but not in the internal model. The PCA of the SV residual covariance matrix leads to a proxies for these contaminating signals that are then removed from the GVO data. We approximate the GVO SV series using annual differences and the SV residuals are calculated as the difference between the GVO SV estimates and the SV predicted by the CHAOS-7.2 model (Finlay et al. 2020) evaluated up to SH degree 13 at the same times and locations. Comparable results can be achieved using alternative field models, provided they represent time variation of the main field in a continuous manner when detrending each GVO SV series.
In their application to ground magnetic data, Cox et al. (2018) found that this method is most effective when considering groups of observatories at similar magnetic latitudes because the dominant external magnetic field source varies with magnetic latitude. Suitably grouped observatories experience similar noise at the same times and these correlated signals show clearly in the dominant principle components (PCs) of the SV residuals. On that basis, we estimated the mean magnetic latitude at GVO locations using the AACGM-v2 Python package (Burrell et al. 2020;Shepherd 2014) and assigned them to one of five magnetic latitude regions: Polar North, Polar South, Auroral North, Auroral South and Low-to Mid-magnetic latitudes (see Table 1).
For N GVO locations, the SV residual covariance matrix for the vector time series is 3N by 3N, and can be decomposed into 3N eigenvalues and eigenvectors, describing the PCs of the SV residual data set. The contributions of the K dominant PCs, corresponding to the K largest eigenvalues, are removed from the SV residuals, and afterwards the internal model SV from the CHAOS-7-2 model is added back to the corrected residuals to form the de-noised SV.
In this application, we remove the most significant K PCs entirely, as opposed to removing the scalar projection of a proxy signal for the PC content as described in Cox et al. (2018) and earlier related works. Our removal of PCs here involves the removal of the associated eigenvectors and the component of signal at each GVO location in the projected directions of these eigenvectors. Note the number K differs by region, depending on how many PCs can be confidently identified as arising from one of the expected contaminating sources described above.
We identify PCs as noise sources based on their geographic distributions, correlations to annual differences of external magnetic field proxies (e.g., Dst, Polar Cap North/South, Em, AE (Kauristie et al. 2017)), or peaks in their discrete Fourier transform (DFT) at the local time bias frequency. Table 1 gives the number of PCs identified as noise, along with the percentage of variance in the SV residuals accounted for by each of these PCs and the total percentage variance removed in each region.
In a last step, the de-noised SV are numerically integrated to produce de-noised one-monthly magnetic field time series, again treating SV as annual differences. The de-noised magnetic field must be re-leveled at the start of this calculation. We use the original GVO field values for the first 12 time samples for this purpose, meaning that the de-noised field values start 12 months after the original GVO time series begins.

Spherical harmonic analysis
The magnetic field time series produced by the GVO method assumes a potential field description. This implies that no electrical currents exist within the measurement region. In reality however, satellite magnetic measurements are made in the ionospheric F-region where in situ electrical currents may be present (Olsen 1997;Sabaka et al. 2010), especially in the auroral regions. Due to space-time aliasing, these non-potential fields can leak into the GVO estimates (Olsen and Mandea 2007).
In the situation of non-vanishing, but purely radial, currents within the shell of measurements the magnetic field can written in terms of poloidal, V int , V ext , toroidal, T sh , and scalar potentials (e.g., Backus 1986;Olsen 1997;Olsen and Mandea 2007): where each of the potentials can be represented by expansions up to some maximum SH degree N: where r a = 6371.2 km is the reference value for the Earth's mean spherical radius, n and m are here the SH degree and order, respectively, and P m n are the associated Schmidt semi-normalized Legendre functions. In the three expansions, {g m n , h m n } are the internal coefficients, {q m n , s m n } are the external coefficients and {t m,c n , t m,s n } are the expansion coefficients associated with the toroidal scalar potential.
Predictions of the geomagnetic field components at the GVO locations are linearly related to the above expansion coefficients such that a forward problem can be written where the data for a given epoch, t are given by  r 1 , t), ..., B θ (r N GVO , t), B φ (r 1 , t), ..., B φ (r N GVO , t)} , where N GVO is the number of GVOs, related to the expansion coefficients m SH = {g m n , h m n , q m n , s m n , t m,c n , t m,s n } via a design matrix G SH , which is constructed from the spatial derivatives of Eqs. (14, 15 and 16). Here, we truncated the internal, external and toroidal expansions at SH degree 13 and the model coefficients were determined epoch by epoch from the GVO estimates using a simple least-squares solution: At epochs where an insufficient number of GVOs are available to ensure a stable solution, the external and toroidal coefficients were determined by a linear interpolation between nearby epochs. Following the SHA, external and toroidal field estimates are removed epoch by epoch from the 1-and 4-monthly time series to produce final Core Field GVO time series.

Secular variation estimates
The secular variation of the Core Field series at a particular GVO location, r , for a given epoch t, is computed using annual differences between field values at time t + 0.5 yr and at time t − 0.5 yr: SV GVO (r, t) = B GVO (r, t + 0.5 yr) − B GVO (r, t − 0.5 yr). Annual differences are a well established way to estimate the core field secular variation since they remove annual signals from ionospheric and magnetospheric signals that are otherwise difficult to isolate. Note however that such annual signals do remain in the GVO field series themselves.

Error estimates
The error estimates, σ core , for each Core Field GVO time series are assumed to be time-independent and spatially uncorrelated. They are computed separately for each field component at each GVO based on the residuals between the GVO data and the corresponding predictions of the time-dependent internal part of the CHAOS field model for SH degrees n ∈ [1, 20] . Denoting the residuals by e = d GVO − d CHAOS the error estimates are given by where i = 1, ..., M denotes the ith data element, and M is the number of data in a given series and µ is the residual mean.
Error estimates of the secular variation GVO time series are computed in a similar manner as described above but using residuals between the SV GVO data, SV GVO , and the SV predictions of the CHAOS timedependent internal field model.

Comparison of GVO series with ground magnetic observatories
Validation tests were performed by comparing the GVOs and independent ground observatory (GObs) records, which are the established standard reference data series for monitoring long-term variations of the geomagnetic field. Our validation tests considered data from 28 INTERMAGNET (International Real-time Magnetic Observatory Network) ground observatories, listed in Table 2. These were chosen for their representative geographic coverage, spanning both polar and non-polar latitudes and all longitude sectors. Below we refer to polar stations as being the 13 stations with colatitudes 0 • to 36 • and 144 • to 180 • , with the remaining 15 stations referred to as non-polar stations. From these stations we further selected six 'benchmark' stations (Chambon la Forêt, Kakioka, Honolulu, Guam, Hermanus and Canberra) from mid-to-low latitudes that are well known for their high quality. We use these in an attempt to establish, in well-understood conditions, the extent to which Swarm GVO series agree with ground records, with an emphasis on how well the core field secular variation is captured.
We used the Swarm AUX_OBS_2_ hourly mean ground observatory dataset, version 0122 from February 2020, maintained by the British Geological Survey (BGS), retrieved from ftp://ftp.nerc-murch ison.ac.uk/geoma g/ Swarm /AUX_OBS. These data have been checked and corrected for known baseline jumps (Macmillan and Olsen 2013). From these hourly mean values for each selected observatory we compute: (i) One-monthly and four-monthly simple mean field values, for each of the three spherical polar components. These are used for comparisons with the Observed Field GVO products. (ii) One-monthly and four-monthly versions of revised means (Olsen et al. 2014), wherein the CHAOS magnetospheric field (Finlay et al. 2016) and CM4 ionospheric field predictions (Sabaka et al. 2004) (and their induced counterparts) are first removed from the hourly means for each of the three spherical polar field components and then robust (Huber-weighted) means are computed over 1-or 4-monthly non-overlapping windows. These series are used for comparisons with the Core Field and Secular Variation GVO products.
To enable direct comparisons with these ground observatory series, we computed dedicated GVO time series directly above each selected ground observatory, using the approach described in the Sect. "GVO model parameterization and estimation". We removed crustal bias estimates from each series (computed as the median residual from the CHAOS-7.2 internal field model to up SH degree 16) and mapped the GVO estimates downwards to the position of the ground observatory at Earth's surface by removing the difference between CHAOS-7.2 model predictions at the GVO location and the ground observatory location. This results in series we refer to as B GObs j (t i ) for the ground observatories and B GVO,map j (t i ) for the GVOs, respectively, both at the ground observatory location. The subscript j indicates either the r, θ or φ component, or the scalar field intensity F (computed by taking the square-root of the sum of the squares of the three spherical polar components). The root-meansquare (rms) deviation between the correspond ground observatory and GVO series was then computed as where the summation runs over the length of the time series i = 1, ..., N d where data are available from both series. The rms differences for secular variation series are computed in the same fashion, using annual differences (21) of the ground observatory field, B GObs j (t i ) , and Core Field GVOs mapped to the ground observatory positions, B GVO,map j (t i ) . We computed summary means over these rms values for groups of series from the polar regions, the non-polar region and benchmark observatories. For these tests we used the time interval 2015-2018 when there is good availability of both definitive observatory data and Swarm data.
We note that despite being the best available information concerning secular variation, the ground observatory records are themselves inherently imperfect. INTER-MAGNET standards require that long-term accuracy of main field series be better than 5 nT, with the best observatories having an estimated baseline uncertainty of up to 0.4 nT (Lesur et al. 2017). Beyond observatory measurement uncertainties, a further source of differences between ground observatory data and GVO estimates is that the latter use data above the ionospheric E-layer, while ground data are collected at the Earth's surface. They therefore observe ionospheric and magnetosphereionosphere coupling currents differently. Our potential field mapping used to downward continue the GVO estimates to Earth's surface does not account for this difference, and so it is a source of discrepancy between the two series, particularly for the horizontal components.

Comparisons of GVO series with field model predictions
A second set of validation tests involved comparisons between the GVO products and predictions from geomagnetic field models. These have the advantage that the GVO product, provided on a global grid, can be tested directly (without any mapping) and the global quality of the products can be assessed. However, unlike the comparisons with ground observatories, tests against field model predictions are not fully independent as Swarm data were also used in the construction of the field models.
Comparisons to models are based on the rms deviation between a given GVO time series as well as the CHAOS-7.2 magnetospheric field (and induced counterparts) and the CIY4 ionospheric field (Sabaka et al. 2018) (and induced counterparts). The magnetospheric and ionospheric fields and their counterparts are computed as mean values for each 1-monthly or 4-monthly time window, considering the times of the actual data used to derive the GVO estimates. We note the model values compared to the Observed Field GVOs are not fully representative of all the fields contributing to the GVOs, in particular they do not include realistic ionospheric fields in the polar region, or magnetosphereionosphere coupling currents.
For comparisons with the Core Field GVOs, B mod j (t i ) is computed using the time-dependent internal field from the CHAOS-7.2 model (Finlay et al. 2020) using SH degrees up to 20, with the LCS-1 lithospheric model  for degrees n ∈ [14, 20] removed. For comparisons with the Core Field Secular Variation  GVOs, B mod j (t i ) is computed using the first time derivative of the time-dependent internal field from CHAOS-7.2, again up to SH degree 20. In the global grid there are 78 polar and 222 non-polar GVOs and Benchmark values were computed using GVOs ±30 • in latitude from the equator. Comparisons were made between 2014 and 2020, throughout the time interval when GVO data were available.

A global overview of the Swarm GVO time series
To illustrate the 1-monthly GVO secular variation data series, Fig. 4 presents a global map of annual differences of the radial field component of the Observed Field GVO time series (blue dots) and of the Core field GVO time series (red dots). Fig. 5 presents a similar summary of the global results for the 4-monthly GVO time series. Note the small difference in the time scales shown at the bottom left of these two figures; the SV of the 1-monthly GVO-CORE time series begins in 2015.5, since the GVO-CORE time series starts only in 2015 due to the PCA processing, while the SV of the 4-monthly GVO-CORE begins in 2014.7 since no PCA is not performed on these.

Validation statistics: comparisons with ground observatories and field models
The results of the validation comparisons carried out are presented here in the form of two summary tables of statistics. Table 3 collects results of the validation tests against independent ground observatories and field models for the 1-monthly GVO products, while Table 4 collects similar statistics for 4-monthly GVO products. See Sect. "Validation tests" above for details of the tests.  When considering the statistics presented here, it is important to recall that the number of ground observatories is split into 13 "Polar" stations, 15 "non-polar" stations and six "benchmark" stations. As mentioned in Sect. "Validation tests" the stations in each category were selected in order to obtain as far as possible reasonable geographic coverage of both the polar and non-polar regions. The aim with the benchmark stations was to document and validate the performance of the GVO time at known high-quality stations from mid-to-low latitudes where external contributions are less prominent. The error estimates provided along with the GVO products are also presented in these tables for reference. In these tables GVO-OBS, GVO-CORE and GVO-SV denotes the Observed Field GVOs, the Core Field GVOs and the Core Field Secular Variation GVOs, respectively.

Example comparisons of GVO and ground observatory time series
More detailed insight comes from direct examination of the time series of the ground observatory and associated GVO series as described in Section "Validation tests". Fig. 6 presents the 1-monthly Observed Field (GVO-OBS, blue dots) and Core Field (GVO-CORE, red dots) GVO estimates, mapped down to the Earth's surface at three of the benchmark ground observatories. These figures include ±σ uncertainty estimates, where we have made the assumption that these estimates remain unchanged when mapping the field to ground level. When examining the Observed Field GVOs we present time series of the field itself rather than the SV, so as not to filter out any signals that may be of interest by taking annual differences. Also plotted for comparison are the ground observatory hourly monthly means (omm, yellow dots) and revised monthly means (rmm, black dots).
Radial field variations observed at the benchmark stations are followed closely by the GVO series, for example at Kakioka (KAK) in Japan (left column, Fig 6) where both the trend in the field and its acceleration are in agreement. The ability of the Observed Field GVO series to track sub-annual field changes is illustrated by the southward θ-component, for example the peak observed in the second half of 2017 at Kakioka. This feature, likely of magnetospheric origin, is seen simultaneously at all benchmark stations in both the GVO and ground observatory series, and is particularly clear at Kakioka (KAK) and Hermanus (not shown here). The amplitude of the peak is slightly lower in the GVO series. More scatter Fig. 6 One-monthly Observed Field (blue dots) and Core Field (red dots) GVOs mapped to Earth's surfaced with ±σ uncertainty envolopes, together with simple monthly means (yellow stars) and revised monthly means (black stars) from three of the selected high-quality 'benchmark' ground observatories, left column: Kakioka (Japan), middle column: Honolulu (Hawaii,USA), right column: Canberra (Australia). Top row is the radial field component, middle row is the southward field component, bottom row is the eastward field component, units are nT is seen in the eastward φ-component of the GVO series compared to the ground observatory benchmark series (e.g., at Honolulu, HON). The source of this scatter may be ionospheric or field-aligned currents seen by the satellites that are less prominent at ground; the amplitude of this scatter was larger 2014-2016, which may indicate a solar cycle dependence. Figure 7 presents Observed Field (GVO-OBS) and Core Field (GVO-CORE) GVO estimates along with their ±σ uncertainty, together with corresponding ordinary and revised ground observatory monthly means, from stations in the more challenging polar regions. At these locations, there are strong ionospheric E-region currents lying between the satellites and the ground stations, and the satellites at times fly through intense field-aligned currents. Nonetheless, the comparisons are encouraging and the trends seen at the ground stations are well captured by the GVO series. At the polar stations, the amplitude of the error bars has been significantly reduced for the Core Field GVO series compared to the Observed Field GVO series.
The radial component at high northern latitudes in Canada, at the Resolute Bay observatory (RES) inside the polar cap, shows a particularly clear annual variation in the monthly means, peaking in the northern summer. These fluctuations, which are likely due to far-field effects of polar electrojet currents, are well tracked by the GVO estimates.
Larger differences between the GVO and ground observatory series are seen in the eastward φ-component at these stations, the difference being largest from 2014 to 2017 (up to 25 nT seen at RES in summer months). The eastward φ-component in the GVO and ground stations agrees more closely at slightly lower latitudes in both the northern hemisphere (e.g., in Alaska at College station CMO, not shown here) and in the Southern hemisphere at Macquarie Island (MCQ), middle row Fig. 7. Ground stations in the auroral zone see signals in the southward θ-component that are less prominent in the GVO estimates; these may be caused by polar electrojet currents that are closer to the ground stations. At Mawson observatory in Antarctica (MAW) the southward θ-component has fluctuations of opposite sign to fluctuations seen at the same time in the GVO estimates. The relative position and orientations of the ionospheric currents and the ground and satellite observation points are clearly important for understanding such effects. Fig. 7 One-monthly Observed Field (blue dots) and Core Field (red dots) GVOs mapped to Earth's surfaced with ±σ uncertainty envolopes, together with simple monthly means (yellow stars) and revised monthly means (black stars) from three of the selected polar ground observatories, left column: Resolute Bay (Canada), middle column: Macquarie Island (Australia), right column: Mawson Station (Antarctica). Top row is the radial field component, middle row is the southward field component, bottom row is the eastward field component, units are nT Figure 8 presents plots of the 1-monthly revised monthly mean SV from ground observatories (black dots) and the 1-monthly Core Field GVO SV series (red dots) at the three low/mid benchmark locations. Note the difference in scale here when looking at the secular variation, compared to the earlier plots that show the Observed Field/Core Field GVO values without taking annual differences.
The absolute levels (i.e., amplitude of secular variation) and trends (i.e., secular acceleration) in these benchmark ground observatory records of the core field secular variation are well matched by the Core Field SV GVO series. Peaks (secular variation impulses/geomagnetic jerks) such as that in the radial field at Honolulu (HON) in 2017 (Fig. 8, middle column, top row) are well captured and there is no indication of loss of temporal resolution in these annual difference secular variation series compared to the ground records. This indicates time-dependent SV with time scales down to 1 year is well captured in the Swarm Core Field Secular Variation GVO product. The scatter is slightly larger in the GVO series for the southward θ-component and there are indications of remaining noise (perhaps to due ionospheric or inter-hemispheric field-aligned currents) with period close to one year in the eastward φ-component. Figure 9 shows similar comparisons for a selection of the polar observatories. Here the scatter is larger in both the ground and GVO data, due to the difficult of isolating the core field signal, but again the observed trends agree well. Figures 10 and 11 present plots of the 4-monthly ground observatory SV (black dots) and 4-monthly GVO SV time series (red dots) at the same three low/mid and polar latitude benchmark locations. Considering Fig. 10, the scatter observed in the 1-monthly Core Field SV time series has been reduced and the independent ground and Swarm series show excellent agreement. The peak in the SV observed in the radial component at Honolulu (HON) in 2017 is again well captured. Differences are apparent at some epochs between the GVO series and the ground observatory series in the eastward φ-components, especially in 2015 and 2016 when solar activity was higher. This is particularly noticeable in the 4-monthly SV series in January 2015 and January 2016 and seems to be related to the fields measured by the Swarm satellites during summer 2015 (see e.g., Fig. 10). Comparisons with ground observatories and internal field models such as CHAOS show a noticeable bias in the B φ component during this period which contributes to longer tails in the Fig. 8 One-monthly Core Field SV GVOs mapped to Earth's surface (red symbols) with ±σ uncertainties, and revised monthly means from selected high-quality 'benchmark' ground observatories (black symbols), left column: Kakioka (Japan), middle column: Honolulu (Hawaii,USA), right column: Canberra (Australia). Top row is the radial field component, middle row is the southward field component, bottom row is the eastward field component, units are nT distribution of residuals for B φ all epochs and also results in enhanced rms differences for the φ-components of SV in comparison to ground observatories, see Table 4. A similar bias is also seen when comparing the original Swarm data to internal field models during summer 2015, particularly for Swarm B. The residuals during this time are largest in the northern polar region and seem to be geophysical in origin, perhaps related to strong fieldaligned currents measured by the satellites during this epoch.
Despite the slightly higher scatter at the polar stations in Fig. 11, the agreement is again encouraging with trends seen at ground stations being captured in the GVOs. Largest differences are seen in the horizontal components for Mawson station (MAW) in Antarctica where a sawtooth pattern about the ground series is visible in the 4-monthly GVO estimates. This enhanced scatter is reflected in the error estimates supplied together with the GVO products, but illustrates that caution is needed when interpreting SV variations on interannual and shorter timescales in the auroral zone. Further work is required to better understand these features.

Discussion and conclusions
In Table 3 find that the 1-monthly Swarm GVO products Observed Field series agree with independent ground observatory and field model predictions to within 5 nT in all components at non-polar latitudes. Given that the requirement for a good standard (INTERMAGNET) ground observatory is an accuracy of 5 nT this indicates that the GVO method yields results comparable on these time scales with good ground observatories. The 4-monthly estimates agree even better, to within 3 nT. Larger differences are found at polar latitudes where comparisons are complicated by the presence of strong ionospheric and magnetosphere-ionosphere coupling currents that have different signatures at ground and satellite altitude. The processing applied to obtain Core Field GVOs results in close agreement with ground observatory revised monthly means and with internal field models.
Taking annual differences to obtain SV estimates, further improves the agreement. We find the secular variation of the field intensity in the 1-monthly Core Field GVOs agrees with six benchmark ground observatories Fig. 9 One-monthly Core Field SV GVOs mapped to Earth's surface (red symbols) with ±σ uncertainties, and 1-monthly revised monthly means from selected polar ground observatories (black symbols), left column: Resolute Bay (Canada), middle column: Macquarie Island (Australia), right column: Mawson Station (Antarctica). Top row is the radial field component, middle row is the southward field component, bottom row is the eastward field component, units are nT from mid and low latitudes to a level of 1.8 nT/yr. For the 4-monthly Core Field GVOs the difference to the secular variation recorded at the ground observatories decreases to 1.2 nT/yr. These numbers may be considered an upper bound on the accuracy of the Swarm GVO secular variation estimates, since they also include the measurement errors inherent in the ground observatories (perhaps 0.5 nT/yr at excellent observatories) as well as differences due to incomplete separation of non-core sources which will affect ground and GVO data in different ways.
In this paper, we have presented a global network of Geomagnetic Virtual Observatories constructed from vector magnetic field measurements made by the Swarm satellites. The series are provided in two variants, each with 1-monthly and 4-monthly cadences, and each with associated uncertainty estimates: (1) 'Observed' magnetic field GVO series, with 1-and 4-month cadence (2) 'Core' magnetic field GVO series, and associated annual difference secular variation series, with 1-and 4-month cadence.
Good agreement has been demonstrated between the Swarm GVO series, ground observatory data, and existing field models. The Swarm GVO series thus provide consistent and accurate global information on geomagnetic secular variation.
We recommend the Core Field GVOs along with their supplied error estimates for use in studies of core dynamics. Adopting the traditional approach of taking annual differences to obtain the SV helps avoid small annual signals that can remain in the Core Field series. For future work, we propose carrying out PCA de-noising based on first differences of monthly GVOs, rather than annual differences, as a promising direction to further isolate core field signal.
Earlier versions of GVO series have already been used in inversions for the core surface flow (Kloss and Finlay 2019;Whaler and Beggan 2015) and in data assimilation studies where the core field signals seen in GVOs are combined with information from geodynamo models in order to estimate the state of the core (Barrois et al. 2018). GVO series are particularly well suited for global studies of rapid core dynamics where a number Fig. 10 Four-monthly Core Field SV GVOs mapped to Earth's surface (red symbols) with ±σ uncertainties, and 4-monthly revised means from selected high-quality 'benchmark' ground observatories (black symbols), left column: Kakioka (Japan), middle column: Honolulu (Hawaii,USA), right column: Canberra (Australia). Top row is the radial field component, middle row is the southward field component, bottom row is the eastward field component, units are nT of physical hypotheses are currently under exploration (Aubert and Finlay 2019;Buffett and Matsui 2019;Gerick et al. 2020). The Observed Field GVOs provide additional information on long-period variations of magnetospheric and ionsospheric origin. Long-period magnetospheric variations may prove useful for deep electromagnetic induction studies (e.g. Harwood and Malin 1977). At high latitudes signatures of the polar electrojets are clearly seen in the 1-monthly Observed Field GVO series, for example the distinctive annual variations in the vertical component seen in Figure 7, reflecting seasonal variations of the polar electrojet current system. Both applications will become increasingly attractive as the time series provided by the Swarm satellites lengthens.