The Kalmag model as a candidate for IGRF-13

We present a new model of the geomagnetic field spanning the last 20 years and called Kalmag. Deriving from the assimilation of CHAMP and Swarm vector field measurements, it separates the different contributions to the observable field through parameterized prior covariance matrices. To make the inverse problem numerically feasible, it has been sequentialized in time through the combination of a Kalman filter and a smoothing algorithm. The model provides reliable estimates of past, present and future mean fields and associated uncertainties. The version presented here is an update of our IGRF candidates; the amount of assimilated data has been doubled and the considered time window has been extended from [2000.5, 2019.74] to [2000.5, 2020.33].


Introduction
The Earth's magnetic field has different sources. Classically, we distinguish internal and external sources below and above the site where the field is measured. The three principal internal contributions are the core field, the lithospheric field and the magnetic fields induced in the ocean or within the crust and upper mantle. At large scales, the core field clearly dominates. Sustained by dynamo action in the liquid outer core, it is predominantly dipolar, and varies on timescales ranging from months to millennia. Magnetic field reversals, rare events on even longer times scales, are not considered here. On scales that correspond to spherical harmonics (SH) degree beyond about 16, the core field is dominated by the lithospheric field coming from the magnetized rocks of the crust. Since the Earth's mantle evolves extremely slowly, the lithospheric field, can be considered as almost static. The external fields that are generated by electrical currents in the ionosphere and in the magnetosphere, on the other hand can vary extremely rapidly in time. Because the sources of the magnetospheric field (the ring current, the magnetopause and magnetotail currents) are distant from the Earth, only its large-scale contributions can be detected by the low-orbiting magnetic satellites or by measurements on Earth's surface. This is not the case for the ionospheric field which is generated closer to the Earth's surface. Because the dynamical behavior of both ionospheric and magnetospheric fields is controlled by solar radiations (and also thermospheric winds for the former), they are closely tied to solar activity, and can vary on very short timescales. The external fields can therefore also induce a non-negligible secondary field inside the electrically conducting parts of the mantle, crust and oceans. Other potentially important induced fields are created because the oceans move relative to the core fields.
Disentangling the different field contributions is a difficult task since they often overlap in spatial scale and time scale. Many field models therefore resort to a regularization in space and time and only use selected data. Some examples are the CHAOS model series by Olsen et al. (2006) and Finlay et al. (2016), the comprehensive models by Sabaka et al. (2002Sabaka et al. ( , 2015Sabaka et al. ( , 2018Sabaka et al. ( , 2020, the GRIMM models by Lesur et al. (2008Lesur et al. ( , 2010Lesur et al. ( , 2015, the POMME models by Maus et al. (2005Maus et al. ( , 2010, or the gufm1 by Jackson et al. (2000). The COV-OBS model by Gillet et al. (2013) and the model recently proposed by Ropp et al. (2020) are the only models that use a Bayesian approach instead of regularization. Usually only vector field measurements taken during night time, under geomagnetic quiet conditions, and at low-to mid-magnetic latitude are considered in order to minimize the contribution of fields created by currents in the ionosphere or in auroral regions (field-aligned currents, DP2, auroral electrojet). Four main contributions then remain, the core, the lithospheric, the magnetospheric and the induced fields. The core and the lithospheric fields are usually treated as one internal source described by one set of spherical harmonics coefficients. Small-scale contributions beyond degree 16 or so are supposed to be of lithospheric origin and are static in time while the larger scales represent the varying core field. Most of the models also treat the induced fields and the magnetospheric field in a simplified way. The field induced by ocean circulation and the fields created in the magnetotail and the magnetopause are either neglected or estimated separately. Only the magnetic field generated by the ring current and the respective induced part then remain to be modeled. Yet, when assuming a 1d electrical conductivity profile for Earth's mantle, the axisymmetric magnetospheric field and the related induced field can be parameterized by the disturbance short-time (Dst) index proposed by Sugiuara (1963). The Dst or other similar indices are independently estimated from observations and can serve as model input.
To achieve an optimal separation of the different contributions, proper temporal parametrization of the different sources is mandatory. In many models (CHAOS, GRIMM, CM, COV-OBS), the time dependency of the core field is modeled by B-splines for an a priorly fixed time step. This imposes a relatively smooth evolution of the field, excluding the rapid variations attributed to external fields. In addition, with algorithms based on regularized least square approaches, only the time derivatives of the core field are penalized and no constraints on the morphology of the field itself are imposed. With the COV-OBS model, Gillet et al. (2013) went a step further in characterizing a priori the spatio-temporal behavior of the core field. They assumed that its dynamical evolution was controlled by a specific second-order autoregressive process which can reproduce the temporal statistical properties of the core field which have been characterized with both observatory measurements (see De Santis et al. 2003;Lesur et al. 2017) and numerical simulations of the geodynamo (see Bouligand et al. 2016).
Although a good calibration of the temporal constraints is crucial for deriving magnetic field models from observatory and satellite data, some key information can also be extracted from the morphology and the spatial correlation structure of the different fields. Holschneider et al. (2016) have shown that the use of appropriate parameterized correlation kernels, could greatly improve the separation of the different components of the Earth's magnetic field. Working with observatory data for a single epoch, they could detect the spatial signature of the core, lithospheric, magnetospheric and ionospheric fields. Their Bayesian approach even allowed the quantification of uncertainties.
The Kalmag model we propose here combines such a technique with the sophisticated temporal correlation functions introduced by Gillet et al. (2013). Since groundbased observatories and satellite missions such as Oersted, SAC-C, CHAMP, or Swarm, have produced or are still generating a huge amount of data, a block inversion would be numerically impossible. This is why we decided to assimilate the data sequentially using a Kalman filter approach combined with a smoothing algorithm.
The article is organized as follows: in the next section, the data selection criteria and the modeling strategy are detailed. We first present the different magnetic sources that are taken into account, we then show how they are a priori characterized, and how such prior information can be modeled through autoregressive processes. Based on these processes, the equations for the Kalman filter approach and the smoothing algorithm are then given. Finally, the methodology used to derive the different candidate models for IGRF-13 is explained. In "Results and discussion" section, we present and discuss the outcomes of the model. However, since the spatio-temporal prior characterization of each modeled magnetic source is parameterized, we first show how these parameters are evaluated to be then incorporated in the model. The article ends with some concluding remarks and perspectives to improve the Kalmag model.

Data
For the moment, the Kalmag model only uses the vector field measurements of the CHAMP and Swarm loworbiting satellites. We sample CHAMP data between 2000.59 and 2010.63 at a rate of 1 datum every 5 s and only use measurements where the vector field magnetometer (VFM) and the star tracker (STR) instruments were functioning nominally. For the Swarm constellation, we consider from 2013.8 the level 1B data from only the Alpha and Bravo satellites with baseline version 0505/0506, using simultaneous sampling every 10 s. For the construction of the IGRF-13 candidate models, which we will refer to as the Kalmag candidates, a two times lower sampling rate was used, impacting marginally the accuracy of the model. Furthermore, we only used data up to 2019.74 for the Kalmag candidates, but now extended this to 2020.33.
Contrary to polar latitudes, where every vector field measurement is considered, for latitudes between 60 • north and south, only night-time data (when the sun is below the horizon), are kept. Furthermore, independently of the satellites locations, the following selection criteria are also applied: • The z-component of the interplanetary magnetic field (IMF) is positive in order to minimize the influence of the substorm auroral electrojet. • The index K p ≤ 2 0 to operate under conditions of low geomagnetic activity.
All in all the dataset is composed of 2 985 442 vector field measurements for CHAMP and 4 606 159 for Swarm.

Magnetic sources
The different contributions to the observations are described in terms of magnetic sources of either internal or external origin. Except for the field produced by fieldaligned currents ( b fac ), each of these contributions b i is derived from a potential V i : The field-aligned currents (FAC), which are mostly flowing in the polar region of the ionosphere, are generating a toroidal magnetic field. Following the development of Waters et al. (2001), we assume that the currents are poloidal. The resulting magnetic field can therefore be expressed through the potential V fac as: Note that depending on the source, the spherical coordinate system {r, θ s , φ s } the magnetic field is expressed in may differ. Here four types of systems are used: geographic (GEO), magnetic (MAG), solar magnetic (SM), and geocentric solar magnetospheric (GSM) (see Laundal and Richmond 2017). Each potential V i , including V fac , the potential associated with the FAC, is then expanded in spherical harmonics (SH), which for internal and external sources, respectively, read: The Y ℓ,m are Schmidt semi-normalized spherical harmonics of degree ℓ and order m, ℓ max is the maximum of degree expansion, a i is a reference radius, and g i,ℓ,m (t) (later referred as g i ) are the spherical harmonics (2) � b fac (r, θ s , φ s , t) = −� r × � ∇V fac (r, θ s , φ s , t) . (3) coefficients expressed at a i . m is the maximum order considered for the spherical harmonics expansion. A complete expansion, referred as standard, requires m = ℓ . However, some sources, in particular external fields, are known to have a strong zonal signature (see Finlay et al. 2017), and are therefore restricted to either zonal spherical harmonics modes with m = 0 or to an expansion we refer to "zonal iso" where m = 1. The Kalmag model is composed of 7 sources. 3 of them are of internal origin, the core field ( g c ), the lithospheric field ( g l ) and the induced/residual ionospheric field ( g ii ). g c and g l are expressed in the geographic coordinate system and expanded in SH with the standard decomposition. g ii is expressed in the solar magnetic coordinate system and its SH decomposition is restricted to m = 1 (zonal iso). Induced and residual ionospheric fields are treated as a single source since the former is known to be mostly dipolar (see Olsen et al. 2005), whereas the latter, although not well known, is likely to be dominant at smaller scales. 3 sources are used to characterize the magnetospheric field. A remote one ( g rm ) in GSM which is purely dipolar and zonal, and 2 close sources ( g m and g fm ) expressed in the SM coordinate system. g m is purely zonal and it is accompanied by g fm a fluctuating source expanded with the zonal iso SH decomposition. Finally, the source associated with field-aligned currents is expressed in the SM coordinate system and restricted to the zonal iso SH expansion. Originally, each external source, including g ii , were expanded up to the arbitrary SH degree ℓ = 15 . It was, however, found in "Parameter estimation" section that g ii required to be modeled at smaller scales, whereas the level of energy associated with g rm was found to be negligible at SH degree ℓ > 1 , leading us to only keep its degree ℓ = 1 . The core field is expanded up to SH degree ℓ = 20 a value where time dependency might still be detected. The resolution of the lithospheric field was set to ℓ = 76 , a degree up to which the prior variance of the field as evaluated in "Parameter estimation" explains well the lithospheric field models of Olsen et al. (2017).
The nature of the 7 sources composing the Kalmag model, the coordinate system they are expressed in, and their spherical harmonics truncation level are listed in Table 1.

Prior characterization of spatial and temporal correlations
To obtain an optimal separation of the various contributions to geomagnetic observations, proper prior characterization of the different magnetic sources is mandatory. Following the studies of Hulot and Le Mouël (1994), Gillet et al. (2013), Holschneider et al. (2016), full spacetime covariance matrices are used to characterize each magnetic source g i . Assuming that E[g i ] = 0 the latter read: where the matrix ∞ g i corresponds to the stationary spatial covariance, and c(�t) is a temporal correlation matrix depending on the time lag t . ∞ g i is assumed to be derived from energy spectra E ∞ i (ℓ, a i ) , expressed at given radii a i , such as: where N m is the number of modeled spherical harmonics coefficients per degree ℓ , and F is the pre-factor of the energy spectra given by S(ℓ) = ℓ + 1 and S(ℓ) = ℓ for internal and external sources, respectively. Two types of spectra are used for the model, flat ones, with , referred as C-based spectra, making Eq. 6 equivalent to the correlation kernels proposed by Holschneider et al. (2016). Only 2 sources are characterized by a flat spectrum, the core field and the induced/residual ionospheric field. This choice was driven by the evaluation we performed in "Parameter estimation" section where such a parametrization enabled us to better explain the data.
Note that the covariance matrices of Eq. 6 are diagonal. More complex covariance structures could be used, accounting for the correlations between different magnetic modes as they appear in dynamo simulations for instance (see Sanchez et al. 2019). However, the large amount of data available for this study are sufficient to properly constrain the model and permit such general prior assumptions. Temporal constraints are prescribed by the type of correlation functions introduced by Gillet et al. (2013Gillet et al. ( , 2015 in the context of geomagnetic modeling. They are derived from the autoregressive processes further discussed below, and read: for first-order processes and: for second-order processes. τ i (ℓ) are scale-dependent characteristic timescales. For the core field, Christensen and Tilgner (2004) and Lhuillier et al. (2011) have shown that its characteristic timescales τ c (ℓ) could be approximated by a power law such as τ c (ℓ) = τ SV ℓ −1 with τ SV the secular variation timescale. In this study, we decided to use such a power law description of τ i for each magnetic source except for the lithospheric field, leading to: with amplitudes M i and exponent α i . In our parametrization of the problem, we therefore use four main parameters to characterize each source: (1) the amplitude A i ; (2) the (virtual) source radius a i ; (3) the time-scale amplitude M i and (4) the time-scale slope α i . In addition, because of the specific behavior of the dipole components, for the core field (see Christensen and Tilgner 2004;Lhuillier et al. 2011) but also for the magnetospheric sources (see Sugiuara 1963;Finlay et al. 2016), the spatial and temporal properties of each source's dipole (except for the lithospheric field) is treated separately from the remaining SH coefficients. These parameters are directly estimated with a subsample of the dataset following the procedure described in "Parameter estimation" section.

Sequentialization
For the following developments, the parameters characterizing the prior covariance structures ( A i , a i , M i and α i ) are assumed to be known. Instead of performing a full Bayesian block inversion with the covariance matrices given by Eq. 5 as a prior information, we proceed in a recursive way through the Kalman filter approach proposed by Kalman (1960). To do so, dynamical equations are required to forecast the statistical properties of the different modeled sources. As previously mentioned, the covariance structures we wish to a priori impose are derived from autoregressive processes. In their

Table 1 Magnetic sources considered in the model
The second column corresponds to the coordinate system each field is expressed in. GEO stands for geographic, SM for solar magnetic, MAG for magnetic and GSM for geocentric solar magnetospheric. ℓ max is the maximum degree of the SH expansion, for the three following types of decomposition: standard with m = [−l, l] , zonal with m = 0 and zonal iso where m = {0, 1, −1}

Source
Coordinate ℓ max SH decomposition where ω i 1 (t) and ω i 2 (t) are Gaussian white noises scaled by the factors σ i 1 (ℓ) and σ i 2 (ℓ) , respectively. These equations have explicit solutions which satisfy: where the temporal Gaussian white noise ξ i is spatially (in terms SH coefficients) characterized by the distribution where t can either be positive or negative.
For magnetic sources characterized by first-order autoregressive processes, z i = g i and F i is given by: The core field evolution is prescribed by a second-order autoregressive process, so the field itself and the secular variation are dynamically tied together. In this case, z i = (g i , ∂ t g i ) T and: Note that the covariance matrix of the core field when the later is assumed to be in a stationary state is given by: as shown by Hulot and Le Mouël (1994).

Sequential assimilation
The Kalmag model consists in a vector z containing the spherical harmonics coefficients of every magnetic source (including the SH expansion of the secular variation). With the decomposition detailed in "Magnetic sources" section, z contains 6624 SH coefficient entries. Its evaluation is performed with a Kalman filter algorithm which proceeds sequentially in two steps. In the first step, the forecast, the evolution of the mean model E[z] together with its associated covariance matrix z are predicted until observations become available. In the second step, namely the analysis, the model is corrected to better reflect the data through a Bayesian inversion.
To predict the simultaneous evolution of the different magnetic sources with the autoregressive processes presented in the previous section, a matrix F containing all the matrices F i , and a matrix ˜ = ∞ − F ∞ F T characterizing the white noise of the complete evolution model, are constructed. The evolution of the mean model and its covariance from time step k − 1 to step k is then given by the forecast: At iteration k, whenever measurements are available, the model is updated with the formulations: where K k is the Kalman gain matrix and H k is the operator projecting the model to the data d k at iteration k.
Note that the time step of the algorithm has been set to t = 30 min, corresponding to approximately onethird of the different satellites orbital period. Within this time window most of the magnetic sources are assumed to be static. However, the spatio-temporal correlations of the field-aligned current (FAC) source as well as the non dipolar part of the close magnetospheric field are modeled within this time window for reasons detailed in "Parameter estimation" section.

Smoothing
With the Kalman filter algorithm, one gets access to the distribution p(z k |d k ) , where d k corresponds to all the measurements up to iteration k. To obtain p(z k |d) the posterior distribution of the model at iteration k given the entire dataset d , one can apply a smoothing algorithm. In this study, we chose the formulation of Rauch et al. (1965). Starting at the last iteration of the Kalman filter algorithm, the smoothing algorithm performs iteratively backward in time accordingly to the following steps: The combined Kalman filter-smoothing algorithm was also chosen by Ropp et al. (2020) for their IGRF-13 main field candidate. However, their approach differs from ours in many aspects. In particular, their core field evolution is prescribed by an Euler scheme and they estimate the secular variation through the fluctuation of the field within 3-month time windows. Here the secular variation is tied to the core field evolution through the AR2 process. Its evaluation is therefore achieved through dynamical link and correlation with the core field. The 3-month time step of the Kalman filter algorithm chosen by Ropp et al. (2020) also differs from the 30 min time step used here. This difference will have an effect on the modeling of rapidly varying fields, since the time step constrains the temporal window where the various sources are assumed to be static.

Candidate models
The models that we proposed as candidates for the IGRF-13 in 2020.0 are the Kalman filter solutions after the last analysis step in 2019.74 (September the 27th ). This solution was forwarded in time until 2020.0 using the forecast of Eqs. 16 and 17 with propagators F and noise covariance ˜ for a time step of t = 0.26 year . The secular variation candidate is the mean secular variation estimation in 2020.0. The associated uncertainties were obtained by taking the square root of the diagonal elements of the covariance matrix � ∂ t g c (t = 2020) , providing the standard deviation corresponding to each SH coefficients of ∂ t g c .
Our internal field candidate model in 2020.0 contains the sum of the mean core field and the mean lithospheric field at this epoch E[g c ] + E[g l ] . The uncertainties estimates were derived from the covariance matrix g c +g l = g c + g l + g cl + T g cl in 2020.0, where g cl is the cross covariance between the core field and the lithospheric field. The square root of each diagonal elements of g c +g l provides the standard deviation associated with Finally, our candidate for the DGRF 2015.0 model was constructed as our 2020.0 internal field model, except that the core and the lithospheric fields were taken from the smoothing solution.
The Kalmag model presented below uses additional data from September 27 2019 until April 2020.

Parameter estimation
In this section, the parameters characterizing the different magnetic sources, the spectra amplitudes A i and (23) radius a i , the characteristic timescales magnitudes M i and slopes α i are evaluated. Except for the induced/residual ionospheric field and the remote magnetospheric field for which different spherical harmonics truncation levels were tested, the spectral resolution as well as the SH expansion chosen to model the different fields are a priori imposed (see Table 1). The spherical harmonics coefficients of the different sources at the different modeled time are calculated with our Kalman filter scheme with a subsample of the data between 2001.0 and 2018.0. In order to avoid measurements taken by CHAMP or Swarm during strongly magnetically disturbed epochs, and such as no permanent bias due to the static part of the magnetic field generated by the ring current remain in the data, measurements deviating by 60 nT in intensity from the CHAOS-6 internal field model of Finlay et al. (2016) taken up to SH degree 20 and a yearly estimation of a degree 1 external field expressed in the SM coordinate system are removed from the set. After this operation, a sample of N est = 247 453 vector field measurements regularly spaced in time is kept and used to estimate the parameters for the different sources. This estimation procedure is initialized with a first guess for each parameter. For internal (or external) sources, radii lower (or larger) than the Earth's radius are chosen. For the external sources we assume a characteristic timescale of 1 day and set the slopes associated with τ i (ℓ) to α i = 0 . The same is used for the induced and the ionospheric field. The lithospheric field, on the other hand, is assumed to be static. As mentioned above, several authors report that the core field time scales are inversely proportional to the spherical harmonics degree (except for the axial dipole), implying α c = 1 . We start our estimation with an initial guess of α c = 0 and M c = 30 years and check whether we nevertheless recover the results suggested by the other authors.
Given a set of parameters, we perform the Kalman filter assimilation described above with the data subset. Before each analysis step we can calculate how well the model predicts the data with the relation: Summing M pred k over all k iterations provides the measure for the model compatibility with the data. We randomly explore the multi-dimensional parameter space, seeking to maximize L = k M pred k . The algorithm used to perform this operation is sequential. A parameter is chosen randomly, and a Gaussian perturbation is added to it. A Kalman filter simulation is then performed with this new parameter value. If L is larger than at the previous iteration, the new value is assigned to the parameter, otherwise the model returns to its previous state. This step is repeated until the convergence of L is observed. Note that no range was imposed to any of the parameters.
The final values from this parameter search are given in Table 2. Remember that for each source with the exception of the lithospheric field, we distinguish between the dipole spatial and timescales and the spatial and timescales of the other harmonics. The radii presented in Table 2 should not be considered as physical quantities where the source associated with would be lying at. They only enable to discriminate between internal and external sources and influence the shape of the simple degree variance behavior of Eq. 6. Figure 1 shows the static energy spectra projected at the Earth's surface that define the spatial covariance structure of Eq. 6 for the optimal parameters. A comparison with the CHAOS-6.9 core field model of Finlay et al. (2016) (black circles) and the LCS-1 lithospheric field model of Olsen et al. (2017) demonstrates the close agreement. The other internal source taken into account in our model is the residual ionospheric/ induced field ( g ii ). It is dipole dominated and exhibits an almost flat spectrum at the Earth's surface as illustrated by the blue line in Fig. 1. Without the restriction to magnetically quiet data, this source would be much more energetic. This is also the case for the magnetic field generated by field-aligned currents ( g fac ), which reaches a similar amplitude as g ii (dashed line in Fig. 1). The last sources are the external magnetospheric fields, which we model with a close ( g m ), a remote ( g rm ) and a fluctuation components ( g fm ). Together they exhibit a strong dipole and their energy spectra are rapidly decaying.
The different sources cover a large variety of timescales, ranging from minutes to centuries. For the core field, the characteristic time associated with its non dipolar part reads τ c (ℓ) = 514ℓ −1.06 . This power law is close to the slower estimate of Lhuillier et al. (2011) given by τ (ℓ) = 470ℓ −1 , but suggests about 10% longer timescales. For the core dipole, the estimation algorithm yields τ c (1) = 935 years. Since we only consider data over a 17 year period, this estimate is likely not very precise but nevertheless illustrates that the dipole evolves much slower than the other harmonics. The residual ionospheric and induced fields vary very rapidly in comparison, with timescales between 1 h and 1 day, independent of the length scale. Sometimes assumed to be static (see Olsen et al. 2014;Finlay et al. 2016), the remote magnetospheric field has a timescale of τ rm ∼ 10.3 years in our study, a value close to the solar cycle. The part of the external fields typically associated with the ring current are the degree ℓ = 1 contribution of the close and fluctuating magnetospheric fields. Whereas the purely zonal part g m exhibits a characteristic timescale

Table 2 Magnetic sources parameters as described in "Magnetic sources" section
The prior spatial covariance matrices are derived from energy spectra expressed at some radii a i which are are either flat with E ∞ i (ℓ) = A 2 i or of the C-based type with the form E ∞ i (ℓ) = A 2 i (2ℓ + 1)S(ℓ) where S(ℓ) = ℓ + 1 and S(ℓ) = ℓ for, respectively, internal and external sources. The characteristic timescales of Eqs. 5, 7 and 8 are parameterized by τ i (ℓ) = M i ℓ −αi . In the A i column, D corresponds to the magnitude of the dipole component  Table 1. Black and red circles are the energy spectra of, respectively, the CHAOS-6.9 core field model of Finlay et al. (2016) in 2015.0 and the LCS-1 lithospheric field model of Olsen et al. (2017) of τ m (1) = 1.5 days, the fluctuating part g fm , assumed to have SH order m = {0, 1, −1} here, varies faster with τ fm (1) ∼ 8 h. For the small-scale magnetospheric field, the timescales of the zonal contributions are shorter than those of the degree one contributions. Note that for g m , τ m (ℓ ≥ 2) = 18 min, a characteristic time lower than the 30 min time step of the Kalman filter algorithm. In such a case, where the estimation of τ was leading to lower values than the algorithm time step, the slope of the parameterized timescale was set to zero, and the source was only characterized by a spatio-temporal covariance structure of the form given by Eq. 5. Its evolution from one time step to the other was also treated as a temporal white noise. Finally, the fastest varying source is the field-aligned currents with τ fac (ℓ) = 1 min. Together with the non dipolar part of g m , g fac , with its zonal structure and short memory of its past, strongly resembles the observed disturbance along satellite tracks discussed in Finlay et al. (2017), and generally affecting the construction of small-scale lithospheric field models (see Thébault et al. 2017).

Model results
The optimal model parameters described in the previous section are fixed in the sequential Kalman filter assimilation. The model seeks to describe the data with the spherical harmonics source coefficients g i (t) , which we call the Kalmag geomagnetic field model. Because the parameters were derived from data at low geomagnetic activity, we also have to restrict the final model data. This is done on the fly by testing how much a forecast differs from the data. Whenever the difference lies outside the 95.4% confidence interval predicted by the slow varying sources (the ones exhibiting some characteristic time larger or equal than a day at some degree ℓ ) the associated data points are dismissed. All in all, 28.6% of the originally selected data (see "Data" section) were dismissed. Note that this procedure to preselect the data to be assimilated also differs from the reweighted least-square approach chosen by Ropp et al. (2020).
We recall that the forecast time step is set to 30 min. The entire model (mean and covariance of z ) is stored every 0.25 year, although outputs could be saved down to every time steps. Figure 2 shows the Kalmag energy spectra at the Earth's surface for the core and the lithospheric field for the epoch 2015.0. For degree ℓ ≤ 15 , the standard deviations (SD) of both fields, shown with thin black and gray dashed lines, are comparable and exceed the mean value of the lithospheric field. Moreover, the SD of the combined field (thick black dashed line) is smaller than the SD of the individual fields. This illustrates that we cannot separate core and lithospheric contributions at these large scales. It also indicates that the prior level of variance of the lithospheric field, as estimated in the previous section and shown through its associated spectrum with triangles, is simply the extrapolation of the smallscale stationary spectrum towards the larges sales. Spherical harmonics degree Internal field spectra E( Core field Lithospheric field Core+lithospheric fields

Mean
Standard deviation Core field stationary spectrum Lithospheric field stationary spectrum Fig. 2 Energy spectra at the Earth's surface in 2015.0 of the mean (continuous lines) standard deviation (dashed lines) and initial prior standard deviation (symbols) associated with the core field (thin black lines and symbols), the lithospheric field (gray lines and symbols) and the sum of the core and lithospheric fields (thick black lines) Figure 3 compares energy spectra for three types of solutions for the main field (left) and the secular variation (right) in 2015.0 plotted at the Earth's surface. The Kalman filter solution (thin gray lines and symbols), the solution after the smoothing algorithm (thick black lines and symbols), and a third solution for a 5 year forecast from 2010.0 (thin black lines and symbols). Continuous lines show the mean, dashed lines the standard deviation, triangles the differences to the DGRF-13 final field model, and circles the difference to the CHAOS-6.9 SV model.
Not surprisingly, the forecast yields the largest uncertainties. The smallest uncertainties are achieved in the smoothed solution, since the smoothing process allows to take information from the future into account. For the field itself, the fact that the differences to the DGRF-13 final model are similar to the model uncertainties, indicates that these uncertainties are reliably estimated. For the secular variation, the predicted uncertainty levels seems to be slightly overestimated, at least for the Kalman filter and the smoothing solutions. The maximum resolution achieved for the SV is ℓ = 16 for the smoothing solution, beyond this value the SD becomes larger that the mean signal. Figure 4 displays various estimations of the radial (left), azimuthal (middle) and longitudinal (right) secular variation at the level of several ground-based observatories over the period 2000.0-2025.0. Blue dots correspond to SV estimations derived from ground-based observatory measurements. They are obtained by taking annual differences of the measured magnetic field, selected under the same criteria we used for satellite data, and averaged over 0.1 years. The black lines are evaluations of the SV through the CHAOS-6.9 model taken up to SH degree ℓ = 16 . The blue and yellow lines are, respectively, the IGRF-13 secular variation and the Kalmag candidate SV. The red area is the Kalmag mean secular variation ± 2 standard deviation ( σ ). Between 2000.6 and 2020.33 the outcomes of the smoothing solution are shown whereas outside this time window the secular variation is estimated with the forecast step the Kalman filter. Finally, the red dashed lines are the mean SV ± 2σ coming from 5 year forecast simulations.
The first observation one can make is that whenever the secular variation derived from observatory data exhibits a smooth evolution, the latter is well reproduced by the Kalmag model. We can also notice that at least until 2019.0, the CHAOS-6.9 SV is always lying within 95.4% confidence interval ( E[∂ t b c ] ± 2σ ) predicted by our model. The latter interval does not depend on the location where the SV is evaluated. Nonetheless, estimated uncertainties are globally larger for the radial component of the SV than for its azimuthal or longitudinal components. This is certainly due to the fact that the main field is dipole dominated. Because the Kalmag model is only derived from the CHAMP and Swarm measurements, data are missing between 2010.7 and 2013.8. This translates into a global increase of uncertainty predictions as it can clearly be witnessed for the longitudinal component of the SV in Mawson or Kourou. However, with the combination of the Kalman filter with the smoothing algorithm, the data gap does not lead to any particular issue to connect the two satellite eras since such an approach enables us to account for any space-time correlations. As already shown through the energy spectra of Fig. 3, the forecast algorithm is quite accurate to predict the future states of the secular variation. The three hindcast simulations covering the periods 2005-2010, 2010-2015 and 2015-2020 are a confirming it. Nevertheless, in particular locations where the SV exhibits rapid variations as in Honolulu, the simple autoregressive dynamics propagating the core field, fails to not only reproduce but also bound the real evolution of the SV. This calls for using more complex forecast models, able for example to account for the nonlinear interactions between the core field and a time Blue dots correspond to annual differences of ground-based observatory measurements averaged over 0.1 years. Black lines are derived from the CHAOS-6.9 SV model of Finlay et al. (2016). The blue and yellow lines are, respectively, the IGRF-13 and the Kalmag candidate SV. The red area is the Kalmag mean secular variation ± 2 standard deviation. Red dashed lines are the mean SV ± 2 standard deviation coming from 5 year forecast simulations dependent outer core flow as in Barrois et al. (2017), Bärenzung et al. (2018), Sanchez et al. (2019). For the incoming 5 years, both the IGRF-13 or our candidate SV models (which are everywhere quite close to one another) are lying well between the ± 2σ predicted error bars of the updated Kalmag model. The last result analyzed in this study, is a comparison of the different candidates for the IGRF-13 main field in 2020.0, and the field as it can be evaluated with measurements taken after 2020.0. As shown in Fig. 3 and discussed previously, the accuracy of the model derived from the smoothing algorithm, which takes into account knowledge beyond the epoch of evaluation, is higher than the Kalman filter solution where the model derivation only accounts for previously assimilated data. We could also observe that the longer the forecasts the lower the accuracy of the model. For the construction of the IGRF-13 model, measurements were only available up to maximum 2019.75, so the difference between the updated Kalmag model (which derives from data assimilated up to 2020.33) and the various candidates can be considered as the errors of the candidates predictions. These errors are displayed in Fig. 5 through their energy spectra evaluated at the Earth's surface. Whereas the error spectrum of the Kalmag candidate is drawn with a thick black line, the ones associated with the other candidates are shown with thin gray lines. Because the Kalmag model may exhibit a permanent bias, the difference between the model and the candidate may represent an erroneous evaluation of the error. Therefore the errors for each of the candidate models were also computed using another model taking recent data into account (up to March 2020), the CHAOS-7.2 model of Finlay et al. (2020) which is also, in its first version, the parent model of the DTU candidate for IGRF-13. The spectra of these error evaluations are shown with dashed lines in Fig. 5. When compared to the Kalmag model, the Kalmag candidate appears to be the most accurate prediction of the main field in 2020.0 with an error level lower than every other candidate at any SH degree ℓ . When the comparison is performed with the CHAOS-7.2 model, the Kalmag candidate globally remains the most precise estimation of the 2020.0 field up to ℓ = 8 . However, at smaller scales the DTU candidate is closer to its parent model CHAOS-7.2, but the level of approximated error of the Kalmag candidate remains extremely low.

Conclusion
We presented in this study a new approach to derive a geomagnetic field model from direct measurements of the Earth's magnetic field. Performing sequentially in time, the Kalmag model, which is the combination of a Kalman filter and a smoothing algorithm, enables us to consider complex prior covariance structure to characterize both spatially and temporally the different magnetic sources composing the observable field. The evaluation of the parameters controlling the statistical properties of each modeled source reveals the large variety of spatial and timescales populating the Earth's magnetic field, and reinforces the idea of treating the assimilation of geomagnetic data sequentially in time. By allowing the presence of a large-scale lithospheric field independent from the core field, we could show that with the prior characterization we chose, the two sources could not be separated. Furthermore, although the sum of the two fields can be very accurately estimated, the level of uncertainty associated with each individual source is directly linked to the prior variance of the lithospheric field. This implies a maximum resolution for the core field of spherical harmonics degree ℓ = ∼ 15 . Its time derivative however, can be accurately estimated up to ℓ = 16 . Globally, the model provides reliable uncertainty quantification for whether past, present or future field estimates. It also permits, through the spatio-temporal correlations a priori imposed, to consistently connect the CHAMP and the Swarm satellite eras.
For short-term forecasts, as the derivation of the IGRF model requires it, we could observe that our approach can be more accurate than other existing methods. This is certainly due to the fact that the secular variation is estimated through its dynamical correlation with the core field and is not a fit to the past evolution. There is nevertheless still some room for improvement. Considering more physically based dynamical equations to constrain the evolution of the various fields, such as dynamo simulations for the core field, would certainly improve the separation of the different sources, and provide more accurate predictions of future states. At high latitudes, a better treatment of the field generated by the auroral electrojet would certainly permit to reduce the amount of rejected data in the preselection phase of the Kalman filter algorithm. The temporal window covered by the model could also be extended by taking data from not only previous satellite missions, but also ground-based observatories or magnetic surveys.