Physics-based secular variation candidate models for the IGRF

Each International Geomagnetic Reference Field (IGRF) model released under the auspices of the International Association of Geomagnetism and Aeronomy comprises a secular variation component that describes the evolution of the main magnetic field anticipated for the 5 years to come. Every Gauss coefficient, up to spherical harmonic degree and order 8, is assumed to undergo its own independent linear evolution. With a mathematical model of the core magnetic field and its time rate of change constructed from geomagnetic observations at hand, a standard prediction of the secular variation (SV) consists of taking the time rate of change of each Gauss coefficient at the final time of analysis as the predicted rate of change. The last three generations of the IGRF have additionally witnessed a growing number of candidate SV models relying upon physics-based forecasts. This surge is motivated by satellite data that now span more than two decades and by the concurrent progress in the numerical modelling of Earth’s core dynamics. Satellite data reveal rapid (interannual) geomagnetic features whose imprint can be detrimental to the quality of the IGRF prediction. This calls for forecasting frameworks able to incorporate at least part of the processes responsible for short-term geomagnetic variations. In this letter, we perform a retrospective analysis of the performance of past IGRF SV models and candidates over the past 35 years; we emphasize that over the satellite era, the quality of the 5-year forecasts worsens at times of rapid geomagnetic changes. After the definition of the time scales that are relevant for the IGRF prediction exercise, we cover the strategies followed by past physics-based candidates, which we categorize into a “‘core–surface flow” family and a “dynamo” family, noting that both strategies resort to “input” models of the main field and its secular variation constructed from observations. We next review practical lessons learned from our previous attempts. Finally, we discuss possible improvements on the current state of affairs in two directions: the feasibility of incorporating rapid physical processes into the analysis on the one hand, and the accuracy and quantification of the uncertainty impacting input models on the other hand.


Introduction and background
This solicited letter deals with the secular variation component of the International Geomagnetic Reference Field (IGRF), which is used to predict the evolution of the main geomagnetic field during the 5 years that follow each quinquennial release of a new generation of the IGRF (consult Alken et al. 2021a, with regard to the latest release).
The magnetic field at location r and epoch t 1 , B(r, t 1 ) , is related to its state at the same location and at an anchor epoch t 0 by where the time rate of change ∂ t B is traditionally referred to as the secular variation (SV). If this field is due to dynamo action alone, the SV in the dynamo region (1) B(r, t 1 ) = B(r, t 0 ) + t 1 t 0 ∂ t B(r, t) dt, Open Access *Correspondence: fournier@ipgp.fr 1 Université de Paris, Institut de Physique du Globe de Paris, CNRS, Paris, France Full list of author information is available at the end of the article (Earth's fluid outer core) is governed by the induction equation where u is the fluid flow and η is the magnetic diffusivity of the core.
For practical purposes connected with utilizing the Earth's magnetic field, we place ourselves outside the Earth's core, in a source-free region. Under the assumption that the mantle is an electrical insulator, changes of any of the three components of the magnetic field at a location r are linearly related to changes in the radial component of the field at the top of the core, that is the sphere of radius c = 3485 km. For instance, the change in the northward component X reads where the description of the kernel K X , which requires an integral to be performed over the surface of the core, can be found in, e.g. Gubbins and Roberts (1983), and where r c denotes any location at the core surface. Considering the three components together, the rate of change of the field can be expressed using a vector kernel K B , such that Thus, in principle, the knowledge of ∂ t B r everywhere at the core surface suffices to prescribe the rate of change of the magnetic field at any location between the core surface and the top of the source-free region that includes Earth's surface. This yields an updated form for Eq. (1), The radial projection of the induction equation Eq. (2) at the core surface (where the radial flow is zero) allows one to write where ∇ H · , ∇ 2 H and c are the horizontal divergence operator on the unit-sphere, horizontal Laplacian operator on the unit-sphere, and the radius of the core, respectively. An exact physics-based estimate of ∂ t B r requires knowledge of the three terms of the right-hand side of Eq. (6), namely from left to right: lateral transport by core surface motion, radial diffusion, and horizontal diffusion. An (5) B(r, t 1 ) = B(r, t 0 ) + approximate estimate can rely on the knowledge of only one term, provided it is the dominant one. The material that we covered so far does not correspond to the traditional mathematical formulation under which the SV problem is cast for the IGRF exercise. Indeed, being in a source-free region makes it possible to describe B using a magnetic potential, which can in turn be specified mathematically using Gauss coefficients (e.g. Sabaka et al. 2010). The secular variation is accordingly characterized by the time derivative of those Gauss coefficients. More precisely, within the current IGRF framework, each Gauss coefficient up to spherical harmonic (SH) degree 8 is assumed to undergo its own independent linear evolution over the t = 5 years of interest after the release of the IGRF. The rates of change of Gauss coefficients up to SH degree 8 define the IGRF SV model. This amounts to approximating Eq. (1) according to The quantity of interest in this letter is therefore the average �Ḃ(r)� , and how physical models connected with Eq. (6) can be used to compute it.
Before dealing with this quantity in more detail, it is of interest to appreciate the impact of this first IGRF approximation (that of a linear evolution) on the forecast error during the 5 years of operation of the IGRF SV model. To this end, let us consider four starting epochs t 0 (2000.0 AD, 2005.0 AD, 2010.0 AD, 2015.0 AD). We define the time-dependent forecast error ǫ(t 1 ) as the root mean square (rms) difference at Earth's surface (the sphere of radius 6371.2 km) between the "true" main field (in a sense to be defined below) at year t 1 and its estimate based on the linear extrapolation of its true value at t = t 0 from t 0 to t 1 (Eq. 7), considering two estimates for �Ḃ(r)�: • the SV at t 0 , �Ḃ(r)� = dB/dt(r, t = t 0 ) ; this forecast is based on the value of the derivative of the Gauss coefficients at the starting epoch t = t 0 , which is the typical strategy followed by mathematical extrapolations, • the average SV between t 0 and t 0 + t , �Ḃ(r)� = [B(r, t + �t) − B(r, t)]/�t , that will result in an exact prediction after t = 5 years; this would correspond to the end goal of physics-based forecasts.
We use the 7.5 version of the time-dependent, continuous CHAOS model by Finlay et al. (2020) for this illustrative exercise, assuming that it represents the true geomagnetic field. Figure 1 shows how the error evolves over the 20 years of interest. The first option leads to a sawtooth pattern, with an error that reaches values in the range 60 − 80 nT.
The second option leads to a bumpy pattern, since by construction the forecast at t 0 + t is the "true" field. Under these exceptionally favourable circumstances, the error peaks at about 20 nT at t ≈ t 0 + 2-2.5 years. In summary, using this metric for the forecast error, this exercise shows that by design, the IGRF will induce rms errors ranging from at least 20 nT to 80 nT over its 5 years of validity. We note that 80 nT is in fact a lower upper bound, since it was obtained by the comparison of the CHAOS model with itself: the real error will be larger since the difference with the true geomagnetic field will not be zero at t = t 0 .
In any event, the goal of the IGRF physics-based forecasts discussed in this letter is to provide a �Ḃ(r)� as accurate as possible, that would ideally correspond to the second option discussed in our preliminary exercise. Such a forecasting strategy can be summarized as follows: (8) For any epoch t 1 of the IGRF released at t 0 , the field is predicted to be B(r, t 1 ) = B(r, t 0 ) + (t 1 − t 0 )�Ḃ(r)�, with Eq. (9) is an advection-diffusion equation over the core surface augmented with the contribution of the radial diffusion of the field. Its direct numerical integration faces several obstacles, connected with the fact that neither the flow u(r c , t) nor the radial diffusion (η/c 2 ) ∂ 2 r (r 2 B r ) (r c , t) can be estimated directly from observation. With regard to the two terms on the right-hand side of Eq. (9) that reflect diffusion processes, and for the large scales of interest here (SH degree ≤ 8 ), we may anticipate that these terms will not induce rapid (intra-annual to interannual) changes over the t = 5 years of interest, unless hypothetical strong radial gradients come into play. In contrast, the transport term may have such an effect, provided there exists physical processes in the core with intra-annual to inter-annual time scales and a noticeable geomagnetic imprint at the core-mantle boundary.

Fig. 1
Forecast error measuring the difference between the true field at epoch t and a linear extrapolation between t 0 and t, with t 0 = t/5 (integer division). Each Gauss coefficient has a constant rate of change, which is set either to the value of its time derivative at t = t 0 (blue curve) or to the average rate of change between t 0 and t 0 + 5 years (orange curve). Fields are expanded up to spherical harmonic degree and order 8 Fournier et al. Earth, Planets and Space (2021) 73:190 Before discussing the various strategies explored in the past to follow a physics-based approach, we would like to perform a retrospective analysis of past IGRV SV models in order to sharpen our understanding of the error budget at stake for the IGRF-SV exercise.

A retrospective analysis of past IGRF SV models
For an IGRF released in year Y, we define the intensity SV error as the rms difference at Earth's surface between the main field vector at year Y + 5 and the 5-year linear extrapolation of the main field vector at year Y using the SV model at year Y. In addition to the intensity error, we consider an alternative measure of error, which we call the direction error, that reflects the local angle at Earth's surface between the horizontal component of the main magnetic field at year Y + 5 and the 5-year linear extrapolation of the main field at year Y using the SV model at year Y. We found that the properties of the direction error essentially reflect those of the intensity error; consequently, we provide the interested reader with more information on the direction error and its fluctuations over the past 35 years in Appendix A and focus here on the intensity (vector) error.
All models are truncated at spherical harmonic degree 8, the truncation of the SV constituent of each IGRF release we considered. The main field models are the definitive geomagnetic reference field models of the year Y (except for year 2020 for which we resort to the IGRF, since the DGRF constituent will appear in the 14th release of the IGRF). We used the coefficients available from www. ngdc. noaa. gov/ IAGA/ vmod/ igrf_ old_ models. html. For IGRF-11 and IGRF-12, we also plot the errors that can be attributed to individual candidate models that were submitted in response to the corresponding call made by the IAGA task force in charge at the time. Fig. 2 a Intensity secular variation error (see text for its definition) for past generation IGRF SV models (in blue) and those SV candidates submitted for IGRF-11 and IGRF-12 (in grey), computed using fields expanded up to spherical harmonic degree and order 8. b Energy of the second derivative of the magnetic field, the secular acceleration, at Earth's surface, up to spherical harmonic degree and order 8, according to the MCM model by Ropp et al. (2020) and the CHAOS model by Finlay et al. (2020), version 7.5 Figure 2a shows the evolution of this error with respect to time since 1985. To ensure the consistency of this measure of error, we ignored IGRF-9, which was released in 2003 as an update to IGRF-8 (Macmillan et al. 2003), and provided de facto an extremely good guess of the secular variation between 2000 and 2005, the interval of interest for IGRF-8. Also, note that IGRF-6 was released in 1991 (Langel 1992), which explains why its SV error is markedly lower, by more than 20 nT, than that of its neighbours.
Upon inspection of Fig. 2a, a few comments are in order: (1) The error varies between 70 and 140 nT.
(2) Errors are larger prior to the advent of the almost continuous era of low-Earth orbiting magnetic satellites which started with Oersted in 1999 (keeping in mind that IGRF-6 was released in 1991). The observation of Earth's magnetic field using satellites has resulted in a 30-nT drop in the error after 5 years.
(3) The IGRF SV model built for IGRF-12 performs better than all candidates considered separately. (4) Over the last three cycles of 5 years, the error has been the lowest when considering the IGRF-11 SV model (years of operation: 2010-2015).
This last point can tentatively be connected with the IGRF-SV model inability to reflect fast (i.e. intra-to interannual) changes occurring in the dynamo-generated field.
In an attempt to substantiate this statement, we show in Fig. 2b the energy of the second time derivative of the magnetic field, termed the secular acceleration, at Earth's surface, up to spherical harmonic degree and order 8, according to the time-dependent models MCM and CHAOS-7.5, by Ropp et al. (2020) and Finlay et al. (2020), respectively. The secular acceleration at the surface of the core obeys the time-derivative of Eq. 9, namely where we understand that each term depends on r c and t. Notwithstanding the difficulties connected with the estimation of the secular acceleration, Fig. 2b suggests that the lowest error was obtained over the time interval during which the acceleration was minimum. In other words, beyond the error by design discussed above, the linear assumption made for the IGRF was less justified at times (10) of larger acceleration. We refer the reader interested in the observed signature of this acceleration to the study by Chulliat and Maus (2014) for the 2000-2010 time frame, and to the recent investigation by Finlay et al. (2020) of the 2000-2020 time span. Interestingly, a number of SV candidates to IGRF-11 , IGRF-12 , and IGRF-13 (Alken et al. 2021a) included a physics-based component, and therefore possibly some flavour of the secular acceleration, in the design of their �Ḃ� . We will review and discuss those contributions in the next section.

Physics-based estimates of �Ḃ�
A physics-based estimate of �Ḃ� relies on the solution to the initial value problem of Eq. (9), subject to some approximation. All approximations used so far in the IGRF context rely on a deterministic formalism. Key to the success of a deterministic approach is its ability to capture the time scales that make the secular variation and its rate of change, the secular acceleration.

Time scales of the secular variation and secular acceleration
The secular variation time scale at spherical harmonic degree ℓ , τ sv (ℓ) , is connected with the induction equation Eq. (9) above. It represents the time it takes for magnetic energy at spherical harmonic degree ℓ to be completely renewed (Hulot and Le Mouël 1994). It reads where W ℓ and Ẇ ℓ are the energy contained in the ℓ th spherical harmonic degree of the magnetic field and its rate of change, respectively. Note that taking the ratio of these two effectively suppresses the dependency of the time scale to the radius of the spherical surface in the source-free region where the analysis is carried out.
Likewise, the secular acceleration time scale at spherical harmonic degree ℓ , τ sa (ℓ) , is connected with the secular acceleration equation Eq. (10) and reads where Ẅ ℓ is the energy contained in the ℓ th SH degree of the secular acceleration. The secular acceleration time scale does not depend on the radius of analysis either.
Studies suggest that for ℓ > 1 (the non-dipole field), τ sv (ℓ) can be described by an inverse linear law of the form with a master coefficient τ sv on the order of 400-500 years . This points to the fact that at those scales, it is the diffusionless, or frozen-flux, form of Eq. (9) that prevails. For the IGRF exercise, this law implies that the highest degrees of B of interest ( ℓ ≤ 13 ) have a time scale of few decades, meaning that they will not undergo dramatic changes over the t = 5 years forecast. By inspecting recent changes in the geomagnetic field over the 2000-2020 time window, and according to CHAOS-7.5 (Finlay et al. 2020), one can indeed notice that the relative change in the magnetic energy at Earth's surface over periods of 5 years is on the order of a few tenths of a per cent, ∼ 0.2 − 0.4 %. Interestingly, the law given by Eq. (13) also applies to the magnetic fields produced by numerical dynamo models Christensen et al. 2012;Bouligand et al. 2016), and is commonly used to rescale the time axis of such simulations, a procedure which proves to be even more useful if these simulations are used to produce a forecast (e.g. Minami et al. 2020;Fournier et al. 2021, in the context of IGRF-13).
Regarding the secular acceleration time scales, Christensen et al. (2012) demonstrated that at large scale ( ℓ 10 ), τ sa (ℓ) is nearly constant in numerical dynamo models, equal to τ sa . At small scale ( ℓ ≫ 1 ), it follows an inverse linear law. In a frozen-flux scenario (Roberts and Scott 1965), the former behaviour points to a dominant B r ∂ t u term in the right-hand side of Eq. 10 at large scale, and the latter to a dominant u∂ t B r term at small scale. Satellite-based observation of the field since 1999 has allowed the largest scales ( ℓ 8 ) of the secular acceleration to be estimated; the associated time scales are also compatible with a constant value, τ sa ∼ 10 years Christensen et al. 2012), although slightly smaller values are obtained for the most recent magnetic field models: we find for instance that τ sa = [11.1, 14.7, 7.2, 10.2, 6.1, 8.0, 7.5, 10.1] years for the first 8 spherical harmonic degrees according to the CHAOS-7.5 model (Finlay et al. 2020), considering the average of Ẇ ℓ and Ẅ ℓ over the 1998.0-2020.0 time window. These values stress the benefit one can get from an accurate description of ∂ t u for a short-term forecast, such as the one required by the IGRF-SV. In a self-consistent dynamical setting, we stress that an accurate description of the flow acceleration is contingent upon an accurate description of the force balance at work (for an enlightening primer on yearly to secular core dynamics, consult .
As an aside, we note that the time scales of the secular variation and secular acceleration can be automatically accounted for within a stochastic formalism that may ignore Eq. 9 to produce a forecast, by resorting instead to a "calibrated" stochastic equation to advance B in time, in the form of a collection of auto-regressive processes whose characteristic time scales are precisely the time scales discussed here. (In the IGRF context, see the candidates proposed by Gillet et al. 2015;Baerenzung et al. 2020;Huder et al. 2020) Physics-based contributions to IGRF-11, IGRF-12, IGRF-13 The number of SV candidates incorporating at least one component of the right-hand side of Eq. 9 in their prediction increased from 1 out of 8 for IGRF-11 ) to 3 out of 9 for IGRF-12 , and finally 6 out of 14 for IGRF-13 (Alken et al. 2021a), for a grand total of 10 candidates.

Observations used
In what follows, observations refer to mathematical models of B and Ḃ , ingested often (but not always) together with their uncertainties, by the physical models to be described. These mathematical models, which come in the form of a set of Gauss coefficients, were either specifically designed for the purpose of the IGRF, or already available in the literature. Most are subject to some a priori (or regularization) constraints (see e.g. Gillet 2019 for a recent review on core field models, and the physics they incorporate).

Physical models
Physics-based candidates can be divided into two main categories, depending on the physical core used to time step Eq. 9: • Candidates that rely on a core-surface flow engine, whereby the knowledge of u at the core surface is used together with the knowledge of B r to advect field lines and advance in time. The 3 candidates we identified are those of Hamilton et al. (2015); Metman et al. (2020); Brown et al. (2021). The first attempt to use that engine in proof-of-concept experiments was made by Maus et al. (2008), which received some subsequent criticism (Lesur and Wardinski 2009;Maus et al. 2009). A solid body of work nevertheless ensued, based on a steadily improving framework (Beggan and Whaler 2009;Beggan and Whaler 2010;Whaler and Beggan 2015;Beggan and Whaler 2018). The two-dimensional flow is a solution to the core-flow problem which, given estimates of B and Ḃ (and their uncertainties), provides the practitioner with a large-scale frozenflux estimate of u(t 0 ) at the core surface (Holme 2015). Additional secular acceleration data is included for some, but not all, candidates, in order to estimate ∂ t u(t 0 ) (Whaler and Beggan 2015), thereby allowing the flow to vary according to over the time interval of interest for the forecast. The frozen-flux approximation is commonly used to advance in time, but note that the candidate by Metman et al. (2020) interestingly resorts to a hybrid, two-step, approach that incorporates estimates of radial diffusion in a second stage to improve on their forecast. • Candidates that rely on a dynamo engine, where the integration in time is performed by means of a three-dimensional, self-consistent, model of the geodynamo. We counted 7 candidates (consult the 6 papers by Kuang et al. 2010;Fournier et al. 2015;Minami et al. 2020;Sanchez et al. 2020;Fournier et al. 2021;Tangborn et al. 2021-there was no companion paper associated with the 2015 NASA-Goddard candidate). All operated with a dynamo driven by convection. A three-dimensional initial condition at t = t 0 is required to start the integration of the model forward in time. The definition of the initial condition varies among candidates: it results either from the sequential or variational assimilation of observations, or an instantaneous estimate of the core state based on observations. Note in passing that since dynamos are deterministic chaotic systems, a prerequisite for their usage is that their horizon of predictability exceeds the t = 5 year current forecast of the IGRF. The studies by Hulot et al. (2010);  suggest a horizon of several decades. The basis of such analysis is the determination of the e-folding time τ e , that characterizes the exponential error growth caused by an imperfect, even if infinitesimal, knowledge of the state of the system. During the exponential growth phase, the error reads To determine τ e in practice, a possibility consists of analysing the evolution of an ensemble of simulations with close, but not equal, initial conditions, and to consider that the evolution of the scatter of the ensemble is a proxy for the evolution of the error. Minami et al. (2020) computed an e-folding time of 140 years for their numerical dynamo model, which confirms that chaotic error growth will have a negligible effect on the quality of the forecast after t = 5 years. Figure 3 further illustrates chaotic error growth for the coupled Earth dynamo model ) used by Fournier et al. (2015Fournier et al. ( , 2021. Figure 3c shows that the standard e-folding time is found to be close to 25 yr. This e-folding time is based on a threedimensional measure of the divergence of N e = 100 dynamo trajectories, that includes all components of the convection-driven dynamo state (flow field, magnetic field, and mass anomaly field). In mathematical terms, we have defined the root mean square scatter S 3D as where x i is the non-dimensional, complex-valued dynamo state column vector of the ith ensemble member, the overbar symbol denotes ensemble average, and the dagger symbol means conjugation and transposition (see Fournier et al. 2013, for details).
To conform with the IGRF framework, we also show in Fig. 3d the divergence of the simulated magnetic fields at Earth's surface. The corresponding definition of S a is where g i is the column vector of Gauss coefficients truncated at degree 8 of the ith ensemble member, D is a diagonal matrix with entry ℓ + 1 for coefficients of SH degree ℓ and T means transposition. The value of the e-folding time that is found (31 yr) is close to the standard, three-dimensional one. This implies, again, that the intrinsic chaotic character of dynamo simulations will not be detrimental to an IGRF-type forecast. Regardless of the physical engine chosen, the workflow that is used can be schematically described by the flowchart given in Fig. 4.
The two families of physical models (based on core surface flows or on dynamos) are themselves uncertain; in an attempt to take, at least partly, this uncertainty into account, most groups now resort to ensemble approaches (Minami et al. 2020;Sanchez et al. 2020;Brown et al. 2021;Fournier et al. 2021;Tangborn et al. 2021) for steps (2) and (3) in Fig. 4. What are the advantages and drawbacks of each family? The 2D approach has the advantage of its lightweight and is presumably easier to implement and operate than the 3D one. Conversely, the 3D approach naturally incorporates the effects of magnetic diffusion in a self-consistent manner, provided that the magnetic Reynolds number, defined as the ratio of the magnetic diffusive time scale to the convective transport time scale, has an Earth-like value of O(1000) . A quantitative assessment of the merits of each family calls for a community benchmark to be performed. Performing such an assessment is beyond the scope of the present paper. Here we provide the reader with the knowledge we gained while preparing SV candidate models for the IGRF.  Aubert et al. (2013). 100 dynamo models are integrated forward in time, starting from close initial conditions. Each initial condition is defined by the same, well-equilibrated dynamo state whose axial dipole is slightly perturbed by a normal perturbation of relative amplitude 10 −8 . a bulk magnetic energy versus time, normalized by its value at t = 0 , for the 100 members of the ensemble. b latitude of geomagnetic North at Earth's surface, for the 100 members of the ensemble. c 3D scatter S 3D versus time. τ e is the e-folding time. d scatter of simulated magnetic fields S a at Earth's surface. Ensemble integration and on-the-fly diagnostics obtained using the parody_ pdaf implementation . See text for details Fournier et al. Earth, Planets and Space (2021) 73:190 Lessons learned from IGRF-12 and IGRF-13 We now discuss some lessons we learned during the design of our candidate SV models for 2015 and 2020, which belong to the dynamo family. Those lessons were based on a large number of forecast in the past (hindcast) experiments used to consolidate and sharpen our workflow. The first two lessons are discussed in the papers describing the candidates (Fournier et al. , 2021, and we summarize them here. The third lesson is new and illustrated with some novel data.

Fig. 4
Flowchart summarizing the steps currently involved in the making of a physics-based secular variation candidate model. X(t), Y(t), Z(t) are the three components of the magnetic field vector recorded in an observatory or by a satellite. Field modelling refers to the process that converts databases of these time series into global mathematical models of the main magnetic field B(t) and its rate of change Ḃ (t) . These mathematical models are fed, together with their uncertainty, to a workflow that initializes the state vector (or ensemble of state vectors) for subsequent integration by the numerical model. The average secular variation between the start date of the IGRF ( t 0 ) and its end ( t 0 + t = 5 yr) is next readily evaluated. If an ensemble approach is used, an ensemble of such average secular variations are obtained, the statistics of which can be used for uncertainty quantification (1) Uncertainty quantification and error budget. There must exist a compatibility between the observations (in the sense defined above) and the physics of the dynamical model. The SV spontaneously exhibited by the dynamo model must adequately capture the geomagnetic SV, otherwise any forecast is bound to fail. A way to broaden the region of overlap between the observed and simulated SV is to adopt an ensemble approach for the assimilation of observation. Also, it is possible to build the mathematical model of B and Ḃ using the statistics of the dynamo model as prior information (see Ropp et al. 2020;Fournier et al. 2021, for details). In addition, of importance on the observation side is the underestimated uncertainty impacting the mathematical model of Ḃ , on the account of unmodelled external sources on the one hand (e.g. Thébault et al. 2012) and an inadequate representation of induced fields generated in the crust and mantle on the other hand (Olsen et al. 2005). These two latter contributions are poorly known and modelled, and therefore can easily be mistaken for core field fast variations. The unrealistic confidence in the input SV model can impose extremely tight and unnecessary constraints on the dynamo model. (2) Beware of unwanted flow acceleration. Above we stressed the potential benefit for a short-term forecast one could gain from an accurate description of ∂ t u . Note, however, that the definition of the initial condition (or ensemble of initial conditions) rests on a statistical operation that does not guarantee dynamical equilibrium. In our previous candidate models (Fournier et al. , 2021, the initialization resulted in unwanted flow acceleration at large scale, a consequence of the deviation of the statistically estimated state from a state that would respect the force balance in the system. Since this forecast is over 5 years, that is below the large-scale secular acceleration time scale τ sa , we conservatively decided to keep the flow steady to integrate the model forward in time, encouraged to do this by the results of hindcast experiments. To advance beyond this point, i.e. incorporate realistic flow acceleration we therefore need to describe the correct force balance in the core and enforce ways to respect it. Recent progress in theoretical and numerical dynamo modelling (e.g. Davidson 2013;Aubert et al. 2017;Schwaiger et al. 2019;Aubert and Gillet 2021) has shown that at leading order, this balance is most likely one between the Coriolis, buoyancy, Lorentz and pressure forces (the so-called QG-MAC balance). Second-order deviations from this balance are caught by inertia, which comes several orders of magnitude below these forces. The response of the system to QG-MAC imbalances (Aubert and Gillet 2021) therefore takes the form of either magneto-inertial waves (or Alfvén waves) or Coriolis-inertial waves (Rossby waves). At Earth's core conditions, Alfvén waves are favoured in the magnetic signal and thought to be responsible for the signal shown in Fig. 2b (Aubert and Finlay 2019). Enforcing a QG-MAC dynamical balance at the top of the core (Aubert 2020) helps to get a better forecast (see the results shown in Table 1 below), as it provides a base state that equilibrates the QG-MAC balance well and hence minimises the influence of spurious waves. But it does not prove to be sufficient yet, because at the same time it throws away that part of the Alfvén wave dynamics which would be desirable to truly describe the evolution of the system and improve the forecast.
(3) Impact of the SV input model. Our two candidates were based on a year of Swarm data each to compute the main and SV input field models. This probably does not suffice to produce high-fidelity estimates of the highest spherical harmonic degrees of the SV. Practice has shown us since that several years of data are necessary to stabilize these scales. In order to illustrate this stabilizing effect, we have computed a sequence of main field and secular variation models for every year between 2001.0 and 2019.0, using all the available satellite data between 2000.5 and 2019.5, by means of the sequential approach of Ropp et al. (2020), with Kalman filter analysis steps every 12 months. The SV is assumed constant over each time window. The smoothing step described by Eqs. (16-18) of Ropp et al. (2020) is not performed here, to avoid propagation of information from the future to the past. Compared with the single year of data strategy that we followed for our IGRF SV candidates, a sequence of Kalman filter analyses is expected to result in more accurate estimates of the main field and its SV at a given year, as they capitalize on the knowledge acquired during the previous years. Figure 5 shows the Lowes-Mauersberger spectra of a posteriori variances of the sequence of SV models between 2001 and 2010. Regardless of the absolute value of the variances (recall lesson 1 above), it is clear from this figure that 2 years of data appear sufficient to stabilize the largest scales of the secular variation, and that 5 years of almost continuous satellite data are required to obtain converged error estimates for SH degree 10 and above. Table 1 is an extension  of Table 2 by Fournier et al. (2021), which features the results of additional hindcast experiments performed with the input magnetic field and SV models constructed following the sequential Kalman filtering approach we just described (that corresponds to step 1 in Fig. 4), in conjunction with the steady flow assumption and the mild imposition of the QG-MAC balance for the inverse geodynamo modelling framework (steps 2 and 3 in Fig. 4), as done to design our SV candidate for IGRF-13. Inspection of the numbers in Table 1  In any event, we note that the overall spread of all forecasting strategies is a few nT rms and that there remains room for improvement in the implementation of physics-based forecasts, that are not yet systematically more accurate than linear extrapolation strategies.

Discussion
The performances of the 9 SV candidates for IGRF-12 were assessed retrospectively, over the 2015-2020 time frame in the IGRF-13 evaluation companion study by Alken et al. (2021b). The SV was analysed at 42 ground observatories and at the global scale. The observatory analysis relied on the cumulative misfit for the three components of the SV measured at the observatories obtained by each candidate. Results show that the best performance was obtained by physics-based candidates, IPGP's candidate for the X and Z components, and the BGS candidate by Hamilton et al. (2015) for the Y component. The margin with respect to the other candidates is admittedly slim (Alken et al. 2021b), but this nevertheless sends an encouraging signal towards approaches that incorporate at least a pinch of physics in their workflow. But before congratulating ourselves further, let us first recall the mixed conclusions drawn from the hindcasting experiments described above, and second note that the global analysis reveals that all forecasts missed the geomagnetic jerk that started early in 2014, following a pulse of secular acceleration that had peaked at the core surface in 2012-2013 (Torta et al. 2015;Finlay et al. 2016;Kotzé 2017;Soloviev et al. 2017). The findings of Alken et al. (2021b) were corroborated by Brown et al. (2021) who performed a similar analysis of the same period; their conclusion was that physical predictions did not do better or worse than mathematical extrapolations over the 5-year time span of the IGRF-12 release. How can we improve on this state of affairs? We discuss the possibilities connected with two open questions.

Can we incorporate the physics of secular acceleration pulses?
Numerical geodynamo models operating close to Earth's core conditions as regards their rapid dynamics have also linked the hierarchy of force balances in the core with a hierarchy of dynamical time scales (Aubert et al. 2017;Aubert 2018;Aubert and Gillet 2021). The leading-order QG-MAC balance was found to be responsible for the slow, inertialess convective evolution of the dynamo over the typical core overturn time scale τ U = D/U ≈ 140 years (where D is the thickness of the core and U the typical core flow velocity). Inertial deviations to the QG-MAC balance take the form of Alfvén waves with natural time scale τ A = √ ρµD/B ≈ 2 years (Gillet et al. 2010, where ρ, µ are the fluid density and magnetic permeability and B is the typical magnetic field amplitude in the core). As the level of inertia in Earth's core is expected to lie 4 to 5 orders of magnitude below the QG-MAC forces (Schwaiger et al. 2019;Aubert and Gillet 2021), the amplitude of waves is expected to remain small relative to the convective flows, with the ratio between the two being on the order of the Alfvén number A = τ A /τ U ≈ 10 −2 (Aubert and Gillet 2021), such that leaving out the description Table 1 Forecast error (in nT) over recent periods for different forecasting strategies, and using two classes of input models. The forecast error is expressed as the root-mean-squared difference between the true geomagnetic field vector, defined by an update of the CHAOS-6 field model (Finlay et al. 2016), and the forecast at the terminal epoch. The forecast error is computed for a spherical harmonic truncation equal to either 8 or 13, to conform with the IGRF framework on the one hand and to enable comparison with our previously published results on the other hand. Nocast assumes that the field does not change. Linear extrapolation assumes a linear variation whose slope is specified by the secular variation up to degree and order 8 / 13 at the start of the forecast period, as specified by the input model fed to the inverse geodynamo modelling framework: "1-year SV" means that the input model is determined based on one year of satellite data prior to the forecast; "Seq. SV" implies that the full set of satellite data available before the beginning of the forecast is used to compute the input model, using a sequential Kalman filter with analysis steps every 12 months. "Coupled Earth, ensemble, QG-MAC" indicates that the full coupled Earth dynamo model is integrated for an ensemble comprising 100 members, with the median defining the forecast, and an initialization where the QG-MAC constraint is mildly enforced for the flow at the core surface. In the bottom five rows, "steady flow" implies that the sole three-dimensional induction equation (with magnetic diffusion) is integrated, with or without an ensemble approach, and with or without the mild imposition of the QG-MAC constraint. If an ensemble approach is adopted, it is the median that defines the forecast. All non-"Seq. SV" entries for SH truncation at degree 13 taken from  of these waves initially appears as a reasonable option for geomagnetic prediction. This is of course absolutely misleading, because as the waves evolve on a characteristic time scale A −1 ≈ 100 times shorter than convection, they cause magnetic acceleration signals that are in fact on par with those created by convective acceleration. Even more importantly, our current understanding of magnetic acceleration pulses and geomagnetic jerks also links with additional focusing effects operating on these waves at the core-mantle boundary (Aubert and Finlay 2019).
In the perspective of improving physics-based predictions of the short-term geomagnetic evolution, as carried out by the IGRF, it is therefore essential to gain some predictive power for these rapid, inertia-bearing waves, as they likely cause most of the magnetic acceleration signals currently missed by all candidates. The main stumbling block is obviously the tiny nominal amplitude of the waves, which should be compared to the current uncertainties in the determination of the core state by data assimilation. To achieve a better description of the wave/convection interplay in Earth's core, our medium-term efforts should aim at developing data assimilation algorithms able to progressively cope with increased levels of enforcement of the QG-MAC balance as the schemes advance towards the present and the input geomagnetic data reach higher accuracy levels, in step (2) of the schematic workflow shown in Fig. 4. Even with this done, it is likely that the structure of magnetohydrodynamic waves present in Earth's core together with convective flows could remain beyond the reach of our estimation capabilities, in which case we might have to resort to probabilistic methods to predict the effects of these waves. This discussion also underlines the crucial need, detailed in the next item below, for high-quality geomagnetic data and high-quality uncertainty estimation.

Can we decrease the error of our input SV models and improve its estimates?
The quality of the magnetic field forecast depends on the input SV model. However, depending on the strategy used for the forecast, and its objectives, requirements on the input SV model may differ-e.g. if the strategy is a linear extrapolation an input SV averaged over several years may ultimately provide the best forecast. However, in the framework of physics-based strategy, it can be expected that an input snapshot SV model with high temporal resolution and small variances, over a large range of spatial scales would give the best output. Unfortunately, geomagnetic data do not yet allow to derive such a model because of the number and complexity of sources that generate the recorded signal.
A first challenge is the separation of internal and external sources at the sought increasing spatial and temporal resolution. Global SH models are derived from groundbased magnetic observatory and satellite measurements. Magnetic observatories monitor accurately the total magnetic field time variations, but their geographical distribution is uneven. This limits the maximum SH degree expansion of SV models derived from this data set alone to 7 or 8. Satellite measurements provide on the contrary a global and homogeneous data set, but their motion along their orbit introduce an unwanted ambiguity between spatial and temporal variations. The close orbit encounters over a given region and at a similar local time occur every 3 months on average (depending on the satellite mission), which precludes an accurate separation between external ionosphere/magnetosphere and internal core fields at shorter time scales (Finlay et al. 2017). Typically, the SV model resolution will be less than SH degree 10 if the SV model is derived from only one year of data. Accumulating several years of data allows to reach an SV resolution of SH degree 11 to 13 with a temporal resolution probably between 1 or 2 years (recall Fig. 5). Above these SH degrees, a significant amount of temporal averaging (> 2 years) is required to derive stable estimates of the SV.
This delicate balance between temporal resolution and variance of the SV estimates is highlighted by the modelling technique used to derive SV models. The most common way for deriving such models is to impose a temporal smoothing constraint on the core field model in order to separate core signals from contributions of unknown origin (e.g. Finlay et al. 2020). The amount of smoothing is adjusted by the modeller depending on how well the other poorly controlled contributions (e.g. induced field, ionospheric fields) are handled. With such a modelling scheme, the variances of the SV estimates are generally strongly underestimated (Lowes and Olsen 2004), but seemingly realistic estimates of the average SV can be derived up to SH degree 20. It is worth noting, however, that at SH degree 20 other sources of bias and errors have to be considered. For example, the varying fields induced in the conductive crust Thébault et al. 2009), the fields generated in the ionosphere, or unidentified sources such as-e.g. slowly varying oceanic currents, are expected to become comparatively significant. Alternative approaches for modelling the core field do not use temporal smoothing to the cost of having large SV estimated variances (e.g. Baerenzung et al. 2020;Ropp et al. 2020). In principle, this approach provides valid estimates of these variances, yet they are likely to be also underestimated if all sources contributing to the magnetic signal are not described and co-estimated.
Another source of limitation for the quality of SV models comes from the mantle that is not a strict electrical insulator. The Earth's mantle is a conductive layer that filters, depending on the spatial wavelength, the magnetic field variations it affects. The time delays introduced can be as large as three months (Jault 2015) for the largest wavelengths, precluding the derivation of robust SV estimates at the CMB for shorter temporal periods. Furthermore, there are still some doubts regarding the mantle conductivity values (Kuvshinov et al. 2021) particularly in the lower mantle at depths below 1500 km. A higher mantle conductivity at depth is implying longer time delays, and lower temporal resolution for the SV.
The quality of the SV model is not the only factor that can affect the quality of the prediction obtained by physics-based approaches. In fact, the accuracy and the spatial resolution of the static part of a core field model are also crucial to obtain accurate forecasts, as the static core field model is of paramount importance for the definition of the initial condition used for subsequent integration by the physics-based engine. The accuracy and resolution of the static part of a core field model are both limited because the contribution of the fields generated in the lithosphere and the core cannot be easily separated from magnetic data alone. The maximum resolution of static core field models is around SH degree 13 or 14 ( ∼ 3000km), and it is very unlikely that we can improve this in the near future in a deterministic way. For SH degrees lower than 13-14, these models are also contaminated by the Earth's lithospheric field, and its contribution may be significantly enhanced with downward continuation to the CMB. Hemant and Maus (2005) proposed models to predict the lithospheric field from SH degree 1 to 90 using geological and geophysical information. However, the comparison between the forward prediction and models derived from satellite observations still show important discrepancies in shape and strength in the visible lithospheric field waveband. This in turn casts doubts on the robustness of the large-scale lithospheric field predicted from these geological/geophysical information that contaminate the static core field. It is possible to use statistical prior information based-e.g. on theoretical spatial power spectrum of the lithospheric field (Jackson 1990(Jackson , 1994Voorhies et al. 2002;Thébault and Vervelidou 2015), to try to separate the core field from the lithospheric field at the largest spatial scales. However experiments in this direction showed that this is still challenging.
It remains that the nonlinear interactions of the unknown static core field of SH degrees larger than 13 with the large-scale core flow has an imprint on the large-scale secular variation (see e.g. Eq. (9) above). We note that in a dynamo-based approach, this effect can be incorporated in Step 2 (initialization) of the flowchart of Fig. 4, through the prior information provided by dynamo simulations, in the form of covariance matrices, using a truncation at SH degree 30 that captures the relevant nonlinearities (see Aubert 2015, for details).
In summary, and regardless of the modelling technique used, it appears that the understanding and modelling of the other sources contributing to recorded magnetic signals is a key issue for progressing in the description of the core field. In currently available models, it is the fields induced in the conductive crust and mantle, together with the fields generated in the high-latitude ionosphere, that impede the temporal resolution of the core secular variation, whose bounds remain to be assessed. It will eventually be the limited spatial resolution of the core static field, whose length scales beyond degree 13-14 are masked by the lithospheric field, that will become the main limiting factor for main field prediction with physics-based approaches. In any event, further effort is needed to produce input error covariance matrices that reflect as faithfully as possible the subtle mix of temporal and spatial uncertainty of the input models of B and Ḃ these approaches rely on.

Conclusion
The last generation of the IGRF witnesssed a shift of the centre of mass of SV candidate models from mathematical extrapolation towards physics-based forecasts. Physics-driven approaches hold promises, and can be improved along several directions, notably in terms of the crucial description of rapid, inter-annual processes. It remains, however, that the success of any SV candidate will be contingent upon the acquisition of high-quality geomagnetic data by dedicated satellites and ground-based observatories. Obtaining continuous support for the collection, curation and distribution of these data is of paramount importance. When this is secured, and under the assumption that we as a community move towards finely orchestrated sequential workflows, we may even consider the possibility and feasibility of releasing IGRF models more frequently than every 5 years, with the immediate benefit of SV models becoming automatically more immune to rapid events such as secular acceleration pulses.

Appendix A: Direction error
In this appendix we provide the reader with information concerning the directional performances of past IGRF-SV models and candidates. We define the directional pointwise error between the true magnetic field vector and the forecast by the angle between their horizontal components, H t and H f , respectively, according to We evaluate α at Earth's surface 5 years after the release if each IGRF SV model, from IGRF-4 to IGRF-12. The rms error at Earth's surface for those past IGRF SV models, in addition to the one obtained for SV candidates to IGRF-11 and IGRF-12, is shown in Fig. 6a. The maximum error value at geographical latitudes less than 60 • is shown in Fig. 6b. We notice in particular that the error was quite large during the 2015-2020 time frame, in the wake of the 2014 jerk reported by Torta et al. (2015); see also Kotzé (2017); Soloviev et al. (2017).

Fig. 6
Direction error at Earth's surface for past IGRF SV models (blue) and IGRF-11 and IGRF-12 candidate models (in grey). a Root-mean-squared error. b Maximum error at mid-latitude ML (absolute value geographic latitude smaller than 60 • ). See text for the definition of the pointwise error. Magnetic fields are expanded up to spherical harmonic degree and order 8