Geomagnetic secular variation forecast using the NASA GEMS ensemble Kalman ﬁlter: A candidate model for IGRF 2020

We have produced a 5 year mean secular variation (SV) of the geomagnetic ﬁeld for the period 2020-2025. We use the NASA Geomagnetic Ensemble Modeling System (GEMS), which consists of the NASA Goddard geodynamo model and ensemble Kalman ﬁlter (EnKF) with 512 ensemble members. Geomagnetic ﬁeld models are used as observations for the assimilation, including gufm1 (1590-1960), CM4 (1961-2000) and CM6 (2001-2019). The forecast involves a bias correction scheme that assumes that the model bias changes on timescales much longer than the forecast period, so that they can be removed by successive forecasts. The algorithm was validated on the time period 2010-2015 by comparing with the 2015 IGRF before being applied to the 2020-2025 time period. This forecast has been submitted as a candidate model for IGRF 2025.

One such application is to use a numerical model to predict geomagnetic SV on time scales of several years and longer, if proper initialization is made using data assimilation techniques, as is done in numerical weather prediction (NWP). Forecast results from this geomagnetic data assimilation approach have already been used as predictive field candidates for previous IGRF models (Kuang, et al., 2010;Fournier, et al., 2015).
Obviously, this approach is different from other SV forecasts made without utilization of dynamo models.
For example, Olsen, et al. (2010) modeled the time dependence of the Gauss coefficients with order 6 B-splines so that the predictive SV for 2010-2015 can be extrapolated from the coefficients derived from observations prior to 2010. While utilization of geodynamo models can take advantage of adding dynamically consistent motional induction in the outer core to make SV forecasts, it also carries additional complications, such as model bias that results from the mismatch between the parameter values used in the models and appropriate for the Earth's core, and propagation of observational errors in both time and space. These difficulties have been analyzed through assimilation experiments with simplified systems (Sun et al 2007;Fournier et al 2007), observation system simulation experiments (OSSEs) (Liu et al 2007; Aubert and Fournier, 2011;Fournier, 2013). It is expected that improvements to both modeling and assimilation techniques will make geomagnetic forecast a growing contributor of the IGRF candidate models.
In this paper, we report our mean SV forecast for the period from 2020 to 2025. This effort differs from our earlier work for IGRF candidate model (Kuang et al 2010) in two major areas: we assimilate the observed gauss coefficients from a single geomagnetic field model (Sabaka et al in this special issue) for the period from 1960 to 2019; and the error covariance in our assimilation is determined through a large ensemble of assimilation solutions, and is updated in each analysis cycle. The two improvements can help produce more accurate SV forecasts and uncertainty estimates, since, for example, the utilization of a single field model can avoid any potential disagreement in different field models that may lead to inconsistent observation error estimates, and because the time varying error covariance can provide more consistent analysis for each cycle. This paper is organized as follows: the algorithm of our forecasts is given in Section 2; testing and validation are presented in Section 3, followed by the forecasts and discussion.

Geodynamo model and data assimilation algorith
The geomagnetic data assimilation system used for our SV forecasts is the NASA Geomagnetic Ensemble Modeling System (GEMS). It consists of a numerical geodynamo model (Kuang et al 1999(Kuang et al , 2003, an ensemble Kalman filter (Evensen, 2009;Sun and Kuang 2015). The model solves a set of nonlinear magnetohydrodynamic equations in a spherical shell domain. Details of the model are given in Kuang and Bloxham (1999), Kuang and Chao (2003) and Jiang and Kuang (2008). The core state in this model is defined with the velocity v, the magnetic field B and the relative density anomaly ∆ρ. These state variables are described by by spherical harmonic expansions in the co-latitude θ and the longitude φ directions, with the spherical harmonic coefficients given on discrete radial grid points. If we denote by x the vector of these coefficients, then the dynamo system can be symbolically represented as where M includes all linear and nonlinear model operators. The GEMS system employs a sequential ensemble Kalman filter (EnKF), described below, which is the result of many years of development that started with a simple one dimensional system (Sun et al., 2007) and an optimal interpolation (OI) system (Kuang et al., 2008(Kuang et al., , 2009. These systems have shown how geomagnetic data assimilation can be used to estimate geodynamo model parameters and geomagnetic field model uncertainty Kuang, 2015, 2018).
The observations used for GEMS are the gauss coefficients {g m l , h m l } (where l and m are spherical harmonic degrees and orders) from geomagnetic field models. They represent only the internally generated magnetic field. In the outer core and electrically conducting lower mantle (called the D -layer), the magnetic field can be decomposed into the toroidal (B T ) and the poloidal (B P components: wherer is the radial unit vector, and T B , P B are the toroidal and poloidal scalars and are described in where {Y m l } are the orthonormal spherical harmonic functions, and C.C. denotes the complex conjugate part of the expansion (similar expansions are also made for v and ∆ρ). At the top of the D -layer, i.e.
at r = r d , the toroidal field vanishes, i.e. j m l = 0; but the poloidal field coefficients b m l can be matched where r s is the mean Earth surface radius, and L obs is the highest degree of the observed geomagnetic field. In the rest of this paper, we denote the poloidal spectral coefficients in (4)  into the forecast (x) f at analysis time t a to produce the analysis x a : where H is the observation operator matrix and K is the gain matrix, defined by: where P f is the forecast error covariance and R is the observation error covariance. The analysis x a is then used as the initial state of the dynamo system 1 for the next forecast. The forecast error covariance is computed from an ensemble of forecasts with N ens members (Sun and Kuang 2015) where µ f x is the ensemble mean forecast state. In our forecasts here, we take only the diagonal part of the observation error covariance R and is defined simply as the estimated variance of the field models, (σ m l ) o . The time dependent model for the observation error standard deviation is where α = 0.1 before 1900, 0.001 ≤ α ≤ 0.1 with an exponential decrease from 1900 to 2000, and α = 0.001 afterwards. This observation error model accounts for the increased accuracy of geomagnetic models at later times, as well as the higher relative errors for the higher degrees. It has been tested through a number of numerical experiments with the goal of minimizing forecast errors, and is therefore regarded as an appropriate estimate relative to our geodynamo forecasts.
The geomagnetic field models that are used to generate the gauss coefficients for our assimilation, and (9) Details of the first two models can be found in Jackson et. al. (2000) and Sabaka et al. (2004). Details of the last field model can be found in Sabaka et al (2020). We only summarize here that this model uses a "comprehensive inversion" technique which co-estimates parameters for the Earth's internal field, using estimates of systematic and random errors, so as to produce an optimal separation of the internal and external fields.
The ensemble is initialized in 1590 CE using a long free model run, and subsequent analyses are computed every 20 years until 2000 CE, when the analysis cycle is reduced to one year (the start of the forecast period using the CM6 model). P f is updated with the forecasts at each analysis time t a , but H is only updated if different L obs is chosen for making the analysis. We use an ensemble of 512 for these experiments, and we have found this size to be essentially converged relative to smaller ensembles of 256 or 400. This means that an ensemble larger than 512 is not expected to produce appreciably more accurate forecasts.
Geomagnetic forecasts using a geodynamo model will naturally contain significant forecast errors due to mismatches in the dynamo parameter values and numerical approximations, and these can grow significantly during a multi-year forecast. But because these error sources vary on relatively long timescales, we can assume that the model error itself varies on time scales much longer than, e.g. the 5-year forecast period. This assumption allows us to implement a bias correction scheme for the SV prediction by producing a set of staggered forecasts, and then take differences between them to remove the bias (Kuang et al 2010). The forecast of the observed poloidal field is defined as the projection of the state vector, x f onto the observation space: We define the forecast error, (t), as the difference between the forecast state, y f (t), and the true state, The forecast error is a combination of model error and the growth of initial state errors, but we assume that model errors remain the largest component. Consider a forecast that is initialized from an analysis at time t a and produces a forecast at time t a + δt. We estimate the SV from the change in the geomagnetic field after time δt: whereẏ is the SV and˙ is the SV error growth rate. We can reduce this error term by making use of the assumption of slowly varying model error and considering staggered forecasts that begin at analysis times t a andt a , where |t a −t a | << δt. This gives us forecasts at times t a + δt andt a + δt in terms of the true state: The difference between these two sequences was shown to approximately remove the forecast error (Kuang et al. 2010), because the change in model error between t a + δt andt a + δt is small compared to the SV.
So we can write an expression for the SV aṡ where τ a = |t a −t a |.
The details of the bias correction scheme can be found Kuang et al., 2010.

Validation Experiments
We where τ a = 1 year. The SV for the Gauss coefficients (g m n , h m n ) are then computed as follows. We show the details for g m n , while h m n follow the same procedure. Note that this notation, traditionally used it geomagnetic field modeling, is equivalent to (g m l , h m l ) which is generally used in geodynamo modeling.
Because we are crossing between the two fields, we use these sets of notation interchangeably. In our assimilation, we utilize the scaled coefficients We then approximate the axial dipole coefficient g 0 1 in (17) with the values from the field models, e.g.
CM6 (Sabaka, et al., 2020) as follows: The formulations (14) We compute the differences between this forecast and the CM6 coefficients in 2015 and plot them in terms of mean square fields (dashed line)(MSF), along side the CM6 MSF (solid line) in Figure 3. We also include the ensemble uncertainty from the 2015 forecast (dotted line). The relative differences are seen to vary from 10 −7 for n = 1 to 10 −2 for n = 8. The ensemble uncertainty, which does not contain information on forecast bias, slightly underestimates the errors for n = 1 − 3, and is very close at higher degrees. This is an indication that, after bias correction, most of the errors in the forecast that remain are random rather from model bias.

Mean SV for 2020-2025
We repeat the same method to make the SV forecast for the period from 2020 to 2025, except that we The experiments, labeled A, B and C; are summarized as: .
With these three sets of forecasts, we can use up to the following three different combinations of the differences between two staggered forecasts: where again τ a = 1 year and the AC experiments are 2 years apart. The SV for the forecast Gauss coefficients computed as before, along with the SV for CM6.
The bias corrected SV forecast for 2025 will be a combination of the different forecasts, BA, CB or CA from (21). There are two main reasons for our last analysis ending at 2019: one is that the predictive field from CM6 for 2020 will be used as the initial condition for our forecast calibration; the other is the SV from CM6 for the period 2019-2020 (ġ m n ) CM 6 = (g m n ) CM 6 (2020) − (g m n ) CM 6 (2019) /(1year) can be used to estimate our model bias. Some representative forecasts that demonstrate the bias Finally, the mean annual SV gauss coefficients are determined viȧ The uncertainties are determined by the standard deviation of the forecast ensemble. The final results are given in Table 1 along with the associated uncertainties, which are the ensemble standard deviations from the ensemble forecast.

Conclusions
We have used an EnKF to forecast the geomagnetic secular variation (Table 1) for the time period 2020-2025, using 400 ensemble members. The development of ensemble forecasting techniques has contributed to the SV forecasts, particularly due to the ability to estimate uncertainty, which is newly available with this geomagnetic forecast. The validation forecast for 2015 shows that the uncertainty estimate is in line with the differences with CM6 for that year, which also demonstrates that the bias correction scheme is removing a significant part of the systematic errors. We also use the assimilation of field models as a means to understand how estimates of field model errors impact the accuracy of geomagnetic forecasts from a geodynamo model. The 2025 SV forecast includes uncertainty estimates derived from the forecast ensemble spread. The large ensemble has placed a computational constraint on this forecast, because at present it is not computational feasible to run the assimilation system for 7000 years, as was done in [Kuang et al., 2010]. Future work will investigate the use of localization schemes that would enable the use of a much smaller ensemble that could be run for longer time periods.

Data availability
The data (IGRF candidate Gauss coefficients) is included as a table in this paper.