Evaluation of candidate models for the 13th generation International Geomagnetic Reference Field

In December 2019, the 13th revision of the International Geomagnetic Reference Field (IGRF) was released by the International Association of Geomagnetism and Aeronomy (IAGA) Division V Working Group V-MOD. This revision comprises two new spherical harmonic main field models for epochs 2015.0 (DGRF-2015) and 2020.0 (IGRF-2020) and a model of the predicted secular variation for the interval 2020.0 to 2025.0 (SV-2020-2025). The models were produced from candidates submitted by fifteen international teams. These teams were led by the British Geological Survey (UK), China Earthquake Administration (China), Universidad Complutense de Madrid (Spain), University of Colorado Boulder (USA), Technical University of Denmark (Denmark), GFZ German Research Centre for Geosciences (Germany), Institut de physique du globe de Paris (France), Institut des Sciences de la Terre (France), Pushkov Institute of Terrestrial Magnetism, Ionosphere and Radio Wave Propagation (Russia), Kyoto University (Japan), University of Leeds (UK), Max Planck Institute for Solar System Research (Germany), NASA Goddard Space Flight Center (USA), University of Potsdam (Germany), and Université de Strasbourg (France). The candidate models were evaluated individually and compared to all other candidates as well to the mean, median and a robust Huber-weighted model of all candidates. These analyses were used to identify, for example, the variation between the Gauss coefficients or the geographical regions where the candidate models strongly differed. The majority of candidates were sufficiently close that the differences can be explained primarily by individual modeling methodologies and data selection strategies. None of the candidates were so different as to warrant their exclusion from the final IGRF-13. The IAGA V-MOD task force thus voted for two approaches: the median of the Gauss coefficients of the candidates for the DGRF-2015 and IGRF-2020 models and the robust Huber-weighted model for the predictive SV-2020-2025. In this paper, we document the evaluation of the candidate models and provide details of the approach used to derive the final IGRF-13 products. We also perform a retrospective analysis of the IGRF-12 SV candidates over their performance period (2015–2020). Our findings suggest that forecasting secular variation can benefit from combining physics-based core modeling with satellite observations.


Introduction
The International Geomagnetic Reference Field (IGRF) is a series of models describing the large-scale internal part of Earth's magnetic field. The spherical harmonic coefficients comprising the IGRF are agreed upon by Open Access *Correspondence: erwan.thebault@univ-nantes.fr 3 Laboratoire de Planétologie et Géodynamique, UMR 6112, Université de Nantes, Université d' Angers, CNRS, Nantes, France Full list of author information is available at the end of the article an international task force of geomagnetic field modeling experts and are typically updated every five years to account for temporal field variations originating in Earth's core. The task force overseeing IGRF operates under the auspices of the International Association of Geomagnetism and Aeronomy (IAGA) Working Group V-MOD. The IGRF model is used by academia, government, and industry in a wide variety of applications including magnetic reference systems, long-term dynamics of the Earth's core field, ionospheric electrodynamics, space weather phenomena, electromagnetic induction, local magnetic anomalies in the Earth's crust, surveying, and orientation in three dimensions. Readers interested in the history of IGRF are referred to Barton (1997) and Macmillan and Finlay (2011). The purpose of this paper is to summarize all of the candidate models which were submitted for consideration for the thirteenth generation of IGRF (hereafter IGRF-13) and to report the methods used by the task force to evaluate the candidates and construct the final IGRF-13 models.
The IGRF-13 task force was formally elected at an IAGA V-MOD Working Group business meeting in Cape Town on 28 August 2017, however several additional members joined afterward. The full IGRF-13 task force consists of the authors of this paper. On 26 March 2019, the task force issued an international call for modeling teams to contribute candidates for (1) a new Definitive Geomagnetic Reference Field (DGRF) for epoch 2015.0 to spherical harmonic (SH) degree and order 13, (2) a new provisional IGRF for epoch 2020.0 to SH degree and order 13, and (3) a predictive constant secular variation (SV) forecast for the interval 2020.0 to 2025.0 to SH degree and order 8. The term 'definitive' is used because the best available datasets before and after the epoch were used by the modeling teams, and so any further substantial improvement of these retrospectively determined models is unlikely. In contrast, the provisional IGRF model will eventually be replaced by a definitive model in a future revision of the IGRF when the community has a more complete knowledge of the Earth's magnetic field for epoch 2020.0.
A record eleven candidate models were received for DGRF-2015, twelve for IGRF-2020, and fourteen for the 2020-2025 SV forecast. In total, fifteen international teams participated in the IGRF-13 call. During the fall of 2019, the task force evaluated each candidate model using well-established methodologies. The task force voted on the procedure to determine the final IGRF-13 models based on the recommendations of each evaluation report. Each institution participating in the task force received one vote for each of the three models under consideration. The task force released IGRF-13 on 19 December 2019. The official set of spherical harmonic coefficients as well as a discussion of the general features of the IGRF-13 model can be found in Alken et al. (2020b).
The 15 teams who participated in the IGRF-13 call were led by the British Geological Survey (UK), China Earthquake Administration (China), Universidad Complutense de Madrid (Spain), University of Colorado Boulder (USA), Technical University of Denmark (Denmark), GFZ German Research Centre for Geosciences (Germany), Institut de physique du globe de Paris (France), Institut des Sciences de la Terre (France), Pushkov Institute of Terrestrial Magnetism, Ionosphere and Radio Wave Propagation (Russia), Kyoto University (Japan), University of Leeds (UK), Max Planck Institute for Solar System Research (Germany), NASA Goddard Space Flight Center (USA), University of Potsdam (Germany), and Université de Strasbourg (France). See Table 1 for a list of models submitted by each team as well as references to papers describing the preparation of each candidate model in detail. The table also lists the letter codes used throughout this paper to refer to specific candidate models for the different teams.
The number of institutions who participated in the IGRF-13 is larger than for any previous generation, which highlights the advance of global capability in this area of research. The composition of the teams shows strong cooperation between scientists both within their own countries and internationally. Geomagnetic field modeling is reliant on the high-quality data collected by the various space agencies and institutes which operate satellites and ground-based observatories. In addition to satellite data, modelers made extensive use of data from the international network of ground geomagnetic observatories either directly, or indirectly in the form of magnetic indices monitoring the level of magnetic activities.
The IGRF-13 candidate models were built primarily using data recorded by the European Space Agency (ESA) Swarm satellite mission (2013-present) and ground magnetic observatories. One team used the China Earthquake Administration's CSES mission (Shen et al. 2018) exclusively for its candidate model (Yang et al. 2020). Some teams built parent models spanning longer time intervals, which additionally made use of data from the CHAMP (2000( ), Ørsted (1999( -2013( ), and SAC-C (2000( -2004 missions. Additionally, one team used data recorded by the ESA Cryosat-2 mission (2010-present) to supplement the ground observatory network during the gap period between CHAMP and Swarm .
The definition of the internal field as requested by IGRF-13 has some ambiguity. In previous generations, it was considered to include the core field, long wavelength lithospheric field, steady oceanic and tidal magnetic fields and induced fields due to time-varying external sources.
During past IGRF generations, some teams attempted to separate the effects of the induced field, arguing that it was not truly an internal field. However, a counter argument made at the IAGA 2017 DIV-V business meeting suggested it remains extremely difficult, at present, to effectively remove the internally induced ionospheric field in a consistent manner, as there is insufficient resolution of Earth's global conductivity. In addition, many IGRF users require knowledge of the combined field due to all internal sources on or above Earth's surface. This led to the general agreement at the meeting that the induced field and tidal fields should remain within the IGRF main field model to avoid introducing a step into the Gauss coefficients between generations.
As in previous generations, a variety of data selection, processing, and modeling procedures have been used to build the IGRF-13 candidates. We briefly summarize the approaches of the various groups below; the detailed descriptions of the techniques used to derive the individual candidate models can be found in the papers appearing in this special issue (see Table 1).
For the main field at epochs 2015.0 and 2020.0, some teams derived their candidate models from parent models describing the magnetic field over multi-decadal periods by combining datasets from several satellite missions and ground observatories Huder et al. 2020;Ropp et al. 2020;Sabaka et al. 2020;Wardinski et al. 2020). Several teams built parent models covering the full Swarm satellite mission era (November 2013 to present) (Brown et al. 2020;Rother et al. 2020;Vigneron et al. 2020). Other teams built dedicated main field candidate models for each of the epochs requested by the call, thus using data within smaller time windows centered on 2015.0, or immediately preceding 2020.0 (Pavón-Carrasco et al. 2020;Yang et al. 2020;Alken et al. 2020a;Petrov and Bondar 2020), which required less complex parameterization in time. Some teams co-estimated a low-degree external field model representing magnetospheric sources (Brown et al. 2020;Yang et al. 2020;Pavón-Carrasco et al. 2020;Rother et al. 2020;Ropp et al. 2020;Huder et al. 2020;Sabaka et al. 2020;Vigneron et al. 2020). Two teams co-estimated additional geomagnetic source fields in their parent models Sabaka et al. 2020). For DGRF-2015, the teams simply output snapshot Gauss coefficients from their parent models at 2015.0 truncated to degree 13. For IGRF-2020, extrapolation of the Gauss coefficients to 2020.0 was required, as the candidate models were delivered to the task force in October 2019. Many teams used linear or spline-based extrapolation of their parent model Gauss coefficients to accomplish this.
For the predictive secular variation models covering 2020.0 to 2025.0, the candidates broadly fall into two main categories: (1) computing SV solely from the latest available satellite and ground data (Alken et al. 2020a;Pavón-Carrasco et al. 2020;Huder et al. 2020;Petrov and Bondar 2020;Rother et al. 2020), or (2) applying physics-based modeling, combined with recent satellite and ground data to forecast future field changes based on underlying core dynamics (Brown et al. 2020;Fournier et al. 2020;Minami et al. 2020;Metman et al. 2020;Sanchez et al. 2020;Tangborn et al. 2020; Wardinski et al. 2020). Accurately forecasting the temporal evolution of the main geomagnetic field is nontrivial, due to challenges such as the low resolution of the recoverable magnetic field at the core-mantle boundary, the occurrence of unpredictable geomagnetic jerks or the uncertainty associated with diffusion of the field over short timescales (e.g. Maus et al. 2008a;Bärenzung et al. 2018;Whaler and Beggan 2015). As an example, in the past two decades, the northern polar region has experienced large variations with the location of the magnetic dip pole moving in an irregular manner (Chulliat et al. 2010;Thébault et al.. 2015;Alken et al. 2020b;Livermore et al. 2020). In the section Retrospective analysis of IGRF-12 secular variation models, we examine the performance of the IGRF-12 SV forecasts over the 2015-2020 time interval to investigate how physics-based approaches compare to the empirical SV derived from recent satellite data.
Members of the volunteer task force carried out evaluations of the candidate models submitted by the different teams. Assessment of the candidate models was primarily based on statistical criteria. Some main field (MF) and SV models showed greater consistency than others. However, close statistical agreement between models does not necessarily mean that these models are 'correct' . It can also be a consequence of using similar data selection or modeling techniques. For this reason, the evaluation of the task force members also relied on companion descriptions of the candidate models as well as on their expert opinion, and in certain cases, comparisons with independent data sets.
To decide the mechanism for deriving the final models, the task force chair (P. Alken) prepared a ballot with all suggestions put forth by the individual evaluation teams. The vote was held in December 2019, and the task force chose to select the median of the Gauss coefficients of all candidates for the DGRF-2015 and IGRF-2020 main field models, and an iterative robust weighting in space for the SV-2020-2025 model (see Robust Huber model). The resulting IGRF-13 coefficients were prepared and checked before being made available to the public through the IAGA Division V, V-MOD working group web page (http://www.ngdc.noaa.gov/IAGA/vmod/igrf. html). Updated software, web services and online calculators are also available for public use (see Alken et al. 2020b, Sect. 5).
The next section of this paper summarizes the statistical criteria used by the task force members for the testing and the inter-comparison of the candidate models. We then discuss the procedure used to build a robust Huberweighted model, followed by analyses of all candidates for the DGRF-2015, IGRF-2020 and SV-2020-2025. Next we provide a retrospective analysis of the IGRF-12 secular variation forecasts over 2015.0-2020.0. Finally, we provide details of the final adoption of the IGRF-13 models.

Evaluation methodology
The IGRF is a series of mathematical models describing the internal geomagnetic field on and above Earth's surface, and a prediction of its annual rate of change (known as secular variation) for five years beyond the date of issue. We assume there are no local magnetic field sources in the IGRF region of validity, so that the global magnetic field can be expressed as the gradient of a scalar potential, B = −∇V . The scalar potential V is approximated as a finite series: Here, g m n (t) and h m n (t) are the Gauss coefficients which depend on time t, conventionally given in units of nanoTesla (nT). The coordinates (r, θ , φ) are geocentric radius, co-latitude, and longitude, and a = 6371.2 km is a reference value approximating the mean Earth radius. The functions P m n are the Schmidt semi-normalized associated Legendre functions of degree n and order m (Winch et al. 2005). N specifies the spherical harmonic degree truncation value, which was chosen to be 10 up to and including epoch 1995, after which it was increased to 13 to model smaller scale internal signals which can be captured by high-resolution satellite missions such as Ørsted, CHAMP and Swarm.
The IGRF-13 teams submitted sets of Gauss coefficients g m n , h m n to SH degree and order 13 for DGRF-2015 and IGRF-2020 in units of nT, and first time derivatives ġ m n ,ḣ m n to SH degree and order 8 for SV-2020-2025 in units of nT/year. In the evaluation of the candidates, we compared two models by computing differences of the Gauss coefficients and plotting spatial difference maps of the field components. Past IGRF evaluations found it prudent to also compare individual candidates with the mean and median models of all candidates, and the IGRF-13 task force continued this practice. We also calculated a robust Huber iteratively reweighted model in space from all candidate models following the procedure used for IGRF-12 (see Robust Huber model).
In the sections below, we briefly define the statistics which were used to evaluate the IGRF-13 candidate models. These same quantities were used during the IGRF-11 (Finlay et al. 2010) and IGRF-12  ( evaluations. In the equations which follow, we use the main field Gauss coefficients g m n , h m n for convenience; however, the secular variation coefficients ġ m n ,ḣ m n may be substituted in their place to analyze the SV candidate models.

Model differences
We examine differences between two models in order to identify regions of strong discrepancies, both in the spatial and spectral domains. Because the Gauss coefficients in the scalar potential (Eq. (1)) appear as linear terms, we can compute direct differences of these parameters from two different models, and substitute the differences into the relevant equations, both for the scalar potential V and vector geomagnetic field B . If we define the Gauss coefficients of a particular candidate model i by i g m n and i h m n , then the coefficient difference between two models i and j is defined as Since coefficients at higher degrees are far smaller in magnitude than the lower degrees, we will consider weighted differences √ n + 1 i,j g m n and √ n + 1 i,j h m n when plotting these quantities in this paper.

Spherical harmonic power spectral differences
The mean square value of the vector geomagnetic field over a sphere of radius r due to all harmonics of SH degree n for a given candidate model i is (Lowes 1966(Lowes , 1974 This is known as the Lowes-Mauersberger power spectrum, and may also be applied to the differences between two models: Summing the power i,j R n over all SH degrees from 1 to the truncation level N and then taking the square root provides the root mean square (rms) vector field difference between models i and j: In this paper, we will use the notation i R n , i,j R n , and i,j T to denote these quantities computed at the Earth mean radius r = a.

Azimuthal power spectral differences
The Lowes-Mauersberger expression in Eq.
(3) organizes spectral power in terms of spherical harmonic degree n. It is also instructive to analyze power as a function of the azimuthal ratio az = m/n , which varies from 0 for zonal harmonics to 1 for sectoral harmonics. We define the ratio as positive for the g m n coefficients and negative for the h m n . The azimuthal power is then defined as the mean square value of the field over a sphere of radius r produced by all harmonics with the same azimuthal ratio. For model i, the azimuthal power spectrum is defined as where A az = (m, n) ∈ Z 2 : m/n = az . In this paper we will primarily compute spectral differences between two models i and j: In this paper, we calculate the azimuthal power for each ratio m/n, and bin the results in 29 bins of width 1/(N + 1) . The values in each bin are then summed together to obtain the power in that bin. Note, we will use the notation i,j R az to denote azimuthal power at the Earth's mean radius r = a.

Degree correlation
At r = a , the correlation per degree between two models i and j is defined as (Langel and Hinze 1998, p. 81) In this paper, we will primarily compute the degree correlation between a candidate model i and a reference model, such as the mean or median of all candidates, in which case the j g m n , j h m n coefficients would be replaced with the Gauss coefficients of the reference model in Eq. (8). This can provide guidance on whether the harmonics of a given candidate correlate poorly with the reference model at particular degrees.

Robust Huber model
To build the robust Huber model, the Gauss coefficients of each candidate model were used to compute the vector geomagnetic field components at the Earth's mean radius on an equal area grid with 10,000 nodes (Leopardi 2006). On each grid node, there are as many vector field values as candidate models. The variations among candidates at each node produce an ensemble of magnetic field values. The Huber algorithm in space iteratively reweights the geomagnetic vector components of each candidate on the grid according to its deviation from the ensemble of available candidate model predictions. This produces Huber weights for the candidate model at each node in the three vector field components. The weights range from 0 to 1, illustrating regions where the candidate models agree (or disagree) with the ensemble of magnetic field predictions. This numerical computation is based on the Huber error distribution (Huber 1981): where N c is the number of available candidate models. The parameter ǫ is the residual between the N c candidate models vector values and the maximum likelihood model estimated in the least-squares sense, normalized by a robust estimate of the residual standard deviation. The constant c is chosen as a compromise between a Laplace distribution (obtained when c = 0 ) and a Gaussian distribution (obtained when c → ∞ ). Our Huber weighting scheme uses c = 1.5.
All N c vector field values on the nodes of the grid are thus associated with their Huber weights. The final model is the inversion of the Huber-weighted gridded magnetic vector field values from each candidate in spherical harmonics (see Thébault et al. (2015) for further details). The major advantage of this method is that all candidates are considered: even if some disagree in certain regions, they provide an input to the final model. The Gauss coefficients of the median, mean and Huber-weighted models are provided on the IAGA website (https ://www.ngdc. noaa.gov/IAGA/vmod/IGRF1 3/Mean_Media n_Model s/).

Analysis of DGRF-2015 candidate models
The call for the DGRF-2015 requested models describing the large-scale internal field up to SH degree 13 with Gauss coefficients defined to a precision of 0.01 nT. We list the eleven candidate models (and their references) received for epoch 2015.0 in Table 1. All DGRF-2015 candidate models used vector fluxgate magnetometer (VFM) Swarm measurements as their primary data source, with the exception of the IP model, which used measurements from the absolute scalar magnetometer (ASM) instrument on Swarm . The ASM instrument is capable of providing both scalar and vector measurements (Fratter et al. 2016;Léger et al. 2015). Table 2 Root-mean-square vector field differences i,j T in units of nT between DGRF-2015 candidate models and also between candidates and the arithmetic mean reference models M, median reference model M med and robust

Huber-weighted in the rightmost columns
The bottom row is the simple arithmetic mean of the columns In Table 2, we present the rms vector field differences ( i,j T in nT) between the individual candidate models i and j at the Earth's reference radius r = a . The last three columns show the rms difference between each candidate model i and (a) the simple arithmetic mean model M, (b) the median model M med and (c) the robust Huberweighted model. The required precision of 0.01 nT corresponds to a rounding error of just 0.13 nT. All models show rms differences well above this value illustrating the nuances from the various data selection and field modeling approaches adopted by each team.
From Table 2, we observe that candidates B, CM, CU, D, G, IP, N, and P are in closer agreement with the arithmetic mean and median models than models IS, IZ, and S. The rms difference between each model of this first group and the mean/median models is less than 5 nT. The bottom row of Table 2 gives the arithmetic means of the rms vector field differences of i,j T of model i from the other models j. Again, models B, CM, CU, D, G, IP, N, and P have the smaller mean differences. Model IS is the most distinct with a mean rms difference to all models of 8.89 nT.
The median and robust models in Table 2 have the smallest average difference from the candidates (3.37 and 3.33 nT). The difference between these two values is small, and since, as will be discussed later, we have selected the median model as the final DGRF-2015, we use the median model as the reference in the following  Figure 1 (left) shows the spectral difference per degree ( i,j R n ) between the candidate models and the median model. We find that the degree 1 and 2 coefficients of the IS model deviate most from the median, while degree 5 of the IZ model exhibits the largest difference. The right panel illustrates the degree correlation between the candidates and the median model, which is over 0.999 for all candidates up to SH degree 10. The absolute differences of Gauss coefficients (weighted by a factor √ n + 1 ) between the candidates and the median model coefficients are small, with the majority below 0.5 nT. Figure 2 (left) illustrates the values for each coefficient. The azimuthal power spectrum is shown in the right panel.
We present in Fig. 3 the differences between the candidate models and the median model for the vertical ( B z ) component at r = a . Some candidates show zonal structure (CM, IZ) or hemispherical differences (N) while the IS model suggests some secular variation is not accounted for due to the pattern of strong small-scale structures in the Indian Ocean regions and large-scale structure over North and South America, reminiscent of SV maps. The maps show there is little difference between the robust Huber model and the median model at r = a.
For the DGRF model candidates, it is possible to compare their magnetic field predictions directly to Swarm satellite data around 1 January 2015. We compared each candidate to one month of vector and scalar data from Swarm A and Swarm B (15 December 2014 to 15 January 2015). The Swarm data were sub-sampled to a rate of 1 sample every 15 seconds with local times between 21:00 and 05:00. We additionally selected geomagnetically quiet data with Kp < 2 and |dRC/dt| ≤ 4 nT/h. Finally, we removed the MF7 lithospheric field model (Maus et al. 2008b) and the CHAOS-6 external field model (Olsen et al. 2014;Finlay et al. 2016) from the Swarm data. A total of 21113 measurements were available for the comparison. The vector and scalar residuals between the Swarm measurements and each DGRF-2015 candidate model were computed. The statistics of the residuals equatorward of ±55 • quasi-dipole latitude are presented in Table 3. The mean, median and robust models explain the Swarm measurements as well or better than many of the individual candidates. Figure 4 shows the residuals of the vertical component ( B z , North-East-Center (NEC) frame) between the Swarm data and DGRF-2015 candidate models. We see several models have larger residuals over the South Atlantic Anomaly region, which we attribute to the larger secular variation of B z seen in this region, even over the one month period under consideration. The IS model has relatively large B z residuals in this region, which is likely related to the treatment of their SV as seen in Fig. 3. Many of the maps display north-south along-track oscillations. These are primarily due to the SH degrees 14 and 15 which were not removed from the Swarm data, as the DGRF models are truncated at degree 13, while the MF7 crustal field starts at degree 16.

Analysis of IGRF-2020 candidate models
We performed a similar analysis for the twelve candidate models submitted to IGRF for epoch 2020.0. Because magnetic field measurements were not available close to the desired epoch of 1 January 2020, the derivation of the main field candidates for the IGRF-2020 model was more challenging than for DGRF-2015. Many teams utilized satellite and ground data up to mid-2019 for their model calculations and then extrapolated the main field to 2020.0. Eleven of the candidates are based on Swarm Level 1b data and many include ground-based observatory data. The CE model is based solely on data from the Chinese Seismology and Electromagnetism Satellite (Shen et al. 2018), which collected high latitude data specifically for IGRF-13. Most candidates use relatively simple forms of extrapolation (e.g. linear or spline) from the end-point of their model in 2019 to extend their main field to 1 January 2020. As shown in Table 4, the various extrapolation schemes resulted in larger rms vector field differences between the IGRF-2020 candidates compared to the DGRF-2015 candidates.
The rms differences vary from around 5 to 15 nT with no obvious clustering of models. There is a reasonably uniform distribution of the differences, which suggests there is no 'best' population of models. However, the CE, G, and S candidates differ the most from the rest of the population with consistently higher differences compared to other candidates. The bottom row of the table is the arithmetic mean of the columns, and can be considered as an average difference of a given candidate model with respect to all other candidates.
The mean, median and robust models have the lowest average differences from the candidate models, with the median and robust models showing similar values (7.81 vs 7.78 nT). We again choose the median model as the reference for comparison, since this is the model which was chosen for the final IGRF-2020 as will be discussed later. Figure 5 (left) shows the power spectral difference per degree between the candidates and the median model (note the logarithmic scale of the y-axis). There is a wider variation in the mean square differences per degree for the IGRF-2020 candidates (up to 70 nT 2 compared to 15 nT 2 for the DGRF-2015 candidates). The CE model differs in degree 2, G in the degrees 3, 4 and 5, IZ mostly in degree 2, and S in degrees 1 and 3. As shown in the right panel of Fig. 5, the degree correlation remains above 0.99 until degree 11 for all candidates. Figure 6 (left) plots the absolute differences between the candidate Gauss coefficients and their median, weighted by a factor of √ n + 1 . The individual coefficients have larger differences in general compared to the DGRF-2015. In the right panel, the azimuthal power spectrum indicates stronger differences for m = n for the G model. Most models have an increase in the spectral difference for the zonal terms. This is most likely due to contributions from the magnetospheric ring current and its Earth-induced part, despite efforts to minimize this effect. Figure 7 shows the differences in the vertical component between the IGRF-2020 candidate models and the median at 2020.0. Many of the maps exhibit north-south hemispherical differences with respect to the median model, which arise from the variations in the n = 1 dipole terms among the candidate models. The maps also show the effects of the extrapolation to 2020.0 as the scale bar is now ± 50 nT (c.f. ± 20 nT for Fig. 3). Similar patterns are visible such as the zonal and auroral differences in the CM candidate and the SV-related differences in the IS candidate. Some models have an obvious sectoral or hemispherical difference, such as B, CU, N and P. The CE candidate shows generally positive differences in the mid-latitudes and negative variation at the auroral zones. The G candidate has large differences in the western hemisphere and central Pacific Ocean.

Analysis of IGRF-13 SV-2020-2025 candidate models
Fourteen teams submitted candidate models for the predicted secular variation in the 2020.0-2025.0 time period. For these models, teams submitted model coefficients ġ m n ,ḣ m n to spherical harmonic degree and order 8, which represent the average annual change in the Gauss coefficients between 2020.0 and 2025.0 in units of nT/ year. The institutes which led the different SV candidate models are presented in Table 1.
There are a variety of approaches for forecasting secular variation. The B candidate uses core flow modeling with steady flow and acceleration, while CU, D and G use extrapolation of measured secular variation from the previous year or longer. The CM candidate made predictions in a similar manner, but produced it from an ensemble of field models built with a bootstrap approach from subsets of observations. The IP, K, P, M and N candidates use assimilation of ground and/or satellite data with geodynamo model outputs to form an estimate of

Table 4 Root-mean-square vector field differences i,j T in units nT between IGRF-2020 candidate models and also between candidates and the arithmetic mean reference models M, median reference model M med and robust Huber-weighted model in the rightmost columns
The bottom row is the simple arithmetic mean of the columns

Fig. 5
Power spectral differences (left) and degree correlation (right) between the IGRF-2020 candidates and the median model secular variation. IS uses an ensemble of models from a stochastic prior and predicts with a Best Linear Unbiased Estimator. The L candidate used core flow combined with magnetic diffusion for its forecast. The S model has employed a singular spectral analysis of sixty years of SV to provide candidate values while IZ produced a spectral fit to each individual Gauss coefficient. The candidates offer a diverse set of approaches to compute SV over the period 2020.0 to 2025.0 which produces a wide variation between the candidates, as seen in Table 5. The candidate differences fall into two fairly distinct categories-those that match each other to within 15 nT/year and those whose differences are greater than 20 nT/year. The candidates most different from the rest of the population are N, S, and K.
From visual inspection of the Gauss coefficients, it is noted that the K candidate has a small dipole ( ġ 0 1 ) SV coefficient of 1.9 nT/year compared to 6-7 nT/year for most of the other models. The S model has a negative ġ 0 1 coefficient of − 6.4 nT/year as well as a high ġ 1 1 coefficient of 11.5 nT/year compared to 7-8 nT/year in others. These coefficients contribute predominantly to the differences between these two and the other candidates. Finally, the N model has significantly different degree 2 zonal coefficients compared to the other candidates and thus shows strong variation.
The mean, median and robust models computed from the candidates have an average difference of around 11 nT/year (last three columns of Table 5). When computing the Huber-weighted model, candidates CM, K, and S showed large-scale differences with respect to the ensemble of candidates that are down-weighted by the Huber weighting scheme. Model IZ is moderately downweighted. The spatial difference between the robust model and the median model remains below 5 nT/year everywhere on the r = a reference surface. The robust model has a slightly smaller overall difference to the candidates (10.84 vs 10.92 nT/year). Figure 8 (left) shows the wide variation of the candidates in terms of their power spectral differences with the robust model. The S model has the strongest degree 1 difference, while the K and N models have large degree 2 variations. The degree correlation of the models is much lower, dropping to 0.7 for the L model at degree 7, for example, which may relate to the influence of magnetic diffusion in their forecasting approach. The percoefficient differences in Fig. 9 (left) reach up to 15 nT/ year. The azimuthal power spectrum is shown in the right panel.
The maps of the differences in the vertical component of the candidates to the robust Huber model are shown in Fig. 10. For the B, CU, G, IZ, L and M candidates, most of the variation is around the western and central Pacific regions, which have been experiencing a strong geomagnetic jerk over the past five years. Models K, N, and S show hemispherical or degree 2 variations and deviate significantly from the robust Huber model.

Comparison of candidate models with ground observatory data
Another manner in which to judge the validity and accuracy of the IGRF candidates is to compare them with (semi-)independent data from the ground observatory network. Many candidates do include observatory Absolute Gauss coefficient differences weighted by a factor of √ n + 1 (left) and azimuthal power spectral differences (right) between the IGRF-2020 candidates and the median model information in some form though it is usually only a small fraction of the data used.
In Fig. 11, we examine the SV measured at twelve observatories around the world. The SV data are monthly mean values derived from selected quiet-time hourly means, which were computed from the raw observatory data using the procedure of Macmillan and Olsen (2013). We plot the monthly mean data from 2015.0 to 2019.5. The observed values of the vertical component are shown as grey circles. The IGRF-12 predicted SV between 2015 and 2020 is plotted in dotted blue. The forecasted SV from the candidates, as well as the mean, median and robust models are shown as colored lines extending from 2020 to 2025. For most of the observatories, the predicted SV from IGRF-12 is in reasonably good agreement at 2015.0 but becomes worse for later years due to unmodeled secular acceleration. The 2014 geomagnetic jerk (Torta et al. 2015) caused strong acceleration for several years in certain locations, which is particularly evident in the IRT, HON, KAK, and GUA datasets. The SV at HER appears to have been poorly predicted at 2015.0 (at 40 nT/year) compared to its true value (around 60 nT/ year). The candidate forecasts have a wide scatter; for example at HON, the G candidate predicts SV of − 12 nT/year while the S candidate suggests almost + 30 nT/year. The median, mean and Huber are close, with a central forecast around −5 nT/year. The forecasts are closer at CKI or BOU, though often there are outlier candidates e.g. the N model at SFS or KAK. In general, Fig. 11 illustrates the difficulty associated with making a simple linear forecast of a complex and dynamic system.

Retrospective analysis of IGRF-12 secular variation models
Some insight may be gained into secular variation forecasting by investigating the performance of the IGRF-12 candidate models over their forecast period of 2015.0 to 2020.0. We compared the secular variation predictions of the IGRF-12 candidate models with both SV derived from ground observatory measurements and SV provided by the final IGRF-13 model over the 2015.0 to 2020.0 time period. For the observatory analysis, we started from the hourly mean database compiled by the  British Geological Survey (Macmillan and Olsen 2013) with data until August 2019. Data were selected for geomagnetically quiet conditions (ap index ≤ 10 ), and dayside ionospheric contributions were minimized by selecting data between midnight and 05:00 local time.
Only observatories equatorward of ±55 • QD latitude were used to minimize contamination from polar ionospheric current systems, resulting in a total of M = 42 stations. We fit cubic splines with knot separations of 1 year to the individual B x , B y , and B z components of each station for the time interval January 2006 through August 2019. These splines were then linearly extrapolated to 2020.0 by using the spline's slope at the last knot. Figure 12 presents the analysis for the Honolulu (HON) observatory. The right panels show the original B x (top), B y (middle), B z (bottom) measurements in blue, the remaining data after quiet-time selection in green, and the spline fit in black. The left panels show, for the same B x , B y , B z components, annual differences of the selected observatory data in black, annual differences of the fitted spline in red, and the predictions of each IGRF-12 candidate model, including the mean, median, and final IGRF-12 SV model over the 2015.0-2020.0 time period.
Note that the LN model refers to LPG Nantes, which was not a lead institute for IGRF-13. We can see that HON experienced a significant geomagnetic jerk around 2014 with a strong acceleration after the 2015.0 epoch in the B z component, which all forecast models failed to capture. This jerk has been analyzed by Torta et al. (2015) and Kotzé (2017). We define S c,j (t) as the c-th component ( c = x, y, z ) spline fitted to the magnetic field data at the observatory j, with 1 ≤ j ≤ M , in units of nT, with the annual difference spline: in units of nT/year. Then the rms difference between IGRF-12 candidate model i and the observatory-derived splines, integrated over the 5-year time interval is given by where t = 5 years and Ḃ c,i (r j ) is the secular variation of the field component c due to IGRF-12 candidate i at the observatory site r j . The rms values are presented in Table 6. With one exception, all lead institutes which provided a secular variation model for IGRF-12 also submitted a model for IGRF-13, and so we use the same institution letter codes as given in Table 1. The B-12 model exhibits the closest agreement to the observatory splines in the B y component, while the IP-12 model shows the lowest rms difference in the B x and B z components. The B-12 secular variation estimate was built from modeling core flow velocity and acceleration initialized using Swarm data prior to the model epoch of 2015.0 and then stepped forward in time to 2020.0 using physically realistic constraints (Hamilton et al. 2015). The (10) S c,j (t) = S c,j (t + 6 months) − S c,j (t − 6 months) Absolute Gauss coefficient differences weighted by a factor of √ n + 1 (left) and azimuthal power spectrum difference (right) between the SV-2020-2025 candidates and the robust model IP-12 secular variation model was built using an empirical Swarm-based model at epoch 2014.3, which served as the initial condition for a 3D geodynamo model which was integrated forward in time to 2020.0 (Fournier et al. 2015). The superior performance of these two models suggests that secular variation forecasting could benefit by incorporating physics-based modeling of core dynamics. The median of all IGRF-12 candidates outperformed the mean model as well as the final robust Huberweighted IGRF-12 model in the B x and B z components with a slightly worse performance in the B y component.
It is also instructive to compare the IGRF-12 candidates with the final IGRF-13 SV model over 2015.0-2020.0. We note that because the IGRF-13 main field model for 2020.0 is provisional, the SV model over 2015.0 to 2020.0 will change in a future IGRF generation. However for the purposes of analyzing the IGRF-12 candidates we will consider this model to represent the truth during this time interval. Figure 13 presents spatial map differences of dB z /dt between the IGRF-12 SV candidates and the IGRF-13 SV model over 2015.0 to 2020.0. While there exist small-scale differences between the different maps, all IGRF-12 candidates exhibit similar large-scale differences against IGRF-13. We attribute these structures to the global secular acceleration patterns associated with the 2014 geomagnetic jerk (see Torta et al. 2015). The global structure of the field accelerations due to this geomagnetic jerk were unavailable in the datasets used to construct the IGRF-12 candidates, and so none  of them were able to accurately account for these signals over their forecast period of 2015.0 to 2020.0.
Final IGRF-13 models Thébault et al. (2015) provided an extensive discussion on the various methodologies available for weighting the candidate models submitted to the IGRF-12 call. For the IGRF-13 call, we have found the candidates are, in general, much closer to each other than in previous generations. The mean, median and robust Huber models derived from the DGRF-2015 and IGRF-2020 candidates exhibit close agreement with each other. For the SV-2020-2025 candidates there was a wider divergence in the candidates with strong differences between the more extreme ranges of the forecasts. The IGRF-13 evaluation panel voted on the proposed methods for combining the candidate models. The choices for the DGRF-2015 and IGRF-2020 models were to use a median of the Gauss coefficients or the spatial Huber-weighting method. For the SV-2020-2025 model, the median of the Gauss coefficients and spatial Huberweighting were again proposed, as well as an additional suggestion of the simple average of models B, IP, IS, L, M, and P only, in order to exclude the most different candidates. The panel voted in favor (though not unanimously) to use the median of the candidates for the DGRF-2015 and IGRF-2020 to compute the final coefficients. For the SV-2020-2025, the panel voted to use the robust Huber-weighted model in space to compute the Gauss coefficients.
Finally, we clarify the process for computing the final coefficient values. For the DGRF-2015 and IGRF-2020 coefficients, all candidate models were rounded to two decimal places, if not already provided in this format. For DGRF-2015, the median of the Gauss coefficients for each SH degree and order was computed to two decimal places. For the IGRF-2020 there were an even number of candidate models, so the Gauss coefficient median is the average of the two central coefficient values when sorted, which can occasionally produce an answer with three  (HON). Left panels show the observatory annual differences of daily mean values (black), annual spline difference curve (red) and IGRF12 candidate predictions for the dB x /dt (top), dB y /dt (middle), and dB z /dt (bottom) components in the NEC frame. Right panels show the original observatory data (blue), data after selecting for geomagnetically quiet periods (green), and fitted spline curve (black) for the B x (top), B y (middle) and B z (bottom) components decimal places. Those coefficients with three decimal places of resolution were subsequently rounded to two decimal places before being output. The SV-2020-2025 model output of the Huber robust weighting computation was rounded to two decimal places before output.

Conclusion
The 37 submitted candidates for the 13th generation of the International Geomagnetic Reference Field (IGRF) were thoroughly analysed in order to deduce a suitable methodology for combining them into a final set of models for release. A total of 15 international teams submitted candidates for consideration. These teams submitted eleven candidate main field models for the definitive (DGRF) epoch 2015.0, twelve candidate main field models for the IGRF epoch 2020.0, and fourteen secular variation models for the forecast period covering 2020.0 to 2025.0.
A volunteer taskforce consisting of the authors of this paper carried out their analyses separately, reporting back to the IAGA DIV V-MOD chair and co-chair. The analyses used both spectral and spatial comparisons, and comparisons to independent data sets. All reports are available at the IAGA website https ://www.ngdc.noaa. gov/IAGA/vmod/IGRF1 3/evalu ation s. Their conclusions were used to guide the final vote on the methodology for computing the final Gauss coefficients for each product.
The IAGA IGRF taskforce voted to use the median of the DGRF-2015 candidate Gauss coefficients as the final model, the median of the IGRF-2020 candidate Gauss coefficients as the final model and a robust Huberweighting scheme in space for the SV-2020-2025 model. The final model coefficients were released for 1st January 2020 and are freely and publicly available (Alken et al. 2020b).