The BGS candidate models for IGRF-13 with a retrospective analysis of IGRF-12 secular variation forecasts

The three candidate models submitted by the British Geological Survey for the 13th generation International Geomagnetic Reference Field are described. These DGRF and IGRF models are derived from vector and scalar magnetic field data from the European Space Agency Swarm satellites and ground observatories, covering the period 2013.9 to 2019.7. The internal field model has time dependence for degrees 1 to 15, represented by order 6 B-splines with knots at six monthly intervals. We also solve for a degree 1 external field time dependence describing annual and semi-annual signals with additional dependence on a bespoke Vector Magnetic Disturbance index. Satellite data are weighted by spatial density, along-track standard deviations, and a larger-scale noise estimator defined in terms of a measure of Local Area Vector Activity at the geographically closest magnetic observatories to the sampled datum. Forecasting of the magnetic field secular variation for 2020–2025 is by advection of the main field using steady core surface flows with steady acceleration applied. We also investigate the performance of the previous generation of candidate secular variation models, for IGRF-12, analysing the agreement of the candidates between 2015 and 2020 with the retrospective IGRF-13. We find that there is no clear distinction between the performance of mathematically and physically extrapolated forecasts in the period 2015–2020. We confirm that the methodology for the BGS IGRF-12 predictions performed well, despite observed secular accelerations that are highlighted by our analysis, and thus justify the methodology used for our IGRF-13 SV candidate.


Background
The International Geomagnetic Reference Field (IGRF) is a set of spherical harmonic models of the Earth's main magnetic field updated every 5 years under the auspices of the International Association of Geomagnetism and Aeronomy (IAGA). Every year prior to the release of the next generation, an open invitation is made for the submission of candidate models from the international community. These models are evaluated by a task force and the next set of IGRF models are constructed from the candidates. The 13th generation of the series required three new sets of coefficients: two main field (MF) models to degree and order 13 at epochs 2015.0 and 2020.0, and an average secular variation (SV) model valid from 2020.0 to 2025.0 to degree and order 8.
The British Geological Survey (BGS) has a long association with the IAGA and the IGRF, and has submitted candidates for many generations (e.g. Barraclough and IAGA Division I Working Group 1987). For the 13th generation, the production of our candidate models was carried out in three steps: (a) selection of data from the high-quality Swarm mission and ground magnetic observatories; (b) fitting and evaluation of a parent model; and (c) derivation of the candidate MF and SV coefficients. Field modelling within BGS has matured since the release of the IGRF-12 candidate models of Hamilton et al. (2015), with higher temporal and spatial resolution of the secular variation using order 6 B-splines, and the improvement of core flow advection using steady flow and steady flow acceleration to predict the secular variation. In addition, the underlying code base has been modernised and modularised, allowing massively parallel inverse solvers such as PETSc (Balay et al. 1997) and SLEPc (Hernandez et al. 2005) to be used (see Brown et al. 2020).
The IGRF call requested candidate models of the main field and the large-scale lithospheric contribution, which satellite and observatory data allow us to produce. However, there are inevitably magnetic fields arising from other sources; in particular, the ionospheric current system, magnetospheric ring current and partial ring currents, tidal, and induced currents in the Earth. These are difficult to co-estimate and separate from the desired core field signal (e.g. as in the Comprehensive Model series, see Sabaka et al. 2018). To avoid contamination, measurements significantly affected by these sources are removed through careful data selection prior to modelling, whilst the remainder are averaged out, smoothed or ignored.
Even with strong selection criteria, unwanted fields remain a problem, particularly external fields in the auroral zones at high latitudes. One approach to reduce this is to only employ scalar data at high latitudes rather than vector measurements, the horizontal components of which are much more sensitive to perturbation due to field-aligned current systems (e.g. Smith et al. 2017). We use only observatory pseudo-scalar data (the scalar data projected onto the field from an a priori main field model) at higher latitudes. This affects the co-estimation of the observatory biases making it easier to maintain the linear relationship between data and model coefficients. However, for the satellite data, we use vector components at all latitudes where available. We quantify the increase in noise in the data when we fit our parent model using a combination of two noise estimators: an along-track standard deviation calculated for each component from short (20 s, roughly 150 km) segments of the satellite data and a larger-scale estimator of the vector disturbance using observatories closest to the data sample (Local Area Vector Activity (LAVA), Thomson et al. 2010).
This study explains how we treat the data, compute the data covariance matrix and perform the inversion for the various sources of the magnetic field. In the 'Methods' section, we describe the data used and their selection criteria and briefly outline the data weighting scheme. We describe the parent model and the process used to fit the model parameters. We outline the core-flow method of modelling and predicting SV. In the 'Results' section, we evaluate the model coefficients and derive and evaluate the Definitive Geomagnetic Reference Field (DGRF) for 2015.0, IGRF for 2020.0, and SV for 2020-2025 candidate models. In the final section, we investigate the behaviour and performance of the previous generation of IGRF-12 SV candidates to test which produced the best fit over the 2015-2020 epoch.

Methods
The production of high-quality field models is greatly eased by access to globally distributed measurements at all latitudes and at high cadence. The abundant availability of satellite and ground observatory data makes collation and processing of magnetic data relatively straightforward and provides a plethora of information regarding the large-scale field of the Earth.

Data
Our data selection criteria largely follow those developed through our IGRF-11 and IGRF-12 candidates (Hamilton et al. , 2015. The most notable difference being that we use only Swarm satellite data for the IGRF-13 period, having passed the eras of Ørsted and CHAMP missions. Two data sources were used to construct our candidate models: (1) the European Space Agency (ESA) Swarm mission and (2) the ground observatory network.
The latest available Swarm data files, up to version 0507, were collected for Alpha, Bravo and Charlie, from the mission start on 2013/11/25 to 2019/08/14. To reduce the quantity of data to manageable levels, the satellite data were sub-sampled at every 20th datum, though this sampling frequency later becomes irregular when data selection criteria are applied. In order to reduce contamination from magnetospheric and ionospheric sources, data were selected according to the criteria outlined in the following section.
For ground observatory data, hourly mean vector values were collected from the World Data Centre for Geomagnetism, Edinburgh for the period 2013/01/01 to 2019/07/31. These were supplemented with other more recent data reported to INTERMAGNET, or internally from the nine BGS run observatories. Our period of study included both definitive and quasi-definitive data (from those observatories which provide this). As a rough guide to the proportion of quasi-definitive to definitive data used in our study, daily quasi-definitive data files accounted for the following percentages of files collected prior to beginning our data processing, by year: < 1 % in 2013; < 1 % in 2014; < 1 % in 2015; 2% in 2016; 5% in 2017; 20% in 2018; 100% in 2019. These percentages do not take into account which data remained after data selection to be used in the model inversion. All data were subjected to a rigorous quality control procedure in the manner of Macmillan and Olsen (2013), accounting for all known baseline steps, and discarding any obviously anomalous values. Where baseline steps were identified but had not been measured and reported by the observer, records were divided into separate series at the steps. This allows us to account for differing baseline offsets through time by parameterising separate crustal bias terms in our model. Table 1 summarises the selection criteria that were applied to Swarm and observatory data. We select both ground and satellite data primarily to minimise the impact of magnetopsheric and ionospheric activity, as we wish to describe the background state of the internal field. We also must reduce the large volume of satellite data to practical levels given the availability of 1 Hz measurements from three Swarm satellites, but choose to be slightly more lenient with our criteria for observatory data given their relative dearth and a desire for even data coverage. At low geomagnetic dipole (GMD) latitudes, the sub-sampled Swarm data (every 20th datum) were retained only for magnetically quiet, local night times. The local time selection aims to minimise the presence of enhanced ionospheric Sq currents during the day. Magnetically quiet times are identified primarily based on the Dst (Nose et al. 2015) and Kp indices (see Acknowledgements) but also by solar wind parameters. Solar wind parameters were provided by the OMNI2 data set (see Acknowledgements), which provides a forward projection of measurements made at the L1 Lagrange point to Earth's magnetopause. We use these values computed at the magnetopause of the interplanetary magnetic field (IMF) properties in our data selection.

Data selection criteria
At high GMD latitudes, no local time selection was used to avoid seasonal gaps, but an additional filter for the merging electric field (Em ) was applied. Here Em was calculated from the IMF as recommended by Newell et al. (2007), and averaged hourly to give an indication of sustained enhancement of solar wind coupling to the magnetosphere. We use vector data at all latitudes where possible. Scalar data from the ASM instruments on Swarm were used only when vector data from the VFM instruments were unavailable.
We also use full field observatory data from 161 locations, at hourly mean cadence, and selected for local night times at all latitudes. The observatory distribution is illustrated in the top panel of Fig. 1, with positions and number of data used noted in Table 2. Vector data were used below GMD latitude |55 • |. Above this latitude, vector data were used to calculate a projection of scalar data onto a prior BGS core field model. The projection of scalar data onto a prior model (pseudo-scalar data) provides a linear relation between the modelled Gauss coefficients and the crustal biases that are required for the solution with full field observatory data.
The resulting distribution of data in time is shown in the middle panel of Fig. 1. Our data selection criteria are quite severe, notably around December of 2014 and 2015 where activity levels (primarily Kp and dDst/dt) were persistently high enough to preclude almost all Swarm data. Given our primary focus on the large-scale internal field, we employed strict selection to produce a quiet, night-time reference model. We do not see a significant change in the resulting model if our criteria are slightly relaxed to allow more data at these times. The drop off of observatory data numbers towards the end of the model span reflects the months to years delay in delivery from the majority of observatories-we commend those observatory operators that provide quasi-definitive data on a faster timetable. The separation of observatories between those giving vector data and those converted to projected-scalar data is shown in the top panel of Fig. 1. The distribution of the data in time and latitude is shown in the bottom panel of Fig. 1. The observatory data are at fixed latitudes and so form continuous horizontal lines, whilst the Swarm satellites cover all latitudes between |87 • | but are subject to local data selection. This leaves gaps in the record at certain periods when the satellites are in a dawn-dusk orbit, for example. Over the course of 2014-2019, the orbit of Swarm B drifted relative to Swarm A and C in local time, allowing the complementary data gaps in their respective records to be covered later in the mission. A total of 5,924,486 data were used to construct the parent model.

Swarm data weighting
We assign a purely diagonal prior covariance matrix to our data in our inverse problem, though we also apply iterative reweighting of misfits during the calculations. Satellite data prior variances were calculated from a combination of noise terms based on (a) along-track standard deviation calculated over each 20 s segment of 1 Hz data, in each vector component, (b) external field activity as measured at the geographically nearest magnetic observatories, described by the Local Area Vector Activity (LAVA) index (Thomson and Lesur 2007), (c) the spatially uniform noise (2 nT standard deviation) and (d) for scalar values, a function of solar zenith angle, z, (in nT), (1 + cos 2 (z)) . These variances were then scaled by data density within 1 • equal-area tesserae, such that the close spaced high-latitude data are down-weighted. Figure 2 shows an example of the spatial information contained in the along-track weighting, which clearly shows the rationale behind our separation of high latitude and mid-to low latitude data based on influence of external fields. The influence of the auroral oval in each vector component is notably different, and distinct from the polar cap within the oval. Note that the along-track standard deviations for the X and Y components are significantly larger than those for the Z component in the auroral ovals (typically > 7 nT, as opposed to < 3 nT), reducing the significance we give to data potentially contaminated by sources such as field aligned currents. This exemplifies why we choose to use such dedicated estimates of noise in deriving prior data variances, rather than assigning generic values to all data. For ease of visualisation, Fig. 2 shows the median values during 2017 in approximately 0.5 • equal-area tessera, showing that whilst the along-track weightings are specific to the time of observation, there are persistent spatial patterns. Once our data selection criteria have been applied (as given in Table 1), remaining mid-to low-latitude data are relatively less dense than at high latitudes as shown in Fig. 2. Despite the lower data density at mid-to low-latitudes (and as such not easily identifiable in Fig. 2), we also see the along-track standard deviations highlight increased variability at the equatorial electrojets, anomalous satellite tracks, and erroneous data points.

Ground observatories data weighting
Ground observatory data were given a simple prior variance weighting: (a) spatially uniform noise for vector data below |55 • | GMD latitude (2 nT standard deviation) or spatially uniform noise for projected scalar data above |55 • | GMD latitude (6 nT standard deviation) and (b) for scalar data, a weight as a function of solar zenith angle, z, (in nT), ( 1 + cos 2 (z)).
These variances were then scaled by data density within 5 • equal-area tesserae, to account for regions such as Europe which have relatively dense coverage. A final scaling factor was applied to the variances of all observatory data such that the total weight is roughly 10% that of the total weight of all satellite data.

Parent model parameterisation
The BGS IGRF-13 candidate models are derived from the latest generation of our "parent model" BGS Model of Earth's Magnetic Environment (MEME) . The methodology follows from that of our IGRF-11 and IGRF-12 candidates described by Hamilton et al. (2010) and Hamilton et al. (2015), respectively. We diverge from the equations given in Hamilton et al. (2015) only in the values used for particular parameters, as described below. All data and model parameterisations are in the NEC coordinate frame, unless otherwise specified.
The core field is solved for a spherical harmonic model to degree and order 15, with an order 6 B-spline time dependence. The spline has 6-month spaced knots running from 2012.9 to 2019.9, which are regularised by the time integral of the 3rd time derivative of the radial magnetic field over the core-mantle boundary. The model is also regularised by the 2nd time derivative of the radial magnetic field over the core-mantle boundary at the end knots. It has 4,845 parameters. The crustal field part of the model is considered static in time, and is described from degrees 16 to 55, giving 2,880 parameters.
The large-scale slowly varying external fields are modelled to spherical harmonic degree and order 1 using an order 2 time dependent B-spline with 3-month spaced knots from 2012.9 to 2019.9. This gives 87 parameters. The large-scale rapidly varying external and induced field are similarly captured by a spherical harmonic degree and order 1 with order 2 B-spline time dependence governed by the VMD index (Thomson and Lesur 2007), with 3-month knots. This has 174 parameters. The 3-month knot spacing is consistent with the spans of Local time at GMD latitudes below |55 • | 23:00 ≤ x ≤ 05:00 01:00 ≤ x ≤ 02:00 Absolute difference between datum and a preliminary version of our parent model ≤100 -

|F-|B|| [nT]
Absolute difference between ASM and magnitude of VFM data ≤2 - Hourly mean of 1 minute merging electric field, calculated as in Newell et al. (2007) ≤0.8 -   observatory data from which the VMD index is derived, and allows more rapid variation than do the core field splines. Other external parameters with periodic variations are accounted for by sine and cosine terms accounting for external and induced, annual and semi-annual, variations (14 parameters) and external sine and cosine terms accounting for diurnal variations, parameterised by sun-synchronous longitude (42 parameters). Finally, as we use full field observatory data, the unknown crustal biases at the observatories are modelled as static offsets for each vector component or the scalar field at each observatory location. This accounts for unresolved local crustal fields captured in the full field data, giving 410 parameters. Where an observatory record was identified to contain undocumented baseline steps that could not be rectified, the record was divided into separate series. Each divided series was assigned separate bias terms in the model to account for the baseline steps. The observatory records requiring multiple bias terms are highlighted in Table 2 with a .

Parent model estimation
The model inversion solves for 8,452 parameters from 5,924,486 data by least-squares regularised minimum norm, using a null starting model and an L2 norm. This is followed by a single iteration of regularised iteratively reweighted least squares, using Huber reweighting and an L2 norm. The initial least squares solution was not significantly altered by the iterative-reweighting step and further iterations were oscillations about the solution minimum without improvement in the fit. We take this fast convergence as support for our choices in providing detailed prior data covariances. Regularisation is applied only to the time-varying core field, and is governed by three damping parameters that control the damping at each model end knot, and the time integral, respectively. As our scalar data are projected onto a prior BGS core field model, derived in a similar manner, they are linearly related to our model coefficients. The final model fit to all data has a weighted (by prior data covariance and final iteration of Huber weights of data to model misfit) standard deviation of 1.8 nT; a summary of mean and rootmean-square (RMS) residual misfit is given in Table 3. The weighted RMS values we report here are equivalent to the standard deviation about the mean in each case.
The misfits are comparable with those seen for previous generations of MEME (see e.g. Hamilton et al. 2015, though their values quoted are unweighted and so larger). For the vector components we see lower rootmean-square values at mid-to low-latitudes (smallest RMS of 0.58 nT for Swarm A, B C ) than at high latitudes (largest RMS of 4.81 nT for Swarm B, B N ) due to the increased noise in the vector components at high latitudes. The Centre component has lowest RMS, followed by Eastward and then Northward components for satellite data. For observatory vector data, the Eastward component has the lowest misfit, likely related to the lower contamination in this direction by external field signals at ground level at mid-to low latitudes (note that the observatory vector and scalar components are explicitly split between low-to mid-, and high latitudes, respectively). The approximately zero mean for all residuals shows our model fits the data well without remaining bias. We do see larger mean values for the satellite scalar component (F) due to the small number of data considered, this is not seen for observatory data where a larger, more representative, number of scalar data are used. We see good agreement between the three Swarm satellites in the vector components, and a similar trend in the components of observatory vector data.
The larger misfit at high latitudes is expected, and was anticipated by the lesser prior weightings we assign to the data (Fig. 2), particularly in the horizontal components. The distributions of unweighted posterior residuals by latitude demonstrate this further (Fig. 3). Figure 3 shows the unweighted posterior residuals for Swarm, plotted against both GMD co-latitude and time, for each field component. For ease of visualisation we downsample the residuals, keeping all 2 σ outliers in all 1 • latitude or 90-day time bins to retain the same appearance. The distribution in time shows consistent, approximately zero mean, behaviour, though a slight divergence is evident at the very beginning of the model. We attribute this to our reliance primarily on ground observatory data until the beginning of Swarm data in November 2013, and this does not affect our use of the model from 2015 onward. We do not observe a similar decline in the distribution of residuals towards the end of our model, suggesting our damping choices have controlled the spline ends relatively well.
A thorough validation procedure was conducted, first on the MEME parent model, and then on the subsequent candidate models derived from it. This validation could not be truly independent given the limited sources of geomagnetic observations, but we endeavoured to include comparisons to a wide range of existing field models and to all available ground observatory data.

Derivation of IGRF-13 candidate models
Though requested in the IGRF-13 call, no uncertainties on the Gauss coefficients are provided with our candidate models. We do evaluate the formal standard deviation of each model parameter from the eigenvalues and eigenvectors of our inverse problem, however, these errors are well known to be underestimates of the true uncertainty (see e.g. Lesur et al. 2010). Indeed, even if we were to take these errors as representative, it is not clear how these formal errors would in practice be propagated through from our parent model solution to our derived candidate models. The three BGS candidate models were derived from MEME in the following manner.

Main field at 2015.0
The main field for the DGRF at 2015.0 was taken directly from the core component of MEME, truncating to degree 13. The coefficients were rounded to two decimal places, as requested. These values are given in Table 4. Figure 4 shows comparisons between the power spectra of each of our candidate models and the final IGRF-13 models (Alken et al. 2020a). For the DGRF at 2015.0 we see excellent agreement in the spectra, with our candidate very close to the final chosen model of the median of all candidates (see Alken et al. 2020b). The overall spectral RMS difference between the Gauss coefficients ( g m n , h m n ) of two models (denoted by i, j) can be given by, where n is spherical harmonic degree, m is spherical harmonic order, a is the Earth's reference spherical radius (6371.2 km), and r is the radial position. There is an RMS difference of 2.02 nT between our candidate and the final DGRF at 2015.0, with the largest single coefficient difference being 0.42 nT in the g 0 3 term. The coefficients most different are order 0 and order 1 terms at odd degrees above 3, suggesting an overall approximately zonal patterned discrepancy between the models.
We see a clearer view of the differences between our candidate and IGRF-13 in spatial maps, shown in Fig. 5. Generally, the differences are small in magnitude and spatial scale, and randomly located. The sole discernible geophysical signal is under the auroral zone, in the F, H, Z and X maps. This agrees with the zonal differences noted between our coefficients and the DGRF. This indicates a leakage of auroral field into our core field, not fully eliminated by our data selection, and is a notoriously difficult feature to remove. This signal is, however, small, peaking at < |10| nT in the scalar field, F.

Main field at 2020.0
The main field at 2020.0 was extrapolated from the core component of MEME, then truncated to degree 13. The Gauss coefficients for the main field of the MEME core component at 2019.0 were used as the starting field, to which we added the mean annual core secular variation for the period 2018.25 to 2019.25, sampled at eleven equally spaced time increments, inclusive of the end times. The coefficient values are given in Table 4.
The spectra in Fig. 4 again show very little deviation from the final IGRF-13 median of all candidates

Table 3 Number of data (vector triples counted as one) by type, and the mean and root-mean-square (RMS) misfits between model and data
Misfit values are weighted by prior data covariance and the final Huber weighting of residuals. Satellite data with subscript 'L' indicate data at GMD latitudes below |55 • |, values with subscript 'H' indicate data at GMD latitudes above |55 • |. Observatory vector and projected scalar data are only used in the regions below and above |55 • | GMD latitude, respectively. Residuals are in NEC coordinates (see Alken et al. 2020b). The RMS difference by Eq. 1 is 5.09 nT, generally with coefficient differences largest below degree 4, with the largest differences seen between the h 1 1 (− 1.52 nT) and h 2 2 (1.65 nT) terms. Mapped spatially in Fig. 6, we see small magnitude differences, at larger scales than for the DGRF candidate. Again, differences peak at <10 nT and we attribute the spatial structure not to an effect of our data selection, but to our extrapolation of the field from 2019.0 to 2020.0. The spatial pattern of differences, particularly in Z, alludes to the structure of the SV and highlights the over-and underestimation of peaks in regions of greatest SV (i.e. at 0 • longitude, and in the Pacific). This effect is seen more clearly in the SV candidate comparison of Fig. 7, in the next section.

Secular variation for 2020.0-2025.0
The use of core flow and/or persistence SV is supported by the performance of such IGRF-12 candidates, which produced the best forecast, on average, for 2015 to 2020, despite the SA associated with a jerk in 2014/15 (see Torta et al. 2015;Brown et al. 2016;Kotzé 2017) quickly causing linear SV predictions to diverge (see next Section). We use the steady core flow and acceleration formalism as given in Whaler and Beggan (2015) and Beggan and Whaler (2018) to estimate the 'average' flow and acceleration over 2017.5 to 2019.0. We then forward propagated the estimated main field coefficients of the BGS IGRF2020 candidate from 2020.0 to 2025.0. We computed the difference between 2025.0 and 2020.0 and divided by five to produce the annual SV coefficients for our candidate model (up to degree and order 13), which was then truncated at degree 8 as required.
In detail, we use the monthly SV and secular acceleration (SA) magnetic field values from the BGS parent model output from 2017.5 to 2019.0 to compute the steady core flow and steady core flow acceleration over this 1.5 year period. The core flows were computed using an L1-norm iterative-reweighting scheme to fit the flow and acceleration to the SV and SA coefficients (up to degree and order 14 and 8, respectively). We apply also a tangentially geostrophic constraint to the L1-norm solution (see Beggan and Whaler 2008) before solving for the steady flow and acceleration. Both toroidal and poloidal flow and acceleration are solved for simultaneously. The outputs consist of a steady flow model and steady acceleration model for the period covering 2017.5 to 2019.0.
The forecast from 2020.0 to 2025.0 was made using monthly time-steps (one-twelfth of a year). At each timestep, the Gaunt and Elsasser matrices are computed using the main field model and secular variation from the preceding month. The SV and SA are computed from the Gaunt-Elsasser matrix multiplied by the steady flow and acceleration model and added to the main field to step forward by a month. The process is repeated from 2020.0 to 2025.0. The main field coefficients for 2020.0 are subtracted from main field coefficients of 2025.0 to compute the field change over five years. The annual SV is computed by dividing the difference by 5. Detailed description of the formulae used can be found in Hamilton et al. (2015), and the coefficient values are given in Table 4.
The spectral comparison in Fig. 4 shows good agreement between our candidate and the final chosen spatial   Huber-weighted mean of all candidates (see Alken et al. 2020b). We see a similarity in pattern between the spectra and the spectra difference, suggesting a consistent discrepancy in the magnitude, but not spatial structure, of our SV candidate and the final IGRF-13 SV. This is borne out by the spatial maps of field difference in Fig. 7 where the pattern of our extrapolated SV observed in Fig. 6 is now seen more clearly. Where SV is strongest, particularly the Pacific, we see a discrepancy in the magnitude of SV in our candidate estimate. The RMS difference by Equation 1 is 8.74 nT year −1 , with the largest coefficient differences between the ḣ1 1 (− 2.88 nT year −1 ) and ḣ 2 2 (2.72 nT year −1 ) terms, as seen for our MF candidate at 2020. This confirms that it is a difference in our estimate of the magnitude of SV, primarily in these two coefficients, that causes our MF candidate for 2020.0 and SV candidate for 2020-2025 to differ from the final IGRF-13 models. Comparison to the final IGRF-13 SV and other candidate models suggest that we give a conservative estimate of SV magnitude, driven by the use of a mean value of SV taken through time, rather than the peak magnitude of the spatial pattern of SV. This conservative approach to estimating the SV performed well on the 5 year timescale for our IGRF-12 candidate, as discussed in the next section.

Retrospective analysis of IGRF-12 secular variation candidates
In this section, we examine the forecasts from the IGRF-12 SV candidate models (Thébault et al. 2015b) to investigate whether specific strategies provided a better method for determining magnetic field change over five years. Several different approaches were employed by the nine teams who submitted SV candidates (see summary in Thébault et al. 2015a), but these fall into two general themes: mathematical extrapolations and physical forecast models. The former type we define as models which estimate SV at a time prior to the prediction interval from a parent model (with varying techniques used per team), and then use either persistence or linear extrapolation of SV values to represent the following 5 year prediction period. The latter type are models which are built on the principles of physical processes, such as core flow or geodynamo assimilation, whose state can then be propagated through time from the observed period to provide a hindcast or forecast.
The period of this study, 2015 to 2020, is particularly interesting due to the observed SA being incompatible with the linear SV models of the IGRF format. Early in 2015 it became clear that IGRF-12 candidates had been produced with observations made just before a jerk beginning in late-2014 (Torta et al. 2015;Kotzé 2017). Later, it was also noted that the accelerating movement of the North dip pole was not compatible with earlier linear SV predictions; this issue was exacerbated by the field geometry near the poles (Chulliat et  Our interest was also in part to evaluate our own candidate methodology in light of this, relative to other approaches. An initial estimate of each IGRF-12 SV candidate model's performance was calculated at the time of our SV candidate model production for IGRF-13 (mid-2019). Based on this initial analysis, our IGRF-13 SV candidate estimate of average SV using advected core flow was justified, having performed well in the previous generation despite the SA subsequently observed. We repeated the analysis with the added advantage of direct observations covering the full period span to early 2020.
We evaluated the IGRF-12 SV candidates by adding their respective SV coefficients to the MF coefficients of the DGRF at 2015 (as given by IGRF-13) in 0.1 year increments. We then subtracted each projected step of MF from the MF coefficients interpolated linearly between the DGRF at 2015 and the IGRF at 2020 (as given by IGRF-13). We did this for both the given SV candidate limit of SH degree 8, and with zero-padding of the SV candidates to reach the IGRF limit of SH degree 13.
To assess the differences between SV candidate predictions and the ideal case retrospective linear SV model of IGRF-13, we use the RMS coefficient spectral difference, given by Eq. 1. The resulting RMS differences are shown in Fig. 8, with dashed lines for the degree 8 comparison, and dotted lines for degree 13. The candidates closest to IGRF-13 SV for 2015 to 2020 form a group of 6 models, led by DTU (RMS of 109.3 nT at 2020.0), and closely followed by BGS (110.2 nT). The BGS candidate was also seen to perform well when compared against ground observations by Alken et al. (2020b). Three models diverge slightly from the others, those of NASA GSFC (120.6 nT) and LPG Nantes (121.8 nT), and then IZMIRAN (137.0 nT). The spread of candidates is relatively small though, reflecting the initially small differences noted by Thébault et al. (2015a), and the greatest discrepancy from the "true" field of IGRF-13 by 2020 is gained at roughly 27 nT per year. Overall, there is no clear distinction in performance between the groups of mathematically or physically predictive models. This suggests that the simple persistence or extrapolation models of SV are justified on this timescale, despite known drawbacks (such as sensitivity to damping choices near model ends when using temporal B-splines). It also suggests that, again despite known drawbacks (such as discrepancies between shorter practical forecast periods and the relatively longer timescales of SA (Christensen et al. 2012) and SV (Lhuillier et al. 2011)), physical forecasts can be accurate enough to suffice for purposes such as the IGRF. This appears true, even for this period when SA is a key feature of the field changes observed on periods shorter than the forecast interval.
No significantly consistent prediction of a particular set of coefficients or field features is distinct between two sets of model candidates. For example, it does not appear that the dipole terms, or zonal features, are described better by physical forecasts than mathematical ones. A better correlation is seen between relatively worse predictive performance and initial parent models which contain additional artefacts (i.e. non-SV related MF anomalies which diverge from the consensus of the IGRF at 2015.0 (see Thébault et al. 2015a)). These stem from the particulars of the modelling methodologies used for each candidate and may be caused by many factors, for example, external field contamination of the core model through insufficiently strict data selection or underparameterisation of external models. Figure 8 shows the linear increase in the discrepancy between IGRF-13 SV for 2015 to 2020 and the IGRF-12 SV candidates through time, when considered to degree 8. When the additional smaller spatial scales of degrees 9-13 are also considered, we see an additional baseline error of 140 nT (the summed magnitude of these coefficients from the DGRF at 2015), which then increases through time as the smaller scale SV captured by IGRF-13 is added. The ability to accurately predict these smaller scales of SV for the IGRF is discussed in depth by Silva et al. (2010). For consideration of how much signal cannot be approximated by linear SV, and requires SA, we also show in Fig. 8 the difference between IGRF-13 and an updated version of MEME (labelled "Spline"). This version is as our parent model described here, but with extra spline knots accommodating new observations through February 2020, and thus models the 2015 to 2020 period entirely (in retrospect). The spline model diverges from the pinned IGRF-13 MF epochs of 2015 and 2020, reaching a peak of roughly 20 nT in 2017.5. Despite SA captured by MEME, the linear SV of IGRF-13 is never too far from the "true" field, even when MEME is truncated to degree 8, showing the tendency for observed SA to cause divergence and then convergence back to the path of a linear model. MEME does not converge exactly to zero difference at 2020 as the IGRF-13 estimate of 2020 is preliminary, having been extrapolated in various ways from models built with incomplete observations from earlier in 2019. At 2020, MEME is retrospectively fitting observations, but shows that the discrepancy with IGRF-13 is minor.
As well as the small spread of differences to IGRF-13, there is spatial coherency in the differences between IGRF-12 SV candidates and IGRF-13 SV for 2015 to 2020. This is illustrated in Fig. 9 where we plot the spatial median of field values (on a 1 • by 1 • grid) of all IGRF-12 SV candidates relative to IGRF-13 SV for 2015 to 2020. The anomalies seen are generally of large spatial scale and correlate with the observed SA from the period. We see, primarily in Z, evidence of the accelerations of the 2014 jerk (compare to Fig. 3 in Torta et al. (2015)), in D, H and X, the acceleration of the North dip pole (Chulliat et al. 2019), and in F the accelerating weakening of the South Atlantic Anomaly. The largest magnitude anomalies seen in Fig. 9 also, in part, represent an underestimation of the largest magnitude SV peaks, which cannot adequately be represented by the degree 8 IGRF-12 SV candidates.
Regarding accurate estimation of the magnitude of SV, particularly when the field is accelerating as it was in 2014 and 2015, we suggest that timeliness of data constraining the SV prediction is perhaps as significant a factor as the SV prediction methodology used. When comparing versions of our parent model built with additional more recent data, we find that it is primarily the magnitude of the SV predicted that varies, rather than the spatial pattern or features of the MF or SV, when a few more weeks or months of data are added. This reflects the sensitivity to SA in gauging a longer term average SV level. This relationship to timeliness of data seems to hold when comparing the IGRF-13 SV candidates to our updated parent model at 2020.0. This would suggest that the Fig. 8 Spectral RMS misfit between IGRF-13 and the 9 IGRF-12 SV candidate models for 2015.0-2020.0. Dashed lines represent coefficient differences to SH degree 8, dotted lines to SH degree 13 (with SV candidates zero-padded). "Spline" refers to an updated version of the BGS parent model built with data through February 2020. Those models labelled with bold text used physics based predictions, the remaining used mathematical predictions logistics of building a model as late as possible to capture as much data as possible are worth considering, and that SV predictions based on the most recent measurements as possible may provide the most reliable predictions. As such, for ongoing monitoring when new data are readily available, making regular updates to a field model is likely key to making accurate short term projections. We do not see this as a practical application, however, in the current context of the collaboratory approach to creating the IGRF. This certainly highlights the excellent tools we currently have in the Swarm mission and observatory network, with their rapid data delivery, and is something to be considered in the event of a post-Swarm IGRF without similar data provision.

Conclusions
The BGS parent model has successfully used the most upto-date data available from the ESA Swarm mission and ground-based observatories to produce candidates for the 13th generation of the IGRF. Inclusion of these data sources, especially the observatory data, combined with a regularised model of time dependence using B-splines have allowed BGS to produce a robust model of the core field over the past 7 years. These data have also allowed us to estimate the core flow over this period and to provide a physically based prediction of the secular variation over the lifetime of IGRF-13. The submitted candidates were very close to the final consensus models of IGRF-13.
We examined the quality of the forecasts from the IGRF-12 SV candidate models to determine if there are any particular methods that performed better at predicting field change over five years. For the 12th generation of SV candidates, we found that there was no clear distinction between the performance of mathematically and physically based forecasts. We confirmed that our BGS IGRF-12 SV candidate performed well, despite the observations of SA during the IGRF-12 validity period, and thus justify our use of a similar modelling ethos for this generation's candidates.