Skip to main content

An alternate representation of the geomagnetic core field obtained using machine learning

Abstract

Machine learning (ML) as a tool is rapidly emerging in various branches of contemporary geophysical research. To date, however, rarely has it been applied specifically for the study of Earth’s internal magnetic field and the geodynamo. Prevailing methods currently used in inferring the characteristic properties and the probable time evolution of the geodynamo are mostly based on reduced representations of magnetohydrodynamics (MHD). This study introduces a new inference method, referred to as Current Loop-based UNet Model Segmentation Inference (CLUMSI). Its long-term goal focuses on uncovering concentrations of electric current densities inside the core as the direct sources of the magnetic field itself, rather than computing the fluid motion using MHD. CLUMSI relies on simplified models in which equivalent current loops represent electric current systems emerging in turbulent geodynamo simulations. Various configurations of such loop models are utilized to produce synthetic magnetic field and secular variation (SV) maps computed at the core–mantle boundary (CMB). The resulting maps are then presented as training samples to an image-processing neural network designed specifically for solving image segmentation problems. This network essentially learns to infer the parameters and configuration of the loops in each model based on the corresponding CMB maps. In addition, with the help of the Domain Adversarial Training of Neural Networks (DANN) method during training, historical geomagnetic field data could also be considered alongside the synthetic samples. This implementation can increase the likelihood that a network trained primarily on synthetic data will appropriately handle real inputs. Our results focus mainly on the method's feasibility when applied to synthetic data and the quality of these inferences. A single evaluation of the trained network can recover the overall distribution of loop parameters with reasonable accuracy. To better represent conditions in the outer core, the study also proposes a computationally feasible process to account for magnetic diffusion and the corresponding induced currents in the loop models. However, the quality of the reconstruction of magnetic field properties is compromised by occasional poor inferences, and an inability to recover realistic SV.

Graphical Abstract

Introduction

There are currently four main research directions aiming at exploring how the geodynamo operates. Core surface flow inversions, inverse geodynamo modeling, laboratory experiments and direct numerical simulations (Glatzmaier and Olson 2005; Christensen 2011). The first two of these are the most prevalent tools at present used for inferring the actual state of the geodynamo (Huder et al. 2019). To do this, they make use of the dimensionless form of the induction Eq. (1) describing the MHD interaction between the fluid flows and magnetic fields:

$${\partial }_{t}{\varvec{B}}=\nabla \times \left({\varvec{u}}\times {\varvec{B}}\right)+\frac{1}{Rm}{\nabla }^{2}{\varvec{B}}.$$
(1)

In Eq. (1), \({\varvec{u}}\) contains the flow velocities, \({\varvec{B}}\) is the magnetic induction. The dimensionless magnetic Reynolds number \(Rm=UL/\eta\) is used to represent the ratio of forces between the flow-driven advection of the magnetic field and the viscous dissipation of the field in the absence of flow (also known as magnetic diffusion (Holme et al. 2015)). In the magnetic Reynolds number, \(L\) is a characteristic length scale, \(U\) is the typical speed of fluid flows and \(\eta =1/({\mu }_{0}\sigma )\) is the magnetic diffusivity, \({\upmu }_{0}\) and \(\upsigma\) are the magnetic permeability of free space and the electrical conductivity, respectively.

Core surface flow inversions aim to reconstruct horizontal flow patterns and velocities at the core–mantle boundary (CMB) with the observation of the radial component of the magnetic field using Eq. (1). This approach suffers from non-uniqueness issues arising from the ill-posed nature of the solution and its uncertainty along lines of zero radial induction (Whaler 1986). Interestingly, even such uncertain solutions can lead to more accurate forecasts of SV over three to five years than estimates produced by linear extrapolations of the time dependency of the Gauss coefficients (Whaler and Beggan 2015).

Inverse geodynamo modeling makes use of the quasi-geostrophic approximation of the already simplified Boussinesq equations of convection attached to Eq. (1) to simulate a dynamic interaction of induced magnetic fields and fluid flow in a conductive core (Gillet et al. 2011). These types of dynamical models have the beneficial property of a coupling between flow velocities at the boundary layer (CMB) and those in the bulk of the core. This allows for a Bayesian inference of the distribution of the internal physical parameters of the system from surface magnetic field and flow data which can be fed back into the simulations for predictive modeling using a technique called variational data assimilation (Talagrand 1997). The disadvantage of this technique lies in the instabilities of ensemble modeling and the complex statistical properties (covariances) of simulated parameters being relied on during computations (Sanchez et al. 2018).

Laboratory experiments are perhaps the least commonly investigated of the above-mentioned methods of inquiry. Nevertheless they have provided a handful of results relevant with respect to the geodynamo (e.g., Müller et al. 2008; Miralles et al. 2013; Monchaux et al. 2010; Su et al. 2020). These experiments can also provide valuable basis for testing the robustness and quality of computational reconstruction methods, providing ground truth data about what is expected from complex MHD processes such as the geodynamo.

Direct numerical simulations on the other hand aim at developing computational models which can reproduce the turbulent dynamic regime expected for the core dynamo as authentically as possible. Though this goal is severely hampered in particular by the low viscosity values estimated for the outer core, which seem impossible to approach even exploiting state of the art computational power (Sheyko et al. 2016; Dong et al. 2021), significant leaps towards it have been made. Recent examples of this include the models of (Aubert et al. 2017) and (Aubert 2023). Interestingly, in the rare case where in such studies electric current densities were also reported, results display concentrated current systems emerging with increasing vigorousness. This concentration manifests itself in coil-like current systems surrounding sheet-like plumes of axisymmetric flow (Miyagoshi et al. 2008). The general picture one can obtain about the distribution of electric currents in such simulations is that a significant part of field generation is carried out by these localized current systems appearing with relatively uniform geometries. It is to be noted that the sheet plumes characterizing flow velocities in a rapidly rotating turbulent regime associated with the above-mentioned current systems appear in other published work on simulations results, such as Schaeffer et al. (2017) or Aubert (2019).

The geodynamo simulations raise the intriguing prospect of producing simplified models based on idealizing the geometries of these concentrated currents. To do so might be beneficial when one would like to gain information on regions inside the geodynamo which can act as sources of the geomagnetic field without any direct need for the computational complexity necessary to conduct a simulation approximating real Earth-like circumstances. This line of thought can only be valid of course provided that the overall structure of the turbulent MHD regime and the associated current density distribution in the simulations indeed resembles that of the geodynamo, and if some practical way of inference can be established using such idealized phenomenological models. In the current study, one possible way of bypassing this second obstacle with the help of ML is presented in detail.

Until now, few studies investigating the internal geomagnetic field have exploited ML methods. For example, the work of (Gwirtz et al. 2022) concerns the predictability of pole reversals using ML only on historical geomagnetic dipole moment (GMD) data. The authors concluded that the task could not be well addressed by the methods they studied due to the small number of relevant data and their low frequency domain resolution. It is worth mentioning Loftin et al. (2019) who analyzed the applicability of ML in the data preparation for geomagnetic models.

In an introductory study Kuslits et al. (2020) gave a more detailed review of the difficulties faced by current research directions not using ML, and proposed a very similar alternative method to the one presented here. Conceptually, the forward model idealized the geodynamo process as a set of many localized individual current loops (or loops for short), and the inversion method heavily relied on ML, in particular, deep learning. However, that study demonstrated the concept in one relatively simple example using only synthetic data with no conductive medium and no time variation introduced in the forward model. It was presented as being an equivalent loop model with an emphasis on being conscious of the much more complicated current systems potentially existing in various spatial scales within a conductive core in the actual geodynamo.

It is to be noted, that reconstructing a complex current density distribution in a large inaccessible volume of space like Earth’s outer core (in which the bulk of the dynamo action occurs) has its specific challenges.

There is a long history of other previous works searching for current loop representations of the geomagnetic field with a similar goal in mind, such as Alldredge (1987), Rong et al. (2021) and references therein and Peddie (1979). All these attempted to fit either one or more magnetic dipoles or current loops to some representation of the Earth’s magnetic field or a dataset that contains measured magnetic field values. Computationally speaking, they generally inverted for loop parameters using some form of least-squares inversion. Although in the case of a single current loop the determination of loop parameters through least-squares inversion yields sound results, such parameters rarely provide insight into the inner workings of the geodynamo or the current systems in the Earth’s outer core. It is possible to attempt to model the Earth’s magnetic field by carrying out least-squares inversion for the parameters of multiple loops. In this case, the sensitivity of least-squares inversion to the initial parameters can be remedied in multiple ways.

In Alldredge (1987) the author first inverted the parameters of two loops in axial position. Then based on the residual magnetic field, judiciously set the initial parameters for 5 additional current loops and carried out the inversion for the newly added loop parameters. These steps were iterated until sufficiently low RMS error was reached. In Peddie (1979). the author ran 20 sets of inversions with randomized initial conditions to account for the sensitivity to initial parameters.

To summarize and expand on the shortcomings of these attempts, the following observations can be made:

  1. 1)

    They applied little to no objectively grounded constraints on the possible characteristic dimensions of the loops or their abundance in the models.

  2. 2)

    They offered no objective methodological possibility to handle the discrepancies between idealized representations, and the potentially much more complex magnetic fields and current systems existing within the geodynamo.

  3. 3)

    The linearized least-squares inversions they mostly applied were sensitive to the initial parametrization of the loops, in particular, to the initial placement of them (Gubbins and Roberts 1987). The solutions offered by different authors relied on either a human expert to determine sound initial parameters or the randomization of the initial parameters. These solutions made the regular and “large-scale” application of such inversion methods untenable (e.g., inverting for different IGRF representations).

  4. 4)

    They did not take into account “secondary” effects when attempting to reconstruct the time-variation of the field as well as its actual state, such as the dampening of the magnetic field (magnetic diffusion) generated by changes of the loop currents in time.

As it was also learned from previous attempts using other types of equivalent source approaches for forward modeling, such as Mayhew and Estes (1983) and Ladynin (2014), these problems are necessary to deal with when one aims at developing a physically relevant reconstruction of the geodynamo using highly simplified sources.

The present work addresses these shortcomings via upgrading the approach proposed in the introductory study of Kuslits et al. (2020). This further developed approach is referred from hereafter as the CLUMSI methodology. Shortcomings were addressed specifically the following ways. Issue (1), by deriving the range of the potential number of loops and loop parameter values. Issues (2) and (3) by training an updated image-processing deep neural network using the DANN method, which then gives an initial estimation for all the loop parameters corresponding to a set of field and SV values over the course of a single evaluation. Issue (4) by establishing a practical approximation based on a systematic series of simple numerical models to consider the effect of electromagnetic induction on time-varying magnetic fields of loop models in a highly conductive medium such as Earth’s core.

Figure 1 presents the flowchart showing an overview of the CLUMSI estimation scheme, components of which are explained mainly in the following two sections.

Fig. 1
figure 1

Flowchart of the entire algorithmic scheme of the estimation process presented in this study

Section “Defining an idealized model and constraining the model parameters” gives the detailed description of the forward model (components [1–7]). Section “Inversion framework” gives a high-level summary of the inversion framework concentrating on the applied deep learning image segmentation algorithm along with introducing the loss functions and quality parameters used during training and evaluation (components [8–12], [16]).

Section “Testing the methodology” summarizes inversion results using noise-free synthetic data and demonstrates a result on input data coming from a real geomagnetic model (components [13–15], [17]). Discussing the pros and cons of CLUMSI and the implications of these results on some ideas concerning further development is featured in Sect. “Discussion”. Conclusions are given in Sect. “Conclusions”. The appendices detail further aspects of the inversion framework, especially concerning the representativeness of training samples. One concrete example demonstrating why the type of problem defined in this study is more difficult to handle using a previously applied leas-squares inversion technique is presented in Appendix G.

Defining an idealized model and constraining the model parameters

To formulate the forward problem in a way which satisfies the desired phenomenological concept drafted in Sect.  “Introduction”, a simplified current system model is needed, in which individual sources can meet the qualitative criteria described below.

The structure and features should roughly correspond to those of the current systems emerging in the Miyagoshi et al. (2008) and Miyagoshi et al. (2011) simulations. They should also have a finite spatial extent, while forming a closed circuit and must be well parametrized. After iterating on possible geometric configurations, filamentary circular currents (loops) were chosen as the simplest possible current systems which can fulfill these requirements.

As mentioned earlier, for the analysis presented here to be more relevant for the actual geodynamo process, the stationary current loop models used in previous studies needed to be updated. To do that, the solution applied in Kuslits et al. (2020) to compute the field of stationary loops and is described in detail in Sect. “Primary field of stationary current loops” was corrected by an estimate of the additional (induced) field produced by a time-variation in the loop currents (see Sect. “Approximating the total field assuming linear time variation in the source currents”).

Altogether each loop in this updated model can be described by 8 source parameters featured in Table 1.

Table 1 Parameters of a single current loop

For the solutions to be eligible to produce a training set representative of historical geomagnetic field data, additional constraints based on first-order considerations about the geomagnetic field were imposed when generating the model samples (see Appendix B).

Primary field of stationary current loops

The radial component of the primary magnetic field of an individual current loop \(i\) is obtained in the Cartesian coordinate system according to the following analytical solution deduced by Simpson et al. (2001).

Provided that the current loop has an axial position in its local coordinate system (see Fig. 2a) and the field is computed at an observation point located at a distance \({\varvec{r}} = \left(d{x}_{i},{dy}_{i},{dz}_{i}\right)=\left(x-{x}_{{0}_{i}},y-{y}_{{0}_{i}},z-{z}_{{0}_{i}}\right)\) from the center of the loop:

Fig. 2
figure 2

(Source of the figure used for the magnetic field lines: (Ling et al. 2016))

Schematics of transforming a given current loop from a source-centered (local, a)) Cartesian coordinate system to an Earth-centered (global, b)) Cartesian coordinate system in the models. θ and λ are the angles of attack in transformations (7, 8) and define the attitude of a given loop.

$${B}_{{x}_{i}}\left({\varvec{r}}\right)=\frac{C{I}_{i}{x}_{i}{z}_{i}}{2{{\alpha }_{i}}^{2}{\beta }_{i}{{\rho }_{i}}^{2}}\left[\left({R}_{i}^{2}+{r}_{i}^{2}\right)E\left({k}_{i}^{2}\right)-{\alpha }_{i}^{2}K\left({k}_{i}^{2}\right)\right],$$
(2)
$${B}_{{y}_{i}}\left({\varvec{r}}\right)=\frac{{dy}_{i}}{d{x}_{i}}{B}_{{x}_{i}},$$
(3)
$${B}_{{z}_{i}}\left({\varvec{r}}\right)=\frac{C{I}_{i}}{2{\alpha }_{i}^{2}{\beta }_{i}}\left[\left({R}_{i}^{2}-{r}_{i}^{2}\right)E\left({k}_{i}^{2}\right)+{\alpha }_{i}^{2}K\left({k}_{i}^{2}\right)\right].$$
(4)

In the above expressions, \({\uprho }_{i}^{2}={dx}_{i}^{2}+{dy}_{i}^{2}\), \({r}_{i}^{2}=d{x}_{i}^{2}+{dy}_{i}^{2}+{dz}_{i}^{2}\), \(C{I}_{i}=\frac{{\mu }_{0}{I}_{i}}{4\uppi }\), \({\alpha }_{i}^{2}={R}_{i}^{2}+{r}_{i}^{2}-2{R}_{i}{\rho }_{i}\), \({\beta }_{i}^{2}={R}_{i}^{2}+{r}_{i}^{2}+2{R}_{i}{\rho }_{i}\) and \({k}_{i}^{2}=1-\frac{{\alpha }_{i}^{2}}{{\beta }_{i}^{2}}\). \(E\left({k}_{i}^{2}\right)\) and \(K\left({k}_{i}^{2}\right)\) are elliptic integrals of the first and second kind, respectively.

The final result for the radial magnetic field was calculated for the CMB as reference surface using a Mercator projection and it was obtained with the help of the following series of spatial rotations.

Positional rotation of the source vector field by a declination angle \({\theta }_{i}\):

$${{\varvec{R}}}_{{\theta }_{i}}=\left[\begin{array}{ccc}{\text{cos}}({\theta }_{i})& 0& {\text{sin}}({\theta }_{i})\\ 0& 1& 0\\ -{\text{sin}}({\theta }_{i})& 0& {\text{cos}}({\theta }_{i})\end{array}\right].$$
(5)

Positional rotation of the source field by an azimuth angle \({\lambda }_{i}:\)

$${{\varvec{R}}}_{{\lambda }_{i}}=\left[\begin{array}{ccc}{\text{cos}}({\lambda }_{i})& -{\text{sin}}({\lambda }_{i})& 0\\ {\text{sin}}({\lambda }_{i})& {\text{cos}}({\lambda }_{i})& 0\\ 0& 0& 1\end{array}\right].$$
(6)

Using matrices (5) and (6), one can determine the spatial coordinates of the magnetic induction vector field of each \(i\) unique source:

$$\left[{x}_{i}, {y}_{i},{z}_{i}\right]={\left({{\varvec{R}}}_{{\lambda }_{i}}{{\varvec{R}}}_{{\theta }_{i}}\right)}^{-1}\left[\begin{array}{c}{x}_{i}^{\prime}\\ {y}_{i}^{\prime}\\ {z}_{i}^{\prime}\end{array}\right].$$
(7)

The same transformation must be performed for the spatial attitudes of induction vector fields to obtain magnetic fields corresponding the spatial orientation of each source at each point on the CMB surface:

$$\left[{B}_{{x}_{i}},{B}_{{y}_{i}},{B}_{{z}_{i}}\right]={\left({{\varvec{R}}}_{{\lambda }_{i}}{{\varvec{R}}}_{{\theta }_{i}}\right)}^{-1}\left[\begin{array}{c}{B}_{{x}_{i}}^{\prime}\\ {B}_{{y}_{i}}^{\prime}\\ {B}_{{z}_{i}}^{\prime}\end{array}\right].$$
(8)

The radial component of the resulting magnetic field in spherical coordinate system was then obtained from the Cartesian components at each (CMB) surface point:

$${B}_{{r}_{{\text{prim}}}}^{i}\left(r,\phi , \Lambda \right)={B}_{{x}_{i}}{\text{sin}}\left(\phi \right){\text{cos}}\left(\Lambda \right)+{B}_{{y}_{i}}{\text{sin}}\left(\phi \right){\text{sin}}\left(\Lambda \right)+{B}_{{z}_{i}}{\text{cos}}\left(\phi \right).$$
(9)

The solution for the primary radial field in the forward problem uses the summation of radial field component data at the CMB for all the \(N\) sources introduced in a particular model instance. As a result, Mercator maps (such as the one in Fig. 3) of the primary magnetic field of the entire model on the CMB were obtained for each model. Due to limitations in computational capacity, an angular resolution of 2° in longitude and latitude was set when producing the resulting maps:

Fig. 3
figure 3

Primary radial magnetic field of a single, radially aligned source (\(\phi =95^\circ , \Lambda =165^\circ ,r=3e6,R=8.4e5,I=8.4e8,\theta =95^\circ , \lambda =165^\circ\)) computed for a grid with a resolution of 2° (90 by 180 gridpoints) on the reference surface of the models representing the CMB

$${B}_{{r}_{{\text{prim}}}}\left(r={r}_{{\text{CMB}}},\phi , \Lambda \right)={\sum }_{i=1}^{N}{B}_{{r}_{{\text{prim}}}}^{i}\left(r={r}_{{\text{CMB}}},\phi ,\Lambda \right).$$
(10)

Approximating the total field assuming linear time variation in the source currents

Due to the high estimated values for core conductivity (Ohta et al. 2016; Dongxiao et al. 2019), the effect of magnetic diffusion cannot be neglected when approximating the total magnetic fields of the source models.

The conductive material of the core and the strongly time-dependent MHD processes necessitated the introduction of a model encompassing time-varying sources. Such a model also considers the magnetic field of currents induced in a highly conductive medium and is thus much more relevant for the geodynamo than applying solely the solution discussed in Sect. “Primary field of stationary current loops” which assumes only steady-state loops.

However, to construct this model in a consistent way while still being able to produce sets of solutions quickly enough to be able to apply ML in the inverse problem mounted a significant challenge.

To generate sufficient training data, a systematic series of simulations of the induced fields around individual loops applying different source parameters and current variations was built up. Electromagnetic induction was modeled by a spherical domain representing Earth's core in the simulations. As the exact value of electrical conductivity in the core is relatively poorly constrained, the authors chose a conservative value of \(\sigma =5*{10}^{5}[{\text{S}}/{\text{m}}]\) (this value was selected based mainly on the experimental work of Ohta et al. (2016). In each simulation the \({{\varvec{B}}}_{{\text{prim}}}\) primary field and the \({{\varvec{B}}}_{{\text{tot}}}\) total field of the source were computed separately. The induction effect was given by the difference between these fields (\({{\varvec{B}}}_{{\text{tot}}}-{{\varvec{B}}}_{{\text{prim}}}\)). These simulations used a finite element numerical framework (see Fig. 4) utilizing the COMSOL Multiphysics 5.3a software package (Multiphysics 1998).

Fig. 4
figure 4

Computational setup of a single simulation member in the series showing the finite element mesh (a) and the resulting induced field around the loop in the rotationally symmetric spherical domain (b)

To be able to later produce enough training data for the ML-based parameter inference, the following simplifying assumptions need to be made:

  1. 1)

    The location of the current loops is fixed, and two loops cannot fall directly under each other in the radial direction. The former of these two conditions meant that one did not have to account for induction coming from the motion of the sources in the conducting medium. The latter constraint was intended to prevent sources that are geographically very close to each other which leads to problems of equivalence (see Sect. “Results using noise-free synthetic data”).

  2. 2)

    The position of the sources is set, so that the axes are radial (that is for all \(i\), \({\Phi }_{i}={\theta }_{i};{\Lambda }_{i}={\uplambda }_{i}\)).

  3. 3)

    Only the current intensity carried by the sources varies, and it varies linearly in time.

  4. 4)

    The inner and outer core of the Earth were modeled as one uniformly conducting sphere. A full description of this approach is given by Metman et al. (2019).

Conditions (1)–(2) allowed us to perform a series of simulations in an axisymmetric 2D domain. This procedure is also known as 2.5D simulation, since it is possible to perform 3D modeling by exploiting the symmetry property of a 2D geometric layout (Jacobs et al. 2007).

Condition (4) simplified the setup and computational complexity of the simulations.

The application of condition (3) also had a beneficial consequence in terms of computational complexity. This meant that the solution of a complex frequency domain induction problem calculating the field of a source emitting harmonic signals (Weaver 1994) could be avoided. As a result, induced fields independent of time and the magnitude of the primary source could be obtained. As the transients decayed, the magnitude of induced fields depended linearly on the rate of change in the source currents. The current intensity in the models can be described by the form shown below:

$${\varvec{I}}={{\varvec{I}}}_{{\text{prim}}}+{{\varvec{I}}}_{{\text{ind}}}={\varvec{C}}({\varvec{r}})t+{{\varvec{I}}}_{{\text{ind}}}\left({\varvec{r}},t\right).$$
(11)

In the above equation, the primary current \({{\varvec{I}}}_{{\text{prim}}}\) comes from a current loop and changes linearly in time at a given rate \({\varvec{C}}({\varvec{r}})\). This constraint on the current results in a linearly changing magnetic field in time around the source, decreasing with distance \(r\) from the source depending on the electrical conductivity of the surrounding medium. As the current loop is embedded within a conductive medium, representing the Earth’s core, the time-variation of the magnetic field induces a complex current density field around it. Induced currents \({{\varvec{I}}}_{{\text{ind}}}\left({\varvec{r}},t\right)\) have a direction at each point in any given time which decreases the change in the magnetic flux responsible for their creation. This results in a screening effect by the conductive core depending on the shape and size of the conductive domain between the source and each point of observation. Following the decay of transients, static-induced fields proportional only to \({\varvec{C}}({\varvec{r}})\) are formed.

This equilibrium state thus sets in after some delay in simulation time with an increasing spherical distance from the source on the CMB. In our series of simulations, following a delay time of \({{t}_{d}=10}^{6}\left[yr\right]\), these static-induced fields have built up in all our models on the CMB.

Using these results meant that a polynomial interpolation of the above-described induction effect (Eqs. (1923)) using data points computed in the systematic numerical simulations was satisfactory with respect to generating synthetic training samples within computational (and time) limitations.

The numerical simulations used a low-frequency approximation of Maxwell’s equations in time domain. Separate simulations computing the primary and the total magnetic fields were run in parallel, assuming an insulating and a conductive core, respectively. They were implemented using a spherical axisymmetric geometry introducing current loops with systematically varied parameters described in Table 1. This meant the numerical solution of equations:

$${\varvec{B}}=\nabla \times {\varvec{A}},$$
(12)
$$\nabla \times {\varvec{B}}=\mu {\varvec{J}},$$
(13)
$${\varvec{E}}=-\frac{\partial {\varvec{A}}}{\partial t},$$
(14)
$${\varvec{J}}=\sigma {\varvec{E}}.$$
(15)

For clarity, in Eqs. (12) and (14), \({\varvec{A}}\) denotes the magnetic vector potential and \({\varvec{E}}\) the electric field. To set the dimensions of the simulation domain to be characteristic for those of Earth, the following radii were defined:

$${r}_{{\text{CMB}}}=3.48*{10}^{6}\left[{\text{m}}\right],{r}_{E}=6.38*{10}^{6}\left[{\text{m}}\right],$$
(16)

\({r}_{{\text{CMB}}}\) denoting the radius of the CMB and \({r}_{E}\) that of the Earth.

Current loops are defined as line-currents perpendicular to the plane in which we calculate induced field values in this axisymmetric setting. Since the magnitude of static induced fields depended linearly on the rate of change in the source currents, interpolation can be carried out using only two values of \({(dI/dt)}_{i}\) independently from the current intensities (which were chosen to be zero at the beginning of the simulation time for the sake of simplicity).

Thus, three of the eight parameters listed in Table 1 were enough to obtain a basis set of full spatial solutions using individual current loops parametrized as described in Table 2. These were chosen from the range of potential loop parameters defined in Appendix B.

Table 2 Parameters of simulation members (loops), forming a total set containing 130 base results for the interpolation using all parameter combinations \({r}_{i}, {R}_{i},{\left(\frac{dI}{dt}\right)}_{i} \in [ |r|\times |R|\times |dI/dt|]\)

The boundary condition prescribed the vanishing of tangential magnetic fields, which is not ideal as outside the core poloidal magnetic fields can be expected (Metman et al. 2019). Although for this type of problem, applying

$${\varvec{n}}\times {\varvec{H}}=0$$
(17)

at the surface of the Earth rather than at the CMB (where \({\varvec{n}}\) is the normal vector with respect to Earth’s surface) was acceptable, as solutions in this case agreed to a high degree of accuracy regardless of the specific type of boundary condition used.

The magnetic effect of currents induced in a conductive core was determined only with respect to the radial magnetic component of induction vector fields at the CMB. As the components of the magnetic induction vector were defined in a cylindrical coordinate system, spherical radial components were computed as follows:

$${B}_{{r}_{{\text{ind}}}}^{i}(r,\phi ^{\prime},t)={B}_{{\rho }_{{\text{tot}}}}^{i}(r, \phi ^{\prime},t) {\text{sin}}\left({\phi }^{\prime}\right)+{B}_{{z}_{{\text{tot}}}}^{i}(r, \phi ^{\prime},t) {\text{cos}}\left(\phi \right)-{B}_{{\rho }_{{\text{prim}}}}^{i}(r, \phi ^{\prime},t) sin\left({\phi }^{\prime}\right)-{B}_{{z}_{{\text{prim}}}}^{i}(r, \phi ^{\prime},t) {\text{cos}}(\phi ^{\prime}) | r={r}_{{\text{CMB}}};t={t}_{d}.$$
(18)

The relationship between these radial components and the corresponding loop parameters was then approximated fitting exponential polynomial basis functions using the weighted Ridge regression computation in the scikit-learn library (Pedregosa et al. 2011):

$${B}_{{r}_{{\text{ind}}}}^{i}(r={r}_{{\text{CMB}}}, \phi ^{\prime})\approx { C}_{\mathbf{p}\mathbf{o}\mathbf{l}\mathbf{y}}\left({R}_{i},{r}_{i},\phi ^{\prime}\right)*{\left(dI/dt\right)}_{i},$$
(19)

where

$${C}_{{\text{poly}}}\left({R}_{i},{r}_{i},\phi ^{\prime}\right)={\sum }_{j=1}^{\left(\genfrac{}{}{0pt}{}{n+3}{n}\right)-1}{\beta }_{j}{e}^{{{R}{l}_{1j}}{r}^{{l}_{2j}}{\phi ^{\prime}}^{{l}_{3j}}}.$$
(20)

In Eq. (20) \({l}_{1j}+{l}_{2j}+{l}_{3j}=n {\text{ and }} {l}_{1j},{l}_{2j},{l}_{3j}\ge 0\) and the \({\upbeta }_{j}\) coefficients were determined from

$${\varvec{\beta}}= {\left({{\varvec{A}}}^{{\varvec{T}}}{\varvec{W}}{\varvec{A}}+\chi {\varvec{I}}\right)}^{-1}{{\varvec{A}}}^{{\varvec{T}}}{\varvec{W}}({{\varvec{B}}}_{{\varvec{i}}{\varvec{n}}{\varvec{d}}}-\widehat{{{\varvec{B}}}_{{\varvec{i}}{\varvec{n}}{\varvec{d}}}}),$$
(21)

where \({\varvec{I}}\) is the identity matrix, \(\chi\) is the regularization parameter (see (Hoerl and Kennard 1970)), which was set to \(\chi ={10}^{-7}\). Matrix \(\mathbf{A}\) contains the exponential of the loop- and angular distance parameters for the total number of \(m=\left|R\times r\times \left(\frac{dI}{dt}\right)\times \phi ^{\prime}\right|=65000\) simulated data points used for the approximation:

$${{A}_{m,j}= e}^{{R}_{m}^{{l}_{1j}}{r}_{m}^{{l}_{2j}}{\phi ^{\prime}}_{m}^{{l}_{3j}}},$$
(22)

and \(\mathbf{W}\) weights have the elements,

$${W}_{m,j} =\frac{1}{\sqrt{\left|{B}_{m,{r}_{{\text{ind}}}}^{j}\right|}}.$$
(23)

Using \(n=11\) and the weights as defined in (23), these exponential functions resulted in the best approximation of the original \({{\varvec{B}}}_{{r}_{ind}}^{i}\) with a normalized root mean square error (NRMS, see Sect. “Loss functions and quality metrics”) of \(NRMS=1.4*{10}^{-5}\) for the interpolated values (see the fitted polynomials in Fig. 5). The rationale behind applying approximation (1923) is simply computational feasibility. Running individual simulations directly to compute the composite induced field of a single loop model containing a hundred loops, such as the ones shown in Fig. 6e and f would take approximately 15 min to compute. Using the polynomial formula (1920) reduces the time needed for such a computation to less than 30 s.

Fig. 5
figure 5

Source parameters are noted in the middle of each corresponding graph

Comparison of polynomials fitted for the induction effect on the CMB radial fields around three different example current loops and the actual results from test simulations.

Fig. 6
figure 6

Primary (a), (c), (e) and total (b), d), (f) radial magnetic fields obtained using the approximate solution correcting for inductive screening at the CMB (in [T] units). Maps are shown for models containing, respectively, 1, 25 and 100 radially aligned current loops placed arbitrarily inside the domain representing an insulating (left column) and a conductive (right column) core

Based on Eqs. (910) and (1920), the final result can be obtained using both the primary and the induced magnetic fields for each current loop in an individual model sample. The spatial distribution of the magnetic fields can be computed relatively fast following the transformation of the axisymmetric solutions from the local coordinate systems of the loops to the models’ global spherical coordinate system:

$${B}_{{r}_{{\text{tot}}}}\left(r={r}_{{\text{CMB}}},\phi ,\Lambda \right)={\sum }_{i=1}^{N}\left({B}_{{r}_{{\text{prim}}}}^{i}\left(r={r}_{{\text{CMB}}},\phi ,\Lambda \right)+{B}_{{r}_{{\text{ind}}}}^{i}\left(r={r}_{{\text{CMB}}},{{\phi }_{i}}^{\prime}=a{\text{cos}}\left({\varvec{r}}{\cdot {\varvec{r}}}_{0{\varvec{i}}}/(r\cdot {r}_{0i})\right)\right)\right)={\sum }_{i=1}^{N}{B}_{{r}_{{\text{prim}}}}^{i}\left(r={r}_{{\text{CMB}}},\phi ,\Lambda \right)+{B}_{{r}_{{\text{ind}}}}^{i}\left(r={r}_{{\text{CMB}}},\phi ,\Lambda \right).$$
(24)

In Eq. (24) \({\mathbf{r}}_{0\mathbf{i}}\) points to the center of the \(i\)th loop from that of the Earth, and \(\mathbf{r}\) points to a location on the CMB \(\left(r={r}_{{\text{CMB}}},\phi ,\Lambda \right)\). These solutions yielded modified versions of the Mercator maps coming from solution Eq. (10) corrected for the screening effect of the induction.

The above-detailed concept was efficient in generating a large amount of training data which were representative in terms of the overall distribution and magnitude of the radial geomagnetic field at the CMB. However, it also had an important drawback when it comes to reproducing Earth-like SV values. As an assumption of linear time variation in the source currents results in a constant \({B}_{{r}_{{\text{ind}}}}^{i}\), any time variation in these synthetic magnetic fields is observed in their primary radial components \({B}_{{r}_{{\text{prim}}}}^{i}\). Source positions were fixed in the models so one can estimate what is the highest possible rate of change in the current carried by the largest loop, based on maximum observable values of the actual SV (see in Appendix B). Such a high rate-of-change would in turn produce an induced field so strong it would be an order of magnitude larger than the largest values of the actual radial CMB field recorded in available historical geomagnetic data (see Sect. “Refining the ML-based inversion” and Appendix C). This essentially means that the source model described here was unable to simultaneously reproduce Earth-like field magnitudes and SV. We chose to optimize the models for the former objective, so that training samples could be accepted as it is described in Appendix C.

Figure 6 demonstrates the general range of full model solutions used in the synthetic component of the training set and shows how induction screens the primary magnetic fields in the models. One can observe that even though the loop axes in these models are all radially aligned, more complex fields begin to form when increasing the number of loops in the models with the effect of individual loops becoming progressively more difficult to separate for the human eye. Moreover, even a relatively subtle screening around the sources could also have a rather complex effect on the field maps, when comparing the original maps on the left to the corrected ones on the right-hand side of the figure.

Inversion framework

To come up with potentially physically meaningful results in case of real geomagnetic data, besides the regularization of the forward model presented in Sect.  “Defining an idealized model and constraining the model parameters” and Appendix B, modifications on the previous inversion framework were needed. This affected the machine learning implementation, i.e., the algorithm used for training and the generation of the training dataset. The framework used by CLUMSI is similar to the one used in the introductory study in the sense that a neural network is trained to infer the distributions of source parameters (such as their geographic positions) represented as rectangular maps, from magnetic field maps used as input data. The most significant novel work undertaken is using information from measured geomagnetic data which was incorporated in the training process in such a way that allows the network to handle real data more correctly (via the DANN method).

Section “Refining the ML-based inversion” gives an overview of the applied inversion framework focusing on the deep learning methodology as the backbone of CLUMSI. Section “Loss functions and quality metrics” summarizes the quality measures applied for analyzing the method’s performance. A more detailed review of training and test data are presented in Appendix C. Appendix E describes the Genetic Algorithm (GA) as the last stage of CLUMSI obtaining a final estimation of the loop parameters and the corresponding magnetic fields based on the neural net inference.

Refining the ML-based inversion

To robustly obtain a fully parametrized reconstructed source model, a two-step inversion framework was implemented, supplemented by significant modifications when compared to the original algorithm in the introductory study.

That work suggested training an image segmentation neural network with a UNet architecture, only for detecting the geographic distribution of the sources using the maps of the training set. This was followed by a GA solution which obtained exact value estimates for all the parameters of every loop from the suggested locations.

The current work replaces the original UNet implementation with a UNet + + architecture (Zhou et al. 2018) utilizing more hidden layers and output channels to give inferred images of all the source parameters in the models using their respective distribution maps as target values defined by Eqs. (6063) in Appendix C. A new training method, referred to in Sect. “Introduction” as DANN (Ganin et al. 2016), was also implemented, which allowed for an efficient incorporation of maps coming from real geomagnetic data into the training set. Geomagnetic field model data used during training included radial field and SV maps from the historic data set of the GUFM-1 model spanning a time range from 1600 to 1975 (Jackson et al. 2000, see the data in Appendix C).

The refined training process and neural net architecture are drafted in a similar fashion to the flowchart of Ganin et al. (2016) in Fig. 7.

Fig. 7
figure 7

Schematic diagram of the DANN training process of the refined neural network architecture working with complex source model data and real geomagnetic model data. Here, Conv stands for convolutional and FCR for fully connected layers. L1 denotes the loss function (25) and CE denotes the values of the cross-entropy (26) loss function. α(CE) denotes the weighted gradients of the cross-entropy function

During the DANN training, the image segmentation task is augmented by a decision problem about whether the data are coming from the synthetic or the real component of the training set. This is achieved using a separate neural network called a domain classifier. The classifier is trained on the internal representations calculated in a hidden layer of the UNet++ network. That part of the training process is structured in such a way that the poorer the classifier performs, the better training results are achieved in terms of relying on features in the synthetic model which are similar to those of original geomagnetic data.

Finally, the GA solution was computed using the parameter distribution maps resulting from the UNet++ phase as inputs. The GA solution comprised a real coded, multi-population GA for deriving an optimally fitting parametrized current loop model from the neural net output maps. Figure 8 describes the procedure in which the GA searched only in a subspace of potential parameter values outlined around peaks in the parameter maps produced by the UNet++ evaluation. To pinpoint an estimated source location, these peaks were selected as local maxima on the map in the same fashion as described in (Kuslits et al. 2020). The GA search was then performed within a 2-gridpoint radius around them (in accordance with the constraint defined in Sect. “Approximating the total field assuming linear time variation in the source currents”). The quantities on the right-hand side of the figure, representing the parameter distribution maps, are introduced in detail in Appendix C. A more detailed description of the GA itself is given in Appendix E.

Fig. 8
figure 8

Schematic diagram illustrating the final search process of the GA by 'scanning' the distribution functions estimated by the neural network. The black filled circles represent the selected initial source positions and the green squares the search space defined around them. The bottom two maps are the estimation results of the complex source model for Br and SV resulting from a given 'scan'

This seemingly complex solution described above not only improved the results when using synthetic data, compared to the first experimentation showed in the introductory study, but also produced more relevant and well-defined solutions in case of real geomagnetic field data.

Figure 9 demonstrates the difference made by the DANN training and the modified UNet++ configuration on the resulting geographic distribution maps of source positions. It shows that when the previous and the new DANN trained networks are evaluated on the same input maps, the outputs significantly differ in definitiveness. This means not just a sharper outline of reconstructed probable source areas but a more precise solution quantifiable in case of synthetic input data as shown in Fig. 10 in Sect. “Results using noise-free synthetic data”.

Fig. 9
figure 9

Conventional (a) and DANN trained (b) neural network evaluation on real geomagnetic data (normalized inferred dI/dt source parameter distribution projected onto the CMB, calculated from GUFM-1 model data for year 1600)

Fig. 10
figure 10

RPE errors (a) and NRMS misfits (b) of individual final reconstructions carried out using the original and the refined (CLUMSI) algorithm against an increasing number of loops in the test data set

Loss functions and quality metrics

When training UNet +  + for the image segmentation, the L1 loss function was applied to all the channels providing the expected outputs to every source parameter distribution:

$$L1 = \sum_{j=1}^{{n}_{{\text{data}}}}\left|{P}_{{k}_{j}}\left(\phi ,\Lambda \right)-\widehat{{{P}_{k}}_{j}\left(\phi ,\Lambda \right)}\right|,$$
(25)

where \({n}_{{\text{data}}}\) denotes the number of data points in a given sample, \({P}_{{k}_{j}}\left(\phi ,\Lambda \right)\) and \(\widehat{{{P}_{k}}_{j}\left(\phi ,\Lambda \right)}\) are given values of each data point \(j\) on maps of target and estimated source parameters, respectively, defined in Appendix C.

For training the domain classifier to determine the origin of each model sample \(l\), we used the cross-entropy loss as defined below:

$${\text{CE}}=-{\sum }_{l=1}^{{n}_{{\text{samples}}}}{V}_{l}{\text{log}}\left({p}_{l}\right),$$
(26)

where \({n}_{{\text{samples}}}\) denotes the number of samples within a given part of the set of maps used for training, \({V}_{{\text{l}}}\) is the class indicator and \({p}_{l}\) is the estimated probability for each sample \(l\).

As it was established in Sect. “Approximating the total field assuming linear time variation in the source currents” we chose to optimize for fitting to the field values rather than to SV values. This meant quantifying the misfit between the actual radial magnetic field and SV maps and those computed using the full parameter estimation derived from the GA search as follows:

$$\overline{{\text{NRMS}}}=\frac{\sqrt{{{(n}_{{\text{data}}})}^{-1}\sum_{j=1}^{{n}_{{\text{data}}}}{\left({B}_{{r}_{j}}-\widehat{{B}_{{r}_{j}}}\right)}^{2}}}{{\text{max}}\left({B}_{r}\right)-{\text{min}}\left({B}_{r}\right)}+0.1\frac{\sqrt{{{(n}_{{\text{data}}})}^{-1}\sum_{j=1}^{{n}_{{\text{data}}}}{\left(S{V}_{j}-\widehat{S{V}_{j}}\right)}^{2}}}{{\text{max}}\left({\text{SV}}\right)-{\text{min}}\left({\text{SV}}\right)}$$
(27)

Similarly, the mean absolute error (MAE) was also given to show the misfits’ absolute value:

$${\text{MAE}}={{n}_{{\text{data}}}}^{-1}\sum_{j=1}^{{n}_{{\text{data}}}}\left|{B}_{{r}_{j}}-\widehat{{B}_{{r}_{j}}}\right|.$$
(28)

Relative parameter error (RPE) was used to describe the quality of the full parameter reconstruction resulting from the GA search by comparing the true loop parameters with those of the physically closest estimated loops:

$${\text{RPE}}=\frac{\left(\sum_{i,k}\frac{\left|{\widehat{{\text{Par}}}}_{l}^{k}\left(\left|{\widehat{{\varvec{r}}}}_{l}-{{\varvec{r}}}_{g{t}_{i}}\right|={\text{min}}\right)-{{\text{Par}}}_{{gt}_{i}}^{k}\right|}{\left|{{\text{Par}}}_{{gt}_{i}}^{k}\right|}\right)}{8L},$$
(29)

where \({{\text{Par}}}_{{{\text{gt}}}_{i}}^{k}\) is the ground truth value of a single parameter \(k\) of a loop \(i\) in a vector \(\mathbf{P}\mathbf{a}\mathbf{r}\) containing each current loop parameters (see Table 1), \({\widehat{{\text{Par}}}}_{{\text{l}}}^{k}\) is its value estimate provided by the GA via an estimated loop \(l\), and \(L\) is the number of loops in a given model.

To describe specifically the errors made by UNet++ during the segmentation phase, two-dimensional cross-correlation coefficients (CCC) were calculated between the network-generated and the expected output distribution maps for each parameter:

$${\text{CCC}}\left({\widehat{{\varvec{P}}}}_{k},{{\varvec{P}}}_{k}\right)=\frac{{\sum }_{{\phi }_{j}}{\sum }_{{\Lambda }_{j}}({\widehat{P}}_{{k}_{{\phi }_{j}{\Lambda }_{j}}}-\overline{\widehat{{P }_{k}}})({P}_{{k}_{{\phi }_{j}{\Lambda }_{j}}}-{\overline{P} }_{k})}{\sqrt{({\sum }_{{\phi }_{j}}{\sum }_{{\Lambda }_{j} }({\widehat{P}}_{{k}_{{\phi }_{j}{\Lambda }_{j}}}-\overline{{\widehat{P} }_{k}}{)}^{2})({\sum }_{i}{\sum }_{j}({P}_{{k}_{{\phi }_{j}{\Lambda }_{j}}}-{\overline{P} }_{k}{)}^{2})}} .$$
(30)

In Eq. (30), \({\phi }_{j}\) and \({\Lambda }_{j}\) correspond to the latitudinal and longitudinal coordinate points of input and output maps \({P}_{k}\left(\phi ,\Lambda \right)\) and \({\widehat{P}}_{k}\left(\phi ,\Lambda \right)\) of each current loop parameter \(k\).

Testing the methodology

Results using noise-free synthetic data

A relative freedom in the spatial arrangement of the sources which is the result of the unconstrained nature of the problem presented a significant difficulty to the estimation. A growth in reconstruction error with the increasing number of current loops in the models was unavoidable. However, the refined neural net architecture and training method presented in Sect. “Refining the ML-based inversion” proved to be much more efficient at handling this challenge when compared to classical methods (see Appendix G) and the method presented in the introductory study. This quantitatively means that reconstruction errors started to grow significantly above a much higher number of sources in the models than in case of the original algorithm or using a more traditional least-squares inversion.

This trend can be observed in the graphs of Fig. 10 and for the NRMS and RPE metrics defined in Sect. “Loss functions and quality metrics”. It is to be noted that these values were only derived for individual samples as it required a full computation including the GA estimation to be performed, which took considerable computational time.

As the new algorithm produced maps of reconstructed distributions for all the source parameters, cross-correlations (CCC calculated using (30)) between the true maps and neural net outputs were computed for subsets of multiple test samples containing the same number of current loops in the test set (see Table 8). These resulting cross-correlations and their average values were also plotted against an increasing number of loops in Fig. 11.

Fig. 11
figure 11

CCC values between the reconstructed and true parameter maps plotted against an increasing number of sources in the corresponding models. These results were obtained using only the refined algorithm. Markers were assigned to the averages as shown on the label, the bars denote the min–max range of CCC values over a given subset. CCC values of the specific model featured also in Table 3 (being nearly the same as their corresponding subset averages) are referred to by the blue asterisk

Table 3 Misfit values of the reconstructed field compared to characteristic values of the corresponding geomagnetic field and SV magnitudes and the misfit values of a synthetic example for reference

Perhaps the most apparent feature of the inference seen in Fig. 11 is still the decrease of reconstruction accuracy against an increasing number of current loops in the model. Another noticeable tendency when looking at this graph is that the recovery of the rates of change in the loop currents presents a significantly more difficult task for the network than that of other source parameter distributions. The reason for that probably lies in the linear approximation of this parameter and its relatively complex relation to the finally formed magnetic fields.

In all the above graphs (Figs. 10 and 11) one can see the general trend, that errors start to grow significantly when over ~ 80 sources are admitted to the models, provided that the refined machine learning framework is applied. This indicates a higher efficiency than that of the first framework, which already produced results with significant reconstruction error when applied for models containing only 15–20 sources.

The decrease in inference accuracy established itself as the more densely the sources were packed in the model domain, the harder it became even for the image-processing neural network to separate them based on their combined magnetic fields. This phenomenon is illustrated via a comparison in Fig. 12, where the equivalence problem mentioned above is present in the outlined areas.

Fig. 12
figure 12

Maps of true loop positions (a) and the positions estimated using the refined algorithm (b) for a model with 90 loops. One region is highlighted with a red circle where it is clearly visible that the neural network had difficulty in separating the magnetic signal of the loops

Broadly speaking, the test results suggest that reconstructions reproduce the general characteristics of regions with multiple sources in close proximity more reliably than the data coming from individual sources. Figure 12 suggests that overall characteristic features of the distribution are still recovered relatively well, even when the model contains a number of loops higher than 80. This is important to note because the model inferred in case of real geomagnetic field data (Sect. “Results on real geomagnetic model data”) contained 116 current loops.

Graphs in Fig. 10a and b also show that while the new algorithm significantly outperforms the previous one in terms of inference error (RPE), it is still unable to do so when reconstructing the input magnetic field and SV maps (NRMS). To understand the reason behind this discrepancy, one needs to look at individual inversion results.

Figures 13 and 14 demonstrate an inference and field reconstruction of a synthetic model containing 30 current loops. It shows the distribution maps of the model parameters inferred with the help of the UNet++ algorithm along with maps of the radial field and SV values recovered via the final GA solution (see Sect. “Refining the ML-based inversion”). True maps are featured next to each corresponding reconstructed map for comparison.

Fig. 13
figure 13

True (left column) and reconstructed (right column) maps of the corresponding model parameters (60), (63) presented in Appendix C. Gross misidentifications are marked by the outlined areas

Fig. 14
figure 14

Contour maps of the true and reconstructed magnetic fields and SV values in case of the same synthetic model as shown on Fig. 13

When observing the reconstructed field maps in Fig. 14, one can notice that reconstruction quality is compromised by occasional outliers found mistakenly by the image-processing network. This kind of error is already somewhat observable on the ranges of CCC values in Fig. 11 as these mistakes can occasionally reduce the CCC of individual reconstructions thereby increasing the uncertainty of evaluations. Such erroneous estimations are highlighted in Fig. 13 showing the reconstructed loop parameter maps. It is a known general problem of applying deep learning algorithms for image segmentation, that though they can provide a fairly accurate overall inference, on some occasions they can produce very strong yet very wrong output signals (Popescu et al. 2021).

As the authors chose NRMS misfits of the SV values to account for only 10% in the total reconstruction loss (27) during the GA estimation process, unsurprisingly, the quality of the reconstruction of the radial magnetic induction is significantly better than that of the radial SV (see the maps and scales in Fig. 14c, d)).

As for the stability of the inference with respect to potential noise in the input data, a first-order analysis is shown in Appendix F.

Results on real geomagnetic model data

In order to demonstrate the functionality of CLUMSI on actual geomagnetic field model data, in one reconstruction process, maps of the actual CMB radial magnetic induction field and SV at the 2019 epoch of the COVOBS geomagnetic field model (Gillet et al. 2019) were used as inputs. COVOBS data proved to be viable for a demonstration as they contain more relevant recent information on the field, and they were relatively easy to use for constructing an input map (see Appendix D).

Figures 15 and 16 present the real and reconstructed geomagnetic field and SV maps. Based on the method described in Sect.  “Results using noise-free synthetic data”, 116 current loops could be identified using the neural net output image. These are displayed as black dots on the map of the radial field. In addition, map c in Fig. 16 illustrates the relative importance of magnetic diffusion estimated by the reconstruction as the percentage ratio of the induced contribution and the total radial field in the GUFM-1 model (31):

Fig. 15
figure 15

True (a) and reconstructed (b) radial magnetic field values estimated using the 2019 epoch of the COVOBS geomagnetic field model at the CMB. The grey dots on map a indicate the source positions estimated using the network

Fig. 16
figure 16

True (a) and reconstructed (b) radial SV values estimated using the 2019 epoch of the COVOBS geomagnetic field model at the CMB. The relative importance of magnetic diffusion (c) in the reconstruction was illustrated using (31)

$${W}_{{\text{diff}}}=\frac{\widehat{{B}_{{r}_{{\text{tot}}}}}-\widehat{{B}_{{r}_{{\text{prim}}}}}}{{B}_{{r}_{{\text{tot}}}}}.$$
(31)

The overall quality of this reconstruction is similar to that of the synthetic results presented in Sect. “Results using noise-free synthetic data” in the sense that the radial induction vector field can be recovered with much higher accuracy than the radial SV and that occasional outliers in the segmented parameter maps significantly reduce the conformity between true and estimated fields. In general, however, it is ascertainable that the recovered current loop model does reproduce most of the main morphological features of the CMB radial field (and the SV as well for that matter, albeit much less accurately) provided by the geomagnetic model.

Figure 17 shows parameter maps of the model reconstructed using the input maps in Figs. 15 and 16. It is worthwhile to mention that interestingly, many of the reconstructed current loops seem to be distributed in chain-like arrangements, often aligned nearly meridionally (such occurrences are highlighted by dashed lines on the map of Fig. 17a)). It is however out of the scope of the present study to assess whether this can be linked to some property of the actual geodynamo or is an artifact of some sort. It is also noticeable that somewhat unexpectedly, the time variation of loop currents tends to show a more pronounced hemispherical dichotomy than the loop currents themselves (though we expect a much more uncertain recovery for these parameters, as it was established in Sect. “Results using noise-free synthetic data”).

Fig. 17
figure 17

Reconstructed maps of current loop parameters (6063) segmented using actual geomagnetic field model data (COVOBS, epoch 2019)

Table 3 summarizes characteristic quality- and misfit parameters of the reconstructed field and the actual input fields. This too shows that the recovery of radial SV values and their distribution yielded a much poorer result than that of the actual radial field, manifesting most notably in a large difference between the corresponding CCC values.

Discussion

To conclude this paper, it is important to underscore that the CLUMSI methodology presented in this study is still by no means perfect and comes with its own drawbacks.

On one hand, the most apparent limitation of the still highly idealized loop model aiming to represent the current density distribution in the core is that it is unable to account for realistic SV magnitudes.

On the other hand, the most significant problem of the inversion framework comes from outliers in the neural net inference (Sect. “Results using noise-free synthetic data”) which in turn reduces the overall accuracy of reconstructions. Further complicating this issue is that regularizations, such as physical constraints about inferred local field magnitudes or loop parameters, are not yet possible to directly integrate into the neural net training.

It is also important to note that the rectangular maps CLUMSI currently needs to rely on because of the type of neural network (UNet + +) involved, cannot generate a truly uniform random distribution of loop currents on spherical surfaces. This can also introduce projection distortions when fitting a loop model to the observed field values.

These drawbacks can nevertheless be offset by the fact that CLUMSI can recover complex loop models more effectively (with a single evaluation and with higher accuracy) than previous techniques. Also, by the prospect that the approach has a very wide scope for further modifications and improvements with plenty of opportunities to enhance its physical authenticity and accuracy. Such ideas for further development are described below.

The reason for the loop models’ inability to produce realistic SV is the lack of spatial movement of the features in the modeled fields. This issue can be resolved in the future by getting rid of the part of condition (1) (Sect. “Approximating the total field assuming linear time variation in the source currents”), which imposes spatially fixed loops in the forward model. Possible SV values such forward models could produce would then be orders of magnitude higher at each surface location, than those which are currently achievable. Implementing this would however necessitate considering the effect of electromagnetic induction on the field of a moving current system. One possible way to computationally reproduce the effect could be the use of simplified 3D forward model samples applying a moving mesh finite element framework configuration. This additional complexity could be a reasonable extension of our existing loop model because it is known that a significant part of the locally registered SV comes from westward drifting features in the geomagnetic non-dipole field, which can move as fast as 0.5° in geographic longitude per year.

One avenue for alleviating the problem of outliers in the neural net inference could be to indirectly constrain and regularize the training and evaluation process via a custom loss function (see e.g., (Basir and Senocak 2022)). For that, the relation of the CLUMSI methodology to physics informed neural networks (PINN) and to inverse PINNs especially (Raissi et al. 2019; Jarolim et al. 2023) need to be explored. A larger training data set including more synthetic and measured samples as well could also help in obtaining more accurate inferences. Concerning measured geomagnetic data, it is particularly important to note that the GUFM-1 model utilized in the training data set of this study makes use of historical observatory and maritime measurement data taken only at Earth’s surface (Jackson et al. 2000). These could be complemented by more recent geomagnetic model data, such as the COVOBS (Gillet et al. 2019) data, which are based on satellite measurements as well as surface observations, and could provide independent training data at multiple reference surfaces. This would potentially further improve the representativeness of the reconstructed source model with respect to the actual physical state of the geodynamo (Alken et al. 2021).

Current loop models inferred in this study tend to be local compared to the scale of the core. It remains an open issue if we can relate them to larger-scale current systems in the geodynamo and how this could be achieved. Further comparisons to MHD-based methodologies need to be conducted. For example, it could prove to be useful to compare snapshots of inferred loop parameter distributions, such as the ones presented in Figs. 15a and 17 taken using different epochs of an input geomagnetic model and compare them with maps of reconstructed core flows considered to dominate the intermittent time period.

It is also noteworthy that some derived parameter distributions in direct numerical simulations, like the “dynamo generation term” introduced by Miyagoshi et al. (2011) may display sufficient spatial stability and concentration while showing enough variability as well to be suitable for a similar kind of inference on real geomagnetic data.

Perhaps the future viability of CLUMSI (and similar attempts) as a supplementary tool for validating efforts trying to picture the internal dynamics of the geodynamo depends on whether utilizing a composite of a high number of relatively simplistic sources can correctly represent such complex physical processes in general.

Conclusions

This study demonstrates the potential effectiveness of using deep learning for recovering highly complex current density distributions such as the one expected to be responsible for generating the magnetic field in the geodynamo. Synthetic tests confirmed that an image-processing neural network can recover complex distributions of source currents from input magnetic field data with reasonable accuracy. Main morphological features of the actual geomagnetic radial field could also be reproduced, albeit this was still based on a highly idealized equivalent current loop model and suffers from some significant drawbacks. Most notably, the inability to produce reconstructions that are representative of both the geomagnetic field and its SV, and occasional outliers in the inference which significantly affect the reconstruction quality.

Availability of data and materials

Codes are available at https://github.com/lukacskuslits/kuslits-et-al-current-loops. Test data, network weights and figures are available at https://github.com/lukacskuslits/kuslits-et-al-current-loops/blob/main/current-loops-data.zip. GUFM-1 data are available from https://github.com/martinrehfeld/gufm1-webservice COVOBS data are available from https://geodyn.univ-grenoble-alpes.fr/.

Abbreviations

CLUMSI:

Current loop-based model segmentation inference

CMB:

Core–mantle boundary

GA:

Genetic algorithm

GMD:

Geomagnetic dipole moment

ML:

Machine learning

MHD:

Magneto hydrodynamics

SH:

Spherical harmonics

SHC:

Spherical harmonic coefficients

SV:

Secular variation

References

Download references

Acknowledgements

We wish to thank Hannah Rodgers (ISTerre) for helping us in efficiently utilizing COVOBS geomagnetic model data and for her suggestions on the manuscript suggestions. We also wish to thank Antal Jakovác (ELTE Institute of Physics and Astronomy), Csaba Szabó (ELTE Institute of Geography and Earth Sciences), László Balázs (Wigner Research Centre for Physics) and Péter Pollner (ELTE Institute of Physics and Astronomy) for their insightful suggestions. Gábor Hatos is gratefully acknowledged for his prompt availability concerning computational infrastructure at HUN-REN EPSS. We say our thanks to László Szarka, the former head of HUN-REN Research Centre for Astronomy and Earth Sciences for his continuous support concerning this research.

Funding

Open access funding provided by HUN-REN Institute of Earth Physics and Space Science. This work was supported by the "Machine Learning Methods in Earth Physics" project implemented at the "Institute of Earth Physics and Space Science" under grant SA 46/ 2021 of the Hungarian Research Network (HUN-REN).

Author information

Authors and Affiliations

Authors

Contributions

LK invented the concept of loop model solutions, computed the solutions, proposed the application of neural networks and the DANN method, generated the training data along with developing the related computations (in MATLAB), performed the full parameter GA estimations (also in MATLAB) and wrote the bulk of the manuscript. AH constructed the neural network framework, developed the corresponding (python) code base, performed its training and the inferences. VW contributed to the Introduction section via summarizing previous attempts at constructing source models of the geomagnetic field. CB mostly to the Introduction section via summarizing core surface flow estimations and inverse geodynamo models and the Discussion section regarding possible future comparisons between snapshots of inferred loop distributions and core surface flow estimations. TR designed the numerical simulations in COMSOL. EP contributed mostly to the reimplementation of Alldredge’s method and the corresponding non-linear least-squares estimation and the computational framework for the GA estimations along with the methodology for simulating the induction screening around the loops. LCz established the calculations deriving potential loop parameter values and contributed to finalizing the formatting of the manuscript. IB contributed mostly to the methodology for simulating the induction screening around the loops along with writing the bulk of the overview of current loop representations in the literature. IL invented the computational framework simulating the induction screening around the loops and established the overall algorithmic logic of that framework. All co-authors read and approved the final manuscript.

Corresponding author

Correspondence to Lukács Kuslits.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Validation of the numerical model and detailed description of the computational framework

As a simple guiding value for setting the computational mesh size, the Courant criterion used in finite difference simulations (Dutykh 2016) was applied. This gives the following relation between time steps Δt and spatial discretization Δx for the classical magnetic diffusion problem:

$$\Delta t \le \sigma \mu \Delta {x}^{2}/2.$$
(32)

Equation (32) for a minimum time step \(\Delta t=3.15*{10}^{7} [{\text{s}}]\) (1 year) yields a minimum finite element size of \(\Delta x=3200 [{\text{m}}]\).

Equations (1215) were solved for the vector potential \({\varvec{A}}\) with an initial value:

$${\varvec{A}}=0.$$
(33)

Equation (33) applied for the entire domain as for estimating the induction screening discussed in Sect. “Approximating the total field assuming linear time variation in the source currents”, the current intensity could be linearly increased in time from \({I}_{{t}_{0}}=0 [A]\) at a rate given in Table 2.

The full solution used the MUMPS solver (MUltifrontal Massively Parallel Solver 2022) on the linearized finite element discretization of Eqs. (1215) using initial condition (33).

To validate whether the numerical models give a correct approximation of the induced magnetic fields, qualitative comparisons can be made for models which assume a conductive core. A process similar in nature to that described in Sect. “Approximating the total field assuming linear time variation in the source currents” results from the theoretical problem discussed in Weaver (1994), in which the cessation of the source current variation results in the restitution of the static primary field after the transients have ceased.

When assuming an insulating core, the fields obtained in the simulations can be checked by comparing them against the analytical solution presented in Sect. “Primary field of stationary current loops” on reference surfaces picked at different radial distances from the CMB. Comparisons of the solutions can be made for the magnetic field of a single source. In Fig. 18 resulting radial fields are plotted in the reference frame of one such current loop.

Fig. 18
figure 18

Validating the numerical simulation against the analytical solution assuming an insulating core. The numbers above subplots a, b and c describe the source parameters and the radial position of the reference surfaces \({H}_{{\text{ref}}}\) with respect to the CMB (\({R}_{{\text{e}}}\) in graph a) is Earth’s radius)

Figure 18 confirms that the two solutions, calculated on meridional circles with an increasing radius, agree to a high degree of accuracy in the order of magnitude range of the source parameters used for the training set.

Appendix B: Defining the range of model parameter values in the training set

The computation of synthetic training samples was performed using randomly generated values of the source parameters (see Table 1). To determine the range of possible magnitudes for these values, simple considerations concerning the physical dimensions of the core and the geomagnetic field were taken into account.

The CMB and the ICB arise as natural constraints on the spatial location of the reconstructed current loops. However, due to the high electrical conductivity of the core, the lower depth limit can be placed much higher. Let us imagine placing time-varying but spatially confined magnetic fields in a conductive core. In this case, a \({\delta }_{z}\) skin (or attenuation) depth can be given because of the electromagnetic screening of the conducting material above them (Gubbins 1996):

$${\delta }_{z}={\left(\frac{{\tau }_{P}^{{l}_{j}}}{{\mu }_{0}\sigma }\right)}^{1/2}.$$
(34)

Equation (34) expresses this screening in spherical harmonic functions, where \({\tau }_{P}^{{l}_{j}}\) is the diffusion time associated with poloidal magnetic fields of degree \(l\) and order \(j\). It is reasonable to assume that from time-varying local sources, most of the screened part of their magnetic fields (required for the separation of their signal) is essentially in the non-dipole SH spectrum. Approximating the core conductivity by \(\sigma =5*{10}^{5}\left[{\text{S}}/{\text{m}}\right]\) (see Sect. “Approximating the total field assuming linear time variation in the source currents”) and \({\tau }_{P}^{{2}_{1}}\approx 8000\left[{\text{yr}}\right]\) these fields can only emerge from within an upper layer of \({\updelta }_{{{\text{z}}}_{{\text{max}}}}\approx 800\left[{\text{km}}\right]\). The upper limit for possible source depth in the models was therefore taken accordingly.

To give an initial estimate of the bounds on the possible values of the other source parameters, further considerations were made based on previous simulation results, statistical constraints on the geomagnetic field and the evolution of the geomagnetic dipole moment (GMD).

Gillet et al. (2010) provided a benchmark for the range of magnetic induction intensities within the Earth's core. Their results show that, at least the radial component of the induction vector inside the Earth's core can vary roughly between

$${B}_{{\text{min}}}\approx 1\left[{\text{mT}}\right]$$
(35)

and

$${B}_{{\text{max}}}\approx 6\left[{\text{mT}}\right].$$
(36)

These values were used as limitations on the possible range of values for the interior field.

Using this, six simplified independent relations (rules) between the extreme values of current loop parameters and the quantities GMD, \({B}_{{\text{min}}}\) and \({B}_{{\text{max}}}\) were constructed, represented by Eqs. (3738), (3940) and (4142), respectively. The latter three quantities were considered to be known. In defining the relationships, sources were treated as current loops carrying constant currents and having fixed positions:

$${B}_{{\text{min}}}=\mu {I}_{{\text{min}}}{\left(2{R}_{{\text{max}}}\right)}^{-1},$$
(37)
$${B}_{{\text{max}}}=\mu {I}_{{\text{max}}}{\left(2{R}_{{\text{min}}}\right)}^{-1}.$$
(38)

Equations (37) and (38) are related to combinations of source parameters that result in the largest and smallest absolute values of magnetic induction at the center of the loops (Fig. 19). The known minimum and maximum magnitudes of the internal field cannot be exceeded in these cases, thus, values (35) and (36) can be assigned to their left-hand sides.

Fig. 19
figure 19

Determination of the rule for \({B}_{{\text{min}}}\) (a) and \({B}_{{\text{max}}}\) (b) using circular currents taking extreme values of source parameters. Rmin and Rmax denote the maximum and minimum circular current radii sought, and Imin and Imax denote the maximum and minimum currents carried by the loops

Two extreme configurations can also be considered. One in which the loops with the strongest current and the largest spatial extent give rise to the GMD together, and another in which the smallest loops carrying the weakest current do the same (Fig. 20). Then, Eqs. (39), (40) are simply given by aligning the axes of these current systems in these hypothetical models with the direction of the GMD. Here, as the number of sources, \({N}_{{\text{max}}}\) is assigned to the model containing the smallest loops and \({N}_{{\text{min}}}\) to the model containing the largest ones:

$${\text{GMD}}={I}_{{\text{max}}}{R}_{{\text{max}}}^{2}\pi {N}_{{\text{min}}},$$
(39)
$${\text{GMD}}={I}_{{\text{min}}}{R}_{{\text{min}}}^{2}\pi {N}_{{\text{max}}}.$$
(40)

Using two adjacent current systems in the two extreme models shown in Fig. 20, sharing a common axis, one can estimate how close these pairs can be to each other (see Fig. 21).

For this, field constraints (3536) and the relation defining the axial magnetic field component along the axis of a current loop (Jackson 1998) can be exploited. On one hand, the distance of the two largest current loops furthest apart from each other (Fig. 21a) must be at most such that the minimum of the axial component of the magnetic induction (halfway along the axis) cannot fall below constraint (35):

$${B}_{{\text{min}}} =2\frac{{\mu }_{0}{R}_{{\text{min}}}^{2}{I}_{{\text{max}}}}{2{\left({({d}_{{\text{max}}}/2)}^{2}+{R}_{{\text{min}}}^{2}\right)}^{3/2}}.$$
(41)

Similarly, the axial component of the magnetic induction along the common axis of a pair of the smallest current loops is at its maximum at the center of the loops. It can reach at most the value (36) in the model where the smallest current loops are closest to each other (Fig. 21b):

$${B}_{{\text{max}}} =\frac{{\mu }_{0}{R}_{{\text{min}}}^{2}{I}_{{\text{min}}}}{{2\left({{d}_{{\text{min}}}}^{2}+{R}_{{\text{min}}}^{2}\right)}^{3/2}}+\frac{{\mu }_{0}{I}_{{\text{min}}}}{2{R}_{{\text{min}}}}.$$
(42)
Fig. 20
figure 20

Two extreme model configurations, in which the GMD is hypothetically produced by individual sources of the same size and elementary momentum which is in alignment with the GMD. \({\delta }_{{{\text{z}}}_{{\text{max}}}}\) is the penetration depth corresponding to Eq. (34) (the figure is not to scale)

A further simplification is that the current loops in these models are uniformly distributed in the outer core. Their relative (equidistant) spacing is approximated by dividing the \({\delta }_{{{\text{z}}}_{{\text{max}}}}\) thickness of the outer core volume into the same number of equal-sized cubes as the current loops in the models.

$${d}_{{\text{min}}}=\sqrt[3]{\left(4/3\right)\pi \left({r}_{{\text{CMB}}}^{3}-{\left({r}_{{\text{CMB}}}-{\delta }_{{z}_{{\text{max}}}}\right)}^{3}\right)/{N}_{{\text{max}}}}\left(\approx 4.56*{10}^{5} \left[{\text{m}}\right]\right),$$
(43)
$${d}_{{\text{max}}}=\sqrt[3]{\left(4/3\right)\pi \left({r}_{{\text{CMB}}}^{3}-{\left({r}_{{\text{CMB}}}-{\delta }_{{z}_{{\text{max}}}}\right)}^{3}\right)/{N}_{{\text{min}}}})\left(\approx 1.57*{10}^{6} \left[{\text{m}}\right]\right).$$
(44)

Equations (45) and (46) are then obtained by substituting Eqs. (4344) into Eqs. (4142).

In contrast to these hypothetical models, in forward solutions used for the reconstruction of the actual geomagnetic field, the direction of the axes of the current systems is radial based on other considerations (see Sect. “Approximating the total field assuming linear time variation in the source currents”). However, as it turns out (see Appendix C) these relations were suitable for the inclusion of parameter boundaries within which representative models of the CMB radial field could be generated, both qualitatively and quantitatively.

Since the resulting system of Eqs. (4146) is non-linear, an iterative solution was chosen to solve them. This used an implementation of the Newton–Raphson method in MATLAB (Yang et al. 2005). The bounds defining Eqs. (4146) were directly introduced by assigning the target values everywhere to the left and the variables to the right, which in case of (4142):

$${B}_{{\text{min}}}=\mu {R}_{{\text{max}}}^{2}{I}_{{\text{max}}}{\left[{R}_{{\text{max}}}^{2}+{N}_{{\text{min}}}^{-2/3}{\left(2\pi \left({r}_{{\text{CMB}}}^{3}-{\left({r}_{{\text{CMB}}}-{\delta }_{{z}_{{\text{max}}}}\right)}^{3}\right)\right)}^{1/3}\right]}^{-3/2},$$
(45)
$${B}_{{\text{max}}}=\mu {I}_{{\text{min}}}{\left(2{R}_{{\text{min}}}\right)}^{-1}+\mu {R}_{{\text{min}}}^{2}{I}_{{\text{min}}}{\left[{R}_{{\text{min}}}^{2}+{N}_{{\text{max}}}^{-2/3}{\left(4\pi \left({r}_{{\text{CMB}}}^{3}-{\left({r}_{{\text{CMB}}}-{\delta }_{{z}_{{\text{max}}}}\right)}^{3}\right)\right)}^{1/3}\right]}^{-3/2}.$$
(46)

The final optimal solution (after ~ 1000 iteration steps) does not perfectly approximate any of the target values (see Table 5), however, even when applying the highest and smallest GMDs derived from GUFM-1 the training data (Jackson et al. 2000; Korte and ConsTable 2005), the solution does not change significantly (see Table 6).

Fig. 21
figure 21

The maximum dmax (a) and minimum dmin (b) distances allowed between the circular currents, belonging to the conceptual models shown in Fig. 20

Considering the rate of change in the source current over time, the following theoretical upper bound is obtained using the resulting \({R}_{{\text{min}}}\) from Table 4. If the change in the radial component of the induction vector directly above a source is investigated, and the effect of conductivity is neglected, a maximum (secular) change in the source current over time can be assigned to the maximum SV observed in the training data. If an axially aligned current loop in the model containing only the sources of the smallest spatial extent is considered (Fig. 20b), the local axial component of its associated magnetic induction vector coincides with the global radial component, from which it can be written (Jackson 1998):

$${\left(\frac{\partial I}{\partial t}\right)}_{{\text{max}}}=\frac{2{\left({d}_{{\text{max}}}^{2}+{R}_{{\text{min}}}^{2}\right)}^{3/2}}{{\mu }_{0}{R}_{{\text{min}}}^{2}}{\left(\frac{\partial {B}_{r}}{\partial t}\right)}_{{\text{max}}}.$$
(47)

From the bounding Eq. (47) one can obtain that if \({\left(\frac{\partial {{\text{B}}}_{{\text{r}}}}{\partial {\text{t}}}\right)}_{{\text{max}}}\) corresponds to the maximum radial SV component (measured on a 5-year basis) calculated from the GUFM-1 data set, then the current of any given current loop in the models can vary in time at most by

$${\left(dI/dt\right)}_{{\text{max}}}=1.13\left[A/s\right].$$
(48)

An important consequence of neglecting field diffusion here is described in Appendix C. Using the approximate solution introduced in Sect. “Approximating the total field assuming linear time variation in the source currents”, the total field of such a large change in the current above the largest sources could be as high as:

$$|{B}_{{r}_{{\text{tot}}}}^{{\text{max}}}|=\left|{B}_{{r}_{{\text{prim}}}}^{{\text{max}}}+{B}_{{r}_{{\text{ind}}}}^{{\text{max}}}\right|=\left|\frac{{\mu }_{0}{R}_{{\text{max}}}^{2}{I}_{{\text{max}}}}{2{\left({\delta }_{{z}_{{\text{max}}}}+{R}_{{\text{max}}}^{2}\right)}^{3/2}}+{\left(dI/dt\right)}_{{\text{max}}}*C\left({R}_{{\text{max}}},{\delta }_{{z}_{{\text{max}}}},0\right)\right|\approx 0.074\left[T\right]>>{F}_{u},$$
(49)

where \({F}_{u}\approx 0.0032[T]\) is a value estimate based on the training data above which a field magnitude occurring in the generated samples can be considered an outlier (see Appendix C).

This means that source currents which have only a much smaller range in time variation can be applied when one aims at generating training samples representative in terms of actual geomagnetic field values. The simple procedure implemented to ensure this is described in Appendix C as well.

Table 4 The minimum and maximum values of the model parameters, estimated using formulas (3740) and (4546)

Table 5 contains the target values used for the computation of the range of possible current loop parameter values and their approximation using the derived extreme parameters. Table 6 represents the stability of the derived extreme parameters with respect to the changing GMD target values in the historical record.

Table 5 Validation against the characteristic target measures of the geomagnetic field when applying the derived limits of potential loop parameters in Eqs. (39)–(42) and Eqs. (45)–(46)
Table 6 Comparing the resulting lower and upper limits of potential loop parameters derived when using the weakest and strongest historical GMD magnitudes (Jackson et al. 2000) in the corresponding target values

Appendix C: Assembling and generating training and test data

In constructing the training dataset, synthetic magnetic field maps for the source models were generated within the bounds of source parameters discussed in Appendix B using the forward solution presented in Sect.  “Approximating the total field assuming linear time variation in the source currents”.

Real geomagnetic data incorporated into the training samples were derived from the GUFM-1 historical geomagnetic field model (Jackson et al. 2000). The samples were Mercator maps of the CMB radial field with the same resolution as described in Sect. “Primary field of stationary current loops” taken from the model epochs using an increment value of 5 years.

A fundamental requirement for the synthetically generated part of any sample dataset used for ML is that it should contain values that are at least representative of the real data in terms of their order of magnitude (Huyen 2022). This was fulfilled by applying the sample selection criteria explained below.

A commonly used criterion for constructing a training data set is the selection of outliers, which can be defined using the quartiles of the distribution of the real data (in our case being a set \(A\) containing all radial magnetic field values coming from the GUFM-1 training data). This method is suitable also for data having non-normal distributions (Ilyas and Chu 2019; Czirok et al. 2022):

$${F}_{l} = Q1\left(A\right)- 1.5\left(Q3\left(A\right)- Q1\left(A\right)\right)= -0.0031 \left[T\right],$$
(50)
$${F}_{u} = Q3\left(A\right)+ 1.5\left(Q3\left(A\right)- Q1\left(A\right)\right)= 0.0032 \left[T\right],$$
(51)

where \(Q1\) and \(Q3\) denote the first and third quartiles of the real data set (\(A\)).

Incorporation of a given sample into synthetic training data was rejected based on empirical conditions applied for field magnitudes which are detailed below.

The generation of an \(i\) individual current loop in the model was aborted if it did not satisfy the following three conditions within 100 repeated generation attempts.

1) The total field estimated directly above the source (estimated using Eq. (49)) fell within the range defined below:

$$\frac{{F}_{l}}{log\left(N\right)}<{B}_{{r}_{tot}}^{i}<\frac{{F}_{u}}{log\left(N\right)}.$$
(52)

2) The induced field of the source did not reduce the primary field to less than the primary field divided by \(e\) (i.e., the induction did not completely extinguish the primary field—a criterion related to Eq. (34) that gives the penetration depths for spherical diffusion times):

$$\frac{{B}_{{r}_{tot}}^{i}}{{B}_{{r}_{prim}}^{i}}>1/e.$$
(53)

3) The generated source must fall further away from its nearest neighbor than an angular distance of 2°. This ensured that, within the given resolution of the maps, the sources did not lie exactly underneath each other.

Conditions (1) to (3) can be checked already during the random generation of the individual source parameters, before the actual forward problem is solved, saving considerable computation time.

After solving the forward computation, the total magnetic field of each model sample was also checked using the condition below:

4) To ensure that the total field generated by the sample falls within the range defined by constraints \({F}_{l}\), \({F}_{u}\) it was accepted if and only if,

$${F}_{l}<{\sum }_{i}{B}_{{r}_{{\text{tot}}}}^{i}<{F}_{u}.$$
(54)

As it was postulated in Sects. “Approximating the total field assuming linear time variation in the source currents” and Appendix B, conditions (1)–(2) and (4) were chosen to be such that a training set representative in terms field magnitudes could be generated.

To alleviate the problem discussed in Appendix B concerning the rates of change in the current possible in the model samples, a systematic test was performed. It was conducted using a 'damping factor' \(\upgamma\) applied on \(\pm {\left(dI/dt\right)}_{{\text{max}}}\), and summing up how many models can be accepted by varying source currents randomly within these values. Conditions (1) to (4) were applied as acceptance criteria. Figure 22 shows that for models with different numbers of sources below \(\gamma =1100\), the number of acceptable generated samples (models) starts to decrease in a similar fashion.

Fig. 22
figure 22

Number of acceptable samples from 100 generated models, as a function of attenuation factor, for models with 25 and 125 sources

Finally, a training set containing 1030 Mercator maps generated using source models containing an incrementally increasing number of current loops (Table 7) and 75 maps coming from actual geomagnetic field values of the GUFM-1 model was assembled, and shuffled randomly when being loaded during the DANN training. Field maps in the GUFM-1 model were obtained directly at each geographical coordinate point given by the applied resolution (see Sect. “Primary field of stationary current loops”) and each epoch using the example program available in the gufm1-webservice (Rehfeld 2019). This was supplemented by maps of radial SV values readily available in case of GUFM-1 data on a 5-year basis and computed similarly using only the solutions for the primary fields in case of synthetic data (as a consequence of the approximation applied for the induced fields):

$$SV = {(B}_{{r}_{{\text{prim}}}}(I,{t}_{0}) - {B}_{{r}_{{\text{prim}}}}(I+dI/dt*\Delta t,{t}_{0}+\Delta t))/\Delta t,$$
(55)

where

$$\Delta t=5*3.15*{10}^{7} [s].$$
(56)
Table 7 Distribution of model samples in the synthetic training data set, containing 1030 Mercator (\(\phi ,\Lambda\)) maps of field and model parameter values

Using this procedure synthetic magnetic fields could be generated which could produce radial magnetic components falling within the order-of-magnitude margins of the actual CMB radial field values derived from the GUFM-1 dataset. For comparison, histograms representing field distributions of the geomagnetic model and synthetic (loop) model values are shown on Figs. 23 and 24. In both cases, significant number of data points occur between \(\pm 1 [mT]\), however, the synthetic training data values have a near-normal distribution, whereas real training data have a depletion in a range of positive values due to the South Atlantic Anomaly (see e.g., (De Santis and Quamili 2010)).

Fig. 23
figure 23

Distribution of magnetic field values (radial component) on the CMB surface for the real geomagnetic models used for training (selected from the GUFM-1 model in 5 year epochs from 1600 to 1975)

Fig. 24
figure 24

Distribution of magnetic field (radial component) values on the CMB surface in all the models of the complete synthetic training set

Unsurprisingly, as the attenuation was applied for \(dI/dt\) in the current loop models, the bulk of synthetic data points are in an SV range 2 orders of magnitude smaller than those coming from the GUFM-1 model (see Figs. 25 and 26).

Fig. 25
figure 25

Estimated distribution of SV values in the GUFM-1 data set used for the training (calculated on a 5-year basis)

Fig. 26
figure 26

Estimated distribution of SV values in the full synthetic data set (5-year basis). Note that in this figure, the x axis represents a range almost two orders-of-magnitude smaller range than in Fig. 25

To generate maps of target variables for the neural net training, normalized preprocessed maps of source parameter values were computed for each model as follows:

$${{\varvec{P}}}_{k}\left(\phi ,\Lambda \right)= {{\varvec{L}}}_{k}\left(\phi ,\Lambda \right){\boldsymbol{ }p}_{ k }.$$
(57)

In Eq. (57), \({{\varvec{L}}}_{k}\) is a sparse matrix defined as

$${{\varvec{L}}}_{k}\left(\phi ,\Lambda \right) = \left\{\begin{array}{c}1: \left(\phi ,\Lambda \right)\equiv \left({\phi }_{{k}_{i}},{\Lambda }_{{k}_{i}}\right)\\ 0: \left(\phi ,\Lambda \right)\ne \left({\phi }_{{k}_{i}},{\Lambda }_{{k}_{i}}\right)\end{array},\right.$$
(58)

\(\left({\phi }_{{k}_{i}},{\Lambda }_{{k}_{i}}\right)\) being the geographic coordinates of a given source \(i\) and \({p}_{ k}\) is a value of a given source parameter \(k\) for each corresponding current loop in the model. The preprocessing to optimize training performance is then carried out in the following fashion:

$${P}_{k j}\left({\phi }_{{k}_{l}},{\Lambda }_{{k}_{l}}\right)={\text{max}}\left({P}_{k j-1}\left({\phi }_{{k}_{l}},{\Lambda }_{{k}_{l}}\right), {{P}_{kj-1}\left({\phi }_{{k}_{l}},{\Lambda }_{{k}_{l}}\right)e}^{-\frac{{\left({{\phi }_{k}}_{l}-{\phi }_{{k}_{i}}\right)}^{2}+{\left({\Lambda }_{{k}_{l}}-{\Lambda }_{{k}_{i}}\right)}^{2}}{2\Sigma }}\right) |{k}_{l}-{k}_{i}|\le 5, { k}_{l}=\mathrm{1,2},\dots ,25.$$
(59)

In the above equation, \({e}^{-\frac{{\left({{\phi }_{k}}_{l}-{\phi }_{{k}_{i}}\right)}^{2}+{\left({\Lambda }_{{k}_{l}}-{\Lambda }_{{k}_{i}}\right)}^{2}}{2\Sigma }}\) is a two-dimensional Gaussian kernel function, where \(\Sigma\) was set to 2 and \(j\) iterates through each grid point in a 5-by-5 wide window.

Parameter maps were then normalized using the maximum possible values of the source parameters defined in Appendix B:

$${{\varvec{P}}}_{r}^{n}\left(\phi ,\Lambda \right)={({\varvec{P}}}_{r}\left(\phi ,\Lambda \right)-{r}_{{\text{min}}})/{\delta }_{{z}_{{\text{max}}}},$$
(60)
$${{\varvec{P}}}_{dI/dt}^{n}\left(\phi ,\Lambda \right)={({\varvec{P}}}_{dI/dt}\left(\phi ,\Lambda \right)+{\left(dI/dt\right)}_{{\text{max}}})/\left(2*{\left(dI/dt\right)}_{{\text{max}}}\right),$$
(61)
$${{\varvec{P}}}_{R}^{n}\left(\phi ,\Lambda \right)={{\varvec{P}}}_{R}\left(\phi ,\Lambda \right)/{R}_{{\text{max}}},$$
(62)
$${{\varvec{P}}}_{I}^{n}\left(\phi ,\Lambda \right)=\left({{\varvec{P}}}_{I}\left(\phi ,\Lambda \right)+{I}_{{\text{max}}}\right)/\left(2*{I}_{{\text{max}}}\right).$$
(63)

Examples of the final maps containing real and synthetic data incorporated in the training set are shown in Figs. 27 and 28.

Fig. 27
figure 27

Mercator maps of \({{\varvec{P}}}_{r}^{n}\left(\phi ,\Lambda \right)\) (a)), \({{\varvec{P}}}_{I}^{n}\left(\phi ,\Lambda \right)\) (b) and radial magnetic field (c) in the synthetic component of the training set

Fig. 28
figure 28

Mercator map of the CMB radial magnetic field in the real component of the training set (GUFM-1, epoch 1950)

Test data were also generated in the same fashion as the training data, but a smaller number of models was admitted, and test samples contained a differing number of loops than those of the training data set (see Table 8)

Table 8 Distribution of model samples in the synthetic test data set

Appendix D: Deriving magnetic field and SV maps from the COVOBS geomagnetic field model

CMB radial field and secular variation data used for the demonstration in Sect.  “Results on real geomagnetic model data” were extracted from the COVOBS geomagnetic field model with the help of the pygeodyn python package (Huder et al. 2019). Maps were constructed using an SH expansion of the model coefficients up to degree 14 in pyshtools (Wieczorek and Meschede 2018).

Appendix E: Description of the genetic algorithm used for final reconstruction

The overall workflow of the GA remained the same as described in (Kuslits et al. 2020). However, the parameter vector:

$${\varvec{P}}{\varvec{a}}{{\varvec{r}}}_{{\varvec{i}}}=\left[\begin{array}{c}\left({\phi }_{i}\right)\\ \left({\Lambda }_{i}\right)\\ \left({r}_{i}\right)\\ \left({R}_{i}\right)\\ \left({I}_{i}\right)\\ \left({\theta }_{i}\right)\\ ({\lambda }_{i})\end{array}\right]$$
(64)

could be further constrained via sampling possible values for the elements of vector (64) from the UNet ++ solution maps within 2-pixel intervals of \({(\phi }_{i},{\Lambda }_{i})\) (see Fig. 8), and applying a logarithmic transformation \({\varvec{P}}{\varvec{a}}{{\varvec{r}}}_{i}=[{\text{log}}\left({\upphi }_{i}\right), {\text{log}}\left({\Lambda }_{i}\right)]\).

Initially, \({N}_{{\text{ind}}}*{N}_{{\text{pop}}}\) solutions are generated with randomly chosen parameter values, where \({{\text{N}}}_{{\text{pop}}}\) is the number of populations and \({N}_{{\text{ind}}}\) is the number of individuals in the population.

At the beginning of the selection phase, the program calculates for each specimen the deviation of the solution from the target value according to Eq. (27). Then, the target values are ranked and linearly transformed (fitness value) to select individuals, the better the fit, the higher the probability (Baker 1985):

$${\text{fitness}}\left({\text{Pos}}\right)=2-SP+2\left(SP-1\right)\left({\text{Pos}}-1\right)/\left({N}_{{\text{ind}}}-1\right).$$
(65)

In Eq. (65), \(\text{Pos}\) represents the position of a given specimen (physical model) when ranking of all the models according to (27) produced in a given step within the population. The parameter \(\text{SP}\) also called selection pressure was assigned a value of \(\text{SP}=2\). This being a real coded algorithm, parameters of the individual solutions (\({{\varvec{P}}{\varvec{a}}{\varvec{r}}}^{\mathbf{t}\mathbf{r}}\)) varied in a given population during crossover or recombination obeying Eq. (66) (Picek et al. 2013):

$$({\varvec{P}}{\varvec{a}}{\varvec{r}}({N}_{in{d}_{l}},{\text{gen}}+1) | p{p}_{{\text{rek}}}=1) = \chi {\varvec{P}}{\varvec{a}}{\varvec{r}}({N}_{in{d}_{pt1}},{\text{gen}})+ (1-\xi ){\varvec{P}}{\varvec{a}}{\varvec{r}}({N}_{in{d}_{pt2}},{\text{gen}}),$$
(66)

where \({N}_{in{d}_{l}}\), \({N}_{in{d}_{pt1}}\), \({N}_{in{d}_{pt2}}\), are random numbers between 1 and \({N}_{ind}\) within any given population, gen denotes the generation (number of iteration steps), and \(\xi\) is also generated as a (pseudo-) random number in the interval [0, 1]. Crossover probability between the selected individuals was set to \(p\left(p{p}_{{\text{rek}}}=1\right)=0.6\). In the mutation step, the parameters of an individual could be randomly modified with a probability similar to the mutation rate for real coded algorithms in the work of Mühlenbein and Schlierkamp-Voosen (1993):

$$\left({\varvec{P}}{\varvec{a}}{\varvec{r}}\left({N}_{{{\text{ind}}}_{{\text{mut}}}},{\text{gen}}+1\right)| p{p}_{{\text{mut}}}=1\right)={\varvec{P}}{\varvec{a}}{\varvec{r}}\left({N}_{in{d}_{{\text{mut}}}},{\text{gen}}+1\right)+{\beta }_{1}\left({\varvec{P}}{\varvec{a}}{{\varvec{r}}}_{{\text{max}}}\left({N}_{{\text{ind}}}\right)-{\varvec{P}}{\varvec{a}}{{\varvec{r}}}_{{\text{min}}}\left({N}_{{\text{ind}}}\right)\right){\sum }_{j=1}^{a}{\beta }_{{2}_{j}}{\beta }_{3}^{-j}.$$
(67)

In Eq. (67), the mutation occurred with probability \(p(p{p}_{{\text{mut}}}=1)=0.02\), \({\beta }_{1}\) is a random number between − 1 and 1, \({\beta }_{{2}_{j}}\) is a sequence of random numbers with length \(a\) and elements \({p(\beta }_{{2}_{j}}=1)= 1/a\), \({\beta }_{3}=1.4\) (own setting) and \(a=20\) determines the smallest possible magnitude of the mutation effect. \({\varvec{P}}{\varvec{a}}{{\varvec{r}}}_{{\text{max}}}\left({N}_{{\text{ind}}}\right)\) and \({\varvec{P}}{\varvec{a}}{{\varvec{r}}}_{{\text{min}}}\left({N}_{{\text{ind}}}\right)\) denote the extreme values of each parameter, respectively, as defined in Appendix B. It is important to note, however, that the effects of the steps listed above have always been limited to these extrema:

$${\text{IF}}:\boldsymbol{ }{\varvec{P}}{\varvec{a}}{\varvec{r}}\left({N}_{{\text{ind}}},{\text{gen}}\right)>{\varvec{P}}{\varvec{a}}{{\varvec{r}}}_{{\text{max}}}\left({N}_{{\text{ind}}}\right)\boldsymbol{ }{\text{THEN}}\boldsymbol{ }: {\varvec{P}}{\varvec{a}}{\varvec{r}}\left({N}_{{\text{ind}}},{\text{gen}}\right)={\varvec{P}}{\varvec{a}}{{\varvec{r}}}_{{\text{max}}}\left({N}_{{\text{ind}}}\right)$$
(68)
$${\text{IF}}:\boldsymbol{ }{\varvec{P}}{\varvec{a}}{\varvec{r}}\left({N}_{{\text{ind}}},{\text{gen}}\right)<{\varvec{P}}{\varvec{a}}{{\varvec{r}}}_{{\text{min}}}\left({N}_{{\text{ind}}}\right) {\text{THEN}}:\boldsymbol{ }{\varvec{P}}{\varvec{a}}{\varvec{r}}\left({N}_{{\text{ind}}},{\text{gen}}\right)={\varvec{P}}{\varvec{a}}{{\varvec{r}}}_{{\text{min}}}\left({N}_{{\text{ind}}}\right).$$
(69)

The set of individuals that underwent the mutation process formed the initial populations for the next iteration step. Individuals were also swapped between each other at a rate of 4% of the population size after a specified number of consecutive steps. This algorithmic logic, in which the evolution process of populations is run separately in parallel up to a specified iteration step interval is called a multi-population genetic algorithm (Grefenstette 1981). Exchanging some individuals among the populations at a given proportion is referred to as migration.

Achieving a globally optimal solution in case of synthetic data typically required several hundred to a thousand iteration steps. The reconstruction of real geomagnetic field and SV data presented in Sect. “Results on real geomagnetic model data” took 560 steps.

Appendix F: Results using noisy synthetic data

To perform a first-order stability analysis, noise was indirectly introduced to the synthetic magnetic fields via the perturbation of loop parameters normalized using the same corresponding quantities as in Eqs. (60)–(63). Perturbed normalized parameters \({{\varvec{P}}{\varvec{a}}{\varvec{r}}}_{{\varvec{N}}}^{{\varvec{n}}}\) came from a normal distribution with an average \({{\varvec{P}}{\varvec{a}}{\varvec{r}}}^{{\varvec{n}}}\) and standard deviation \(\Sigma\):

$${\varvec{\delta}}\sim \left|{{\varvec{P}}{\varvec{a}}{\varvec{r}}}^{{\varvec{n}}}-{{\varvec{P}}{\varvec{a}}{\varvec{r}}}_{{\varvec{N}}}^{{\varvec{n}}}\right|,\boldsymbol{ }{{\varvec{P}}{\varvec{a}}{\varvec{r}}}_{{\varvec{N}}}^{{\varvec{n}}}\sim {\varvec{N}}\left({{\varvec{P}}{\varvec{a}}{\varvec{r}}}^{{\varvec{n}}},\Sigma \right),$$
(70)
$$\widetilde{{{\varvec{P}}{\varvec{a}}{\varvec{r}}}^{{\varvec{n}}}}={{\varvec{P}}{\varvec{a}}{\varvec{r}}}^{{\varvec{n}}}+{\varvec{\delta}},\boldsymbol{ }\boldsymbol{ }\widetilde{{{\varvec{P}}{\varvec{a}}{\varvec{r}}}^{{\varvec{n}}}}\boldsymbol{^{\prime}}={{\varvec{P}}{\varvec{a}}{\varvec{r}}}^{{\varvec{n}}}-{\varvec{\delta}},$$
(71)

and after transforming back to dimensional values:

$${\widetilde{B}}_{{r}_{{\text{tot}}}}=\frac{{\widetilde{B}}_{{r}_{{\text{tot}}}}\left(\widetilde{{\varvec{P}}{\varvec{a}}{\varvec{r}}}\right)+{\widetilde{B}}_{{r}_{{\text{tot}}}}\left({\widetilde{{\varvec{P}}{\varvec{a}}{\varvec{r}}}}^{\boldsymbol{^{\prime}}}\right)}{2},$$
(72)
$${\widetilde{SV}}_{r} =\frac{\left({{\widetilde{B}}_{{r}_{tot}}\left({t}_{0}+\Delta t\right)-\widetilde{B}}_{{r}_{{\text{tot}}}}\left({t}_{0}\right)\right)}{\Delta t},$$
(73)

\(\Delta t\) being the 5-year basis mentioned in Appendix C.

This allowed for producing averaged noisy magnetic field and SV maps for a reference model with a given number of loops. The response of the full reconstruction of the physical model to the noise introduced to input data has so far only been investigated for a model containing 15 loops, as the necessary series of calculations could be performed in a relatively short time.

Figure 29 summarizes the results obtained in terms of both the relative error of the parameter estimation and the misfit (see Sect. “Loss functions and quality metrics”). It can be observed that the increase in noise level does not degrade the fit of the reconstructed magnetic fields, but significantly degrades the accuracy of the reconstruction of the loop parameters.

Fig. 29
figure 29

Quality parameters \(\overline{{\text{NRMS}} }\) (27) and \({\text{RPE}}\) (29) plotted against an increasing noise level in the input data

Fig. 30
figure 30

Response of the inference to noise introduced in the input magnetic induction (a) and SV (b) data. Row c) shows source positions inferred by the network as the noise level \(\Sigma\) increases from left to right

For models with a larger number of circular currents, a stability analysis of the image-processing phase and the reconstructed maps of source parameter distributions was performed.

For a model with 25 loops, Fig. 30 shows how much of the source distribution the image-processing network was able to reconstruct using noise-free and noisy synthetic inputs. Row c) of this tiled figure demonstrates how, with an increasing noise level, the image-processing network starts to infer more sources in the vicinity of loops that are actually there.

Figure 31 summarizes the cross-correlation values between the reconstructed and the true distribution maps. On one hand, the figure shows that even with the above-mentioned increase in error, the reconstruction of the geographical distribution remains in general the least sensitive to noise constructed using Eqs. (7073).

On the other hand, with the inclusion of a larger number of sources to be reconstructed, the stability deteriorates significantly (although it should be noted that since the noise (7073) is added separately to each loop parameter, it may be cumulative for a larger number of sources).

Fig. 31
figure 31

Cross-correlations between true and inferred source parameter maps plotted against an increasing noise level in a model containing 25 (a) and 100 (b) loops

Appendix G: Difficulties with using a “classical” approach

One can legitimately think about the question of whether apart from the possibility of using the DANN methodology, the use of machine learning for the estimation task presented in this paper is justified at all. To demonstrate its advantages when compared to previous efforts (reviewed in Sect.  “Introduction”) resorting to linear and non-linear inversion directly for recovering current loops, the authors reimplemented the rather ingenious attempt of Alldredge (1987).

His method was particularly interesting for the authors as though it did not take into account SV and magnetic diffusion, it too applied little a priori constraints besides assuming radially aligned current loops. It arguably uses more straightforward computations as it operates mostly in the spherical harmonic spectral domain and offers good convergence as far as RMS misfit values are concerned.

The forward computation results in the spherical harmonic coefficients (SHC) representing a set of L loops using Eq. (74) described in detail in Alldredge (1987):

$$\widehat{{g}_{n}^{m}}=\sum_{l=1}^{L}{K}_{l}\frac{2n}{{\text{sin}}\left({\alpha }_{l}\right)}{\left(\frac{{r}_{l}}{a}\right)}^{n-1}\frac{{P}_{n}^{1}\left({\text{cos}}\left({\alpha }_{l}\right)\right)}{\sqrt{2n\left(n+1\right)}}{P}_{n}^{m}\left({\text{cos}}\left({\phi }_{l}\right)\right){\text{cos}}\left(m{\Lambda }_{l}\right),$$
(74)
$$\widehat{{h}_{n}^{m}}=\sum_{l=1}^{L}{K}_{l}\frac{2n}{{\text{sin}}\left({\alpha }_{l}\right)}{\left(\frac{{r}_{l}}{a}\right)}^{n-1}\frac{{P}_{n}^{1}\left({\text{cos}}\left({\alpha }_{l}\right)\right)}{\sqrt{2n\left(n+1\right)}}{P}_{n}^{m}({\text{cos}}({\phi }_{l})){\text{sin}}(m{\Lambda }_{l}),$$
(75)

where \({r}_{l}\) is the distance between Earth’s center and a given circular current loop \(l\), \({\alpha }_{l}\), is half of the viewing angle of a loop from Earth’s center, \({\phi }_{l}\) and \({\Lambda }_{l}\) are the loop’s geographic longitude and co-latitude, and

$${K}_{l}= \frac{{10}^{9}{\mu }_{0}{r}_{l}^{2}{\text{sin}}({\alpha }_{l}){I}_{l}}{{a}^{3}}.$$
(76)

The estimation of the loop parameters according to the definition used by Alldredge, \({{\varvec{P}}{\varvec{a}}{\varvec{r}}}_{Al}=[{K}_{l}, {\alpha }_{l}, {r}_{l},{\phi }_{l},{\Lambda }_{l}]\) was performed via a least-squares fitting of the SHC:

$$\widehat{{Par}_{Al}}={\text{argmin}}\left({\left(\widehat{{g}_{n}^{m}\left({Par}_{Al}\right)}-{g}_{n}^{m}\right)}^{2}+{\left(\widehat{{h}_{n}^{m}\left({Par}_{Al}\right)}-{h}_{n}^{m}\right)}^{2}\right).$$
(77)

The standard linear inversion was applied for \({K}_{l}\), and a variant of the Marquardt–Levenberg algorithm for the rest of the loop parameters which are in a non-linear relationship with coefficients \({g}_{n}^{m}\), \({h}_{n}^{m}\).

It turns out however, that when one tries to practically implement and test this technique, a handful of issues arise. Most notably, Alldredge mentions, that (at the time) he had no real option for inferring the potential number of current loops and determining an initial set of loop parameters, other than resorting to intuition, and heuristic considerations.

He tried to address the issue of correct initial placement by assigning initial geographic loop positions via identifying ‘peaks’ and ‘valleys’ on maps of the radial geomagnetic field. One can think of a minimum gradient-based procedure, such as the one used in Sect. “Refining the ML-based inversion” to pinpoint source locations on the maps inferred by the network, applied directly on the radial field data to provide an intuitive guess on initial loop parameters. However, as we could already observe in Fig. 6 (Sect. “Approximating the total field assuming linear time variation in the source currents”), the more loops are admitted to a given model, the more complex the corresponding CMB radial fields become. Figures 32 and 33 show that accordingly, it becomes progressively more difficult to assess an initial geographic distribution of the loops correctly.

Fig. 32
figure 32

Initial loop positions assigned using peaks of radial magnetic field values in the model containing the five loops used in the stability analysis (a) and the corresponding correct (actual) positions of the loops (b). Initial loop positions assigned similarly for a model containing 50 loops (c) and the correct loop positions (d)

Figure 33 shows a quantitative comparison of this procedure against our neural net inference on the same data used in part to test our method in Sect. 4.1 (see Figs. 10 and 11). For each sample, the number of correctly identified or “discovered” loops was determined using the criterion:

$${{\varvec{L}}}_{{\text{dis}}{c}_{l}} = \left\{\begin{array}{c}1:\left|\widehat{{\phi }_{l}}-{\phi }_{l}\right|+\left|\widehat{{\Lambda }_{l}}-{\Lambda }_{l}\right|<{4}^{\circ }\\ 0: \left|\widehat{{\phi }_{l}}-{\phi }_{l}\right|+\left|\widehat{{\Lambda }_{l}}-{\Lambda }_{l}\right|\ge {4}^{\circ }\end{array}\right.$$
(78)

(a distance of 2 gridpoints on the maps) in accordance with condition (1) in Sect. “Approximating the total field assuming linear time variation in the source currents”.

To understand the results of this comparison, we need to consider two things. Firstly, loop models with very few sources have not been shown at all to the network during training. Secondly, assigning loop positions simply via radial field maxima may work reasonably well when the radial fields are generated by a few isolated loops. Thus, in these cases (marked by the blue area on the graphs), the network is much less accurate, while after more and more loops are admitted to the models, we see that the Alldredge’s intuition for initial placement becomes very inaccurate compared to the neural network-based solution.

Moreover, a stability analysis demonstrating the sensitivity of Alldredge’s estimation procedure with respect to these initially assigned loop parameters \({\upphi }_{l},{\Lambda }_{l}\) was carried out. During this test, a maximum SH degree of 15 was used in the forward computations. Different deviations from the correct parameters in relation with their absolute values were inspected. A total number of \(i=5\) iterations, carrying out the linear and non-linear estimation steps sequentially as described by Alldredge, was performed for each case. The RPE error measure was applied in the same fashion as defined in Eq. (29) in Sect. “Loss functions and quality metrics”. Its respective changes during the total number of iterations (\(\mathrm{\Delta RPE}= {{\text{RPE}}}_{0}-{{\text{RPE}}}_{i}\)) are shown in Table 9 for each case.

Table 9 Relative change in RPE over the course of i = 5 iterations as the ratio of initial RPE

Table 9 shows that the method tends to stick in local minima when the initial geographic placement of the loops is incorrect. Even comparatively minor misplacements of a relatively small set of five loops resulted in a divergence (positive \({\text{RPE}}\) ratios) of the estimated loop parameters from correct ones even though the corresponding RMS misfit values decreased considerably (see Fig. 34).

Erroneous estimations arise in particular, because a relatively small initial deviation in the initial parameters can result in very high \({\text{RPE}}\) values mostly due to poor estimations for \({K}_{l}\) (see Fig. 35).

This again may not be a major issue for recovering a small set of well isolated loops, where a good initial guess for the placement of the loops is possible. However, in a problem setting as the one established in our study, where a potentially very large set of current loops with a wide range of loop parameters can account for the complex pattern of the observed core field, it renders the application of such estimation procedures infeasible. Comparing these results with the RPE and CCC values obtained using our method shown in Figs. 10a and 11, and considering the practical arguments set out in Sect. “Introduction” make the above conclusion even more clear.

Fig. 33
figure 33

Average rate of correctly identified current loops as a function of the number of loops in the test examples

Fig. 34
figure 34

Graph showing the decrease of RMS error when using Alldredge’s method for model 5(b). The parameters can easily diverge from the correct ones even if the RMS misfit decreases

Fig. 35
figure 35

RPE measures obtained after 5 full iterations assuming different relative errors for the initial placement of current loops using Alldredge’s method as a function of the number of loops used in the test examples shown in Table 9

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Kuslits, L., Horváth, A., Wesztergom, V. et al. An alternate representation of the geomagnetic core field obtained using machine learning. Earth Planets Space 76, 77 (2024). https://doi.org/10.1186/s40623-024-02024-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s40623-024-02024-5

Keywords