A novel Bayesian approach for disentangling solar and geomagnetic field influences on the radionuclide production rates
Earth, Planets and Space volume 74, Article number: 130 (2022)
Abstract
Cosmogenic radionuclide records (e.g., ^{10}Be and ^{14}C) contain information on past geomagnetic dipole moment and solar activity changes. Disentangling these signals is challenging, but can be achieved by using independent reconstructions of the geomagnetic dipole moment. Consequently, solar activity reconstructions are directly influenced by the dipole moment uncertainties. Alternatively, the known differences in the rates of change of these two processes can be utilized to separate the signals in the radionuclide data. Previously, frequency filters have been used to separate the effects of the two processes based on the assumption that millennial-scale variations in the radionuclide records are dominated by geomagnetic dipole moment variations, while decadal-to-centennial variations can be attributed to solar activity variations. However, the influences of the two processes likely overlap on centennial timescales and possibly millennial timescales as well, making a simple frequency cut problematic. Here, we present a new Bayesian model that utilizes the knowledge of solar and geomagnetic field variability to reconstruct both solar activity and geomagnetic dipole moment from the radionuclide data at the same time. This method allows for the possibility that solar activity and geomagnetic dipole moment exhibit variations on overlapping timescales. The model was tested and evaluated using synthetic data with realistic noise and then used to reconstruct solar activity and the geomagnetic dipole moment from the ^{14}C production record over the last two millennia. The results agree with reconstructions based on independent geomagnetic field models and with solar activity inferred from the Group Sunspot number. Our Bayesian model also has the potential to be developed further by including additional confounding factors, such as climate influences on the radionuclide records.
Introduction
Cosmogenic radionuclides such as ^{10}Be and ^{14}C are the best proxies for solar activity reconstructions prior to the period of direct solar observations (Beer et al. 1988; Muscheler et al. 2007) such as the record of group sunspot number (GSN) starting in 1610 CE (Svalgaard and Schatten 2016), and the extended neutron monitor data back to 1939 CE (Herbst et al. 2017). Cosmogenic radionuclide records are vital for studies of long-term changes of the Sun and Sun–climate linkages far back in time. The radionuclides are continuously produced from the interaction between high-energy galactic cosmic rays (GCRs) coming to the Earth (and secondary particles) and atoms in the atmosphere. Their production rates correlate with the flux of GCRs reaching the Earth’s atmosphere, which is modulated by solar and geomagnetic shielding (Beer et al. 2012). Thus, reconstructions of solar activity from cosmogenic radionuclide records require correction for the geomagnetic field influence, usually based on independent reconstructions of the geomagnetic dipole moment (GDM). Consequently, solar activity reconstructions are directly influenced by the GDM uncertainties. The uncertainties of GDM reconstructions depend on the choice of the underlying data and the different methods to build global geomagnetic field models from them. Moreover, differences in GDM reconstructions directly lead to discrepancies in solar activity reconstructions. Additional uncertainties in solar reconstruction arise from cosmogenic radionuclide measurement uncertainties and possible systematic biases such as uncorrected climatic influences on ^{10}Be transport and deposition processes or carbon cycle influences on atmospheric ^{14}C concentrations (Muscheler et al. 2007).
An alternative approach to directly disentangle solar and GDM influences on radionuclide records is to utilize the known differences in the rates of change and the GCR shielding effects of these two processes. In general, millennial variations in the radionuclide records are assumed to be dominated by GDM variations, while decadal-to-centennial variations can be mainly attributed to solar activity variations (Snowball and Muscheler 2007). Utilizing this prior knowledge could not only eliminate the need for independent GDM estimates for solar activity reconstructions, but also provide the possibility to infer GDM variations from radionuclide records only. Those radionuclide-based GDM reconstructions provide valuable information on the past GDM as they are dominated by the global dipole field, in contrast to the information of magnetic field directions and intensity stored in, for example, archeological artifacts, igneous rocks and sediment records that only provide local readings of the past geomagnetic field. Long-term changes in GDM have been reconstructed from radionuclide data by removing variations on timescales shorter than 3000 years (Muscheler et al. 2005; Zheng et al. 2021). Similarly, the removal of long-term variations from the radionuclide data has been conducted in some studies to minimize the geomagnetic field influence in order to assess past changes in solar activity (e.g., Adolphi et al. 2014). Until now, these approaches have used simple frequency filtering with a hard cutoff frequency depending on the targets of the studies. However, Snowball and Muscheler (2007) showed that solar and GDM variability likely overlap, making a simple frequency distinction problematic. Quasi-periodicities of ~ 200 years to up to ~ 2400 years in solar variations have been shown in previous studies of long-term radionuclide records (e.g., Bond et al. 2001; Wagner et al. 2001; Snowball and Muscheler 2007; Adolphi et al. 2014; Usoskin et al. 2016; Dergachev and Vasiliev 2019). 
Meanwhile, GDM can have short-term variations on timescales between ~ 60 and ~ 200 years but with a relatively low power (Hellio and Gillet 2018; Huder et al. 2020). Significant influence of GDM on the radionuclide records begins on timescales between 300 and 500 years as suggested from the correlation between ^{14}C production rate and reconstructed GDM over the last 10,000 years (Snowball and Muscheler 2007). Therefore, the overlap of solar and GDM influences on timescales longer than 300 years makes it very challenging to separate their effects on cosmogenic radionuclide production rates. Thus, simple frequency filters likely fail in separating the signals and, in addition, often lead to unreliable end effects. These effects make it difficult to connect the reconstructions to present-day values of solar activity or GDM intensity inferred from instrumental data. These difficulties motivated this study to improve methods to disentangle solar and geomagnetic influences for the reconstructions of solar activity and GDM from radionuclide records.
Here, we address the challenges discussed above by incorporating prior knowledge of solar and GDM variability and their influence on radionuclide production rates into a Bayesian framework. We present a new Bayesian method to separate solar activity and GDM variations from the radionuclide data, inspired by recently developed methods to reconstruct geomagnetic field variations using paleomagnetic data (Hellio and Gillet 2018; Nilsson and Suttie 2021). Our goal is to develop a model that can incorporate and utilize the knowledge of solar and geomagnetic field variations to reconstruct both solar activity and GDM from the radionuclide data at the same time. This method allows for the possibility that solar activity and GDM exhibit variations on overlapping timescales. Different datasets used for the model development and testing are presented in the next section. In Sect. 3, we outline the Bayesian framework and our setup of prior information on the model parameters. The model is tested in Sect. 4 using synthetic ^{14}C data, before being applied to nearly 2000 years of ^{14}C production data inferred from IntCal20 (Reimer et al. 2020).
Data
Observation-based solar activity and geomagnetic field data for model calibration
The reconstruction of solar activity with ^{14}C data requires calibration with the instrumentally measured cosmic-ray flux record during the present period (Muscheler et al. 2016). The solar modulation of the cosmic-ray flux is usually quantified using the solar modulation potential, ϕ [MV] (also known as the solar modulation parameter). Specifically, the parameter approximates the adiabatic energy loss of GCRs in the heliosphere due to solar shielding (Vonmoos et al. 2006; Herbst et al. 2017). In this study, we used the monthly record of the solar modulation potential ϕ_{HE16} from 1939 to 2017 CE published by Herbst et al. (2017). The annual average of ϕ_{HE16} is shown in Fig. 1. The modulation potential depends on the assumed local interstellar spectrum (LIS), which is the flux of GCRs outside the heliosphere. We used the recent LIS model from Herbst et al. (2017).
The GSN record by Svalgaard and Schatten (2016) is the longest and most recent compilation of direct telescope-based solar observations. It contains the yearly average (average over all months of the year) of the GSN from 1610 CE to present. The GSN record is a relatively robust solar proxy which is not affected by the shielding effects of the geomagnetic field and climate influences as in the case of ^{10}Be records or by carbon cycle effects in the case of ^{14}C records. However, there are often gaps in the GSN data due to the discontinuity of observations. Consequently, the GSN record was compiled by many observers and, therefore, exhibits uncertainties from the process of calibration and combination of different datasets, especially during the data-poor times (Svalgaard and Schatten 2016). Moreover, the constant improvement of the observation technique through time adds more challenges to the calibration of the older data. We inferred the solar modulation potential from the GSN, i.e., ϕ_{GSN}, following the method proposed by Usoskin et al. (2002). A Monte Carlo approach was used to assess the uncertainty of the inference, i.e., 1000 realizations of the solar modulation potential were generated consistent with the GSN record and its uncertainties. The mean and 2-sigma uncertainty of these 1000 realizations are shown in Fig. 1. The record clearly shows the 11-year Schwabe cycle (Schwabe 1844). Centennial variations of solar activity are also represented such as the so-called “Grand Solar Minima”. These are extended periods of relatively low solar activity such as the Maunder Minimum from 1645 to 1715 CE and the Dalton Minimum from 1790 to 1820 CE (highlighted in Fig. 1). ϕ_{GSN} is the longest observation-based solar modulation record containing the known typical solar variability, i.e., the 11-year variation and the centennial variation, and therefore we consider it suitable for model parameterization. 
Thus, we used ϕ_{GSN} to assess the typical behavior of solar activity which was then included as prior knowledge in our Bayesian model.
For the GDM calibration, on the other hand, we used the recent model COV-OBS.x2 (Huder et al. 2020). The GDM was constrained by observational data from both ground-based stations and satellites as well as older surveys over the period 1840–2020 CE.
Radiocarbon data
We inferred the ^{14}C production rate for the period 1–1950 CE from the IntCal20 Northern Hemisphere ^{14}C calibration curve (Reimer et al. 2020). IntCal20 was compiled by the IntCal Working Group for improving the ^{14}C age calibration and it can be used to reconstruct fluctuations in past atmospheric ^{14}C concentrations. We calculated the ^{14}C production rate from the atmospheric ^{14}C fluctuations using a box-diffusion carbon cycle model, which includes atmosphere, biosphere, an upper ocean mixed layer and 42 deep-sea layers. The option for direct ventilation of the deep ocean was turned off (Siegenthaler 1983). The uncertainty in the IntCal20 calibration curve was quantified using 100 posterior realizations of possible atmospheric ^{14}C curves obtained via fitting Bayesian splines to the ^{14}C data underlying IntCal20 (Heaton et al. 2020; Reimer et al. 2020). The average of the annual ^{14}C production rates inferred from these 100 realizations is depicted in Fig. 2. The ^{14}C production rates were normalized to have a preindustrial mean of one. The surge in atmospheric CO_{2} from 1850 CE due to fossil-fuel burning was included in the calculation to account for the dilution of ^{14}C in relation to ^{12}C (Muscheler et al. 2007).
Short-term solar proton events (SPEs) such as the 774/775 event and the 993/994 event (Miyake et al. 2012, 2013; Mekhaldi et al. 2015; Reimer et al. 2020) can be observed in the annual production rate as rapid rises in the production rate. During these events, the Sun released significant amounts of energetic particles which resulted in short-term radionuclide production enhancements (Mekhaldi et al. 2015). These SPEs are outside the scope of this study and, therefore, they were not included in our model. The 774/775 event resulted in an extreme peak in the production rate and, therefore, it needs to be excluded from the model data. We removed the extreme values during the event and replaced them using a spline interpolation. In addition, we decreased the sampling resolution of the ^{14}C data prior to 1600 CE by computing the mean ^{14}C production rate every 10 years, as the data underlying IntCal20 decreases in amount and resolution further back in time. Figure 2 shows the mean and the 2-sigma uncertainty of the processed ^{14}C production rate. The mix of different temporal resolutions allowed us to assess our model performance at both long (prior to 1600 CE) and short timescales (after 1600 CE) and, at the same time, it helped to save computational costs via reducing the number of data points. This approach was further motivated by the fact that the ^{14}C data underlying IntCal20 is continuously annual only for the last 1000 years. In fact, the processed ^{14}C data retain variations similar to the unprocessed data prior to 1000 CE, except for the periods with annual resolution such as from 300 to 400 CE and around the SPE spikes.
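The mixed temporal resolution described above (10-year means before 1600 CE, annual values afterwards) can be sketched as follows. This is a minimal illustration, not the authors' processing code: the function name `mixed_resolution`, the alignment of the bins to the first year of the record, and the placeholder input series are all assumptions.

```python
import numpy as np

def mixed_resolution(years, values, switch_year=1600, bin_width=10):
    """Average values in fixed-width bins before switch_year; keep annual values afterwards.

    Assumption: bins are aligned to the first year of the record.
    """
    years = np.asarray(years)
    values = np.asarray(values, dtype=float)
    early = years < switch_year
    # Integer bin index for every year before the switch.
    bin_index = (years[early] - years[early][0]) // bin_width
    counts = np.bincount(bin_index)
    binned_values = np.bincount(bin_index, weights=values[early]) / counts
    binned_years = np.bincount(bin_index, weights=years[early]) / counts
    out_years = np.concatenate([binned_years, years[~early]])
    out_values = np.concatenate([binned_values, values[~early]])
    return out_years, out_values

# Placeholder series standing in for the annual normalized 14C production rate.
years = np.arange(1, 1951)
values = np.ones(years.size)
binned_years, binned_values = mixed_resolution(years, values)
```

With annual input for 1–1950 CE this reduces 1950 points to 160 decadal means plus 351 annual values, which is the kind of saving in data points that the text refers to.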
Modeling method
The global production rate of cosmogenic radionuclides
The global production rate of cosmogenic radionuclides depends on the cosmic-ray flux coming to the Earth which is modulated by solar activity (i.e., solar shielding of GCRs in the heliosphere) and the shielding effects of the geomagnetic field (Masarik and Beer 1999). Concentrations of stable and radioactive nuclides in meteorites and terrestrial archives suggest that, on time scales of about 0.5 million years, the GCR flux outside the heliosphere has remained constant within ± 10% over the past ∼ 10 million years (Wieler et al. 2013). Therefore, the assumption of a constant local interstellar GCR spectrum outside the heliosphere is usually made for any reconstruction of solar and geomagnetic field activity using cosmogenic radionuclides. The global production rate of cosmogenic radionuclides (Q [atoms cm^{−2} s^{−1}]) can then be modeled as a function of solar modulation potential (ϕ [MV]) and GDM denoted as M [10^{22} A m^{2}]:
Q(t) = f(ϕ(t), M(t))   (1)
where t represents time. In this study, we employed the tabulated function established in Kovaltsov et al. (2012) for the ^{14}C production rate (Fig. 3). Since the tabulated function was published with discrete data, we approximated it with a polynomial function (details in section 1 of the Additional file 1). This allowed us to work with continuous values of ϕ and GDM.
The function established by Kovaltsov et al. (2012) was based on the LIS model from Usoskin et al. (2005). According to Herbst et al. (2017), ϕ from one LIS can be converted to ϕ in another LIS by means of linear regression functions. We then used the following regression function to convert from ϕ_{US05} to the more recent LIS model by Herbst et al. (2017), i.e., ϕ_{HE16}:
Theoretical ^{14}C production rates and the normalized ^{14}C data need to be connected via a normalizing constant. This constant can be estimated through a comparison of the ^{14}C production rate data with theoretically expected ^{14}C production rates inferred from independent records of solar modulation potential and GDM. As mentioned above, the GDM from 1840 CE provided by COV-OBS.x2 is suitable for our purpose. On the other hand, ϕ_{HE16} provides a record of solar activity inferred from instrumental data. However, it only overlaps with the radiocarbon data for a short period from 1939 to 1950 CE. Therefore, we rely on the solar modulation record inferred from the GSNs as an alternative.
Figure 4 shows the global ^{14}C production rate estimated by combining the mean of ϕ_{GSN} and the mean GDM predicted by COV-OBS.x2 from 1840 to 1950 CE. The significant 11-year solar variation can be observed in the estimated mean production rate, while the variation has a lower amplitude in the processed ^{14}C data (Fig. 4). The main reason is the smoothening which occurred during the construction of the average IntCal20 ^{14}C record (Heaton et al. 2020; Reimer et al. 2020) and the fact that the 11-year cycle variability is strongly dampened and close to the detection limit in tree-ring-based reconstructions of the atmospheric ^{14}C concentration (Brehm et al. 2021). The smoothening of the ^{14}C data will have consequences for our modeling approach later on. Assessing this smoothening with a moving average filter showed that a 9-year moving average version of the theoretical ^{14}C production rate (Fig. 4) best approximates the variations in the ^{14}C data (r = 0.76). A normalizing constant was then computed by comparing the averages of the estimated ^{14}C production rate filtered with a 9-year moving average filter, and the averages of the ^{14}C production data for the period of overlap. The model uses this normalizing constant to connect the input ^{14}C data to the global production rates generated with Eq. 1.
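The comparison of averages described above can be illustrated with a short sketch. The text does not give the exact formula, so the ratio of means used here, the trimming of the data to the window's valid range, and the names `moving_average` and `normalizing_constant` are assumptions made for illustration only; the input series are synthetic.

```python
import numpy as np

def moving_average(x, window=9):
    """Centred moving average; only positions with a full window are returned."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

def normalizing_constant(q_theory, q_data_normalized, window=9):
    """Ratio of the mean smoothed theoretical rate to the mean normalized data.

    Assumption: the constant is a simple ratio of averages over the overlap period.
    """
    smoothed = moving_average(np.asarray(q_theory, dtype=float), window)
    half = window // 2
    data = np.asarray(q_data_normalized, dtype=float)
    trimmed = data[half:len(data) - half]  # same years as the smoothed series
    return smoothed.mean() / trimmed.mean()

# Synthetic stand-ins for the 1840-1950 CE overlap period.
years = np.arange(1840, 1951)
q_theory = 1.8 + 0.2 * np.sin(2 * np.pi * years / 11.0)   # hypothetical absolute rate
q_data = 1.0 + 0.03 * np.sin(2 * np.pi * years / 11.0)    # hypothetical normalized data
const = normalizing_constant(q_theory, q_data)
```

Multiplying the normalized data by `const` would then put it on the scale of the theoretical production rate under these assumptions.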
A Bayesian approach for sampling the past solar modulation potential and geomagnetic dipole moment
Consider a model with a set of parameters (θ) and observed data (y); a set of parameters that fit with the observed data can be found using Bayes’ theorem (Gelman et al. 2004):
p(θ|y) ∝ p(y|θ) p(θ)   (3)
Within the Bayesian framework, p(θ|y) is the unnormalized posterior distribution of the parameters after considering the observed data. p(y|θ) is the distribution of the data conditional on θ. It is also called the likelihood function if it is treated as a function of θ. p(θ) is the prior distribution of the model parameters before any observation. Although the posterior distribution p(θ|y) cannot always be solved analytically, Eq. 3 allows for an approximation by generating samples from the posterior distribution via different sampling methods. This method of statistical inference, or Bayesian inference, allows us to incorporate our additional knowledge into the prior distribution of the parameters. Moreover, the parameters’ distribution can be updated continuously as more observations become available. This is particularly useful in cases where the sample size is not fixed and/or when the users want to incorporate additional information.
We employed the Hamiltonian Monte Carlo (HMC) method which utilizes the Hamiltonian dynamic simulation to efficiently generate samples from the posterior distribution (Neal 2011). We developed and executed our model via Stan (Carpenter et al. 2017), a probabilistic programming language for statistical modeling and high-performance statistical computation. In addition, we used the No-U-Turn Sampler (NUTS), an extension of HMC developed by Hoffman and Gelman (2014), to sample from the posterior distribution of solar variations and GDM. NUTS provides an auto-tuning of difficult and highly sensitive parameters of the HMC sampler.
If the solar modulation potential (ϕ) and the GDM (M) are considered as parameters in Eq. 3, the Bayesian approach can be used to find ϕ and M that fit with the observed global production rate (Q):
p(ϕ, M|Q) ∝ p(Q|ϕ, M) p(ϕ, M)   (4)
and if ϕ and M are independent, the formula can be rewritten as:
p(ϕ, M|Q) ∝ p(Q|ϕ, M) p(ϕ) p(M)   (5)
where the likelihood function p(Q|ϕ, M) can be established based on the global production rate function of Kovaltsov et al. (2012) (Fig. 3). This Bayesian method allows us to input our information on the characteristics of the Sun and the geomagnetic field intensity via the prior distributions p(ϕ) and p(M). Combining this information with the observed production rate data offers the possibility to reduce the uncertainty and improve the reconstructions.
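To make the role of the two independent priors concrete, the sketch below evaluates an unnormalized posterior of the form likelihood × p(ϕ) × p(M) on a grid for a single production-rate value. It is an illustration only: `toy_production` is a made-up monotonic stand-in and not the Kovaltsov et al. (2012) function, the Gaussian likelihood and priors and all numerical values are hypothetical, and the paper samples the posterior with HMC/NUTS in Stan rather than on a grid.

```python
import numpy as np

def toy_production(phi, m):
    """Toy stand-in for Q(phi, M): decreases with stronger solar or geomagnetic shielding.

    NOT the Kovaltsov et al. (2012) production function.
    """
    return 2.0 / ((1.0 + phi / 1000.0) * np.sqrt(m / 8.0))

def log_gauss(x, mean, sd):
    """Log of an unnormalized Gaussian density."""
    return -0.5 * ((x - mean) / sd) ** 2

phi_grid = np.linspace(0.0, 1500.0, 301)   # solar modulation potential [MV]
m_grid = np.linspace(5.0, 12.0, 141)       # dipole moment [10^22 A m^2]
phi, m = np.meshgrid(phi_grid, m_grid, indexing="ij")

q_obs, q_sd = 1.3, 0.05                    # hypothetical observation and its uncertainty
log_post = (log_gauss(q_obs, toy_production(phi, m), q_sd)   # log-likelihood
            + log_gauss(phi, 500.0, 200.0)                   # hypothetical prior on phi
            + log_gauss(m, 8.5, 1.0))                        # hypothetical prior on M

post = np.exp(log_post - log_post.max())   # subtract the maximum for numerical stability
post /= post.sum()                         # normalize over the grid
phi_mean = (post * phi).sum()
m_mean = (post * m).sum()
```

A single observation constrains only a curve in the (ϕ, M) plane; in this toy setting it is the priors that select a region along that curve, which is why the choice of prior information matters for separating the two signals.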
Setting up the prior distributions
Prior distribution of solar variability
The prior distribution of solar activity was set up using the framework of a Gaussian process (GP) described by Rasmussen and Williams (2006):
ϕ(t) ~ GP(m_{ϕ}(t), k_{ϕ}(t, t'))   (6)
where ϕ(t) represents the solar modulation potential at time t. t and t' are separated by a time difference r. m_{ϕ}(t) is the mean function and k_{ϕ}(t, t') is the covariance function of ϕ(t). The mean function and the covariance function are given by:
m_{ϕ}(t) = E[ϕ(t)]   (7)
k_{ϕ}(t, t') = E[(ϕ(t) − m_{ϕ}(t)) (ϕ(t') − m_{ϕ}(t'))]   (8)
where E represents the expected values. The GP is a generalization of the multivariate Gaussian probability distribution (Rasmussen and Williams 2006), where each point t is described by a mean and a joint Gaussian distribution with the surrounding points. Consequently, every point/event in time is influenced by (correlated to) the data before and afterwards. Characteristics of the correlation are determined by the covariance function. The covariance matrix, generated using a specific covariance function (discussed later), is the collection of vectors defining the correlation of every point in time with the surrounding points. In other words, the covariance matrix includes the information on the characteristic timescales and variances of the physical processes that we aim to reconstruct.
The most common covariance function within the machine learning field is the squared exponential (SE) (Rasmussen and Williams 2006). The covariance (k_{SE}) and spectral density (S_{SE}) of the SE covariance function have the forms:
k_{SE}(r) = σ^{2} exp(−r^{2} / (2 l^{2}))   (9)
S_{SE}(s) = σ^{2} (2π l^{2})^{D/2} exp(−2π^{2} l^{2} s^{2})   (10)
where D is the dimensionality, σ^{2} is the signal variance and l is the lengthscale that determines how quickly the correlation diminishes with time. s in Eq. 10 represents frequencies. Additional file 1: Fig. S3 shows an example of the covariance k_{SE} (r) as a function of the input distance r in time.
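As a quick numerical illustration of the SE covariance and its spectral density, the sketch below uses the standard forms from Rasmussen and Williams (2006), scaled by the signal variance: k(r) = σ² exp(−r²/(2l²)) and S(s) = σ² (2πl²)^{D/2} exp(−2π²l²s²). The function names and the parameter values are assumptions chosen for the example.

```python
import numpy as np

def k_se(r, sigma2, length):
    """Squared-exponential covariance as a function of the time difference r."""
    return sigma2 * np.exp(-r**2 / (2.0 * length**2))

def s_se(s, sigma2, length, dim=1):
    """Spectral density of the squared-exponential covariance at frequency s."""
    return (sigma2 * (2.0 * np.pi * length**2) ** (dim / 2.0)
            * np.exp(-2.0 * np.pi**2 * length**2 * s**2))

sigma2, length = 2.0, 3.0                   # hypothetical variance and lengthscale [years]
lags = np.arange(0.0, 20.0)
cov = k_se(lags, sigma2, length)            # correlation decays with increasing lag

# Consistency check: integrating the spectral density over all frequencies
# should recover the variance k(0) = sigma2.
freqs = np.linspace(-1.0, 1.0, 20001)
power = s_se(freqs, sigma2, length).sum() * (freqs[1] - freqs[0])
```

A larger lengthscale makes `cov` decay more slowly with lag and concentrates the spectral density at lower frequencies, which is the property used to distinguish short-term from centennial variability.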
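In this study, a new covariance function for the solar variations was created by adding two SE covariance functions with different characteristic timescales:
k_{ϕ}(r) = σ^{2}_{short} exp(−r^{2} / (2 l_{short}^{2})) + σ^{2}_{long} exp(−r^{2} / (2 l_{long}^{2}))   (11)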
One SE covariance function simulates the observed short-term variations (e.g., 11-year cycle), while the other one simulates the observed centennial variations (e.g., 88–100 years) of the Sun. This combined covariance function was established based on the variations of the solar modulation parameters (i.e., ϕ_{GSN}) which is the longest observational record of solar activity. The record shows solar activity for the last ~ 400 years which was dominated by the 11-year cycle and the centennial variations (Fig. 1). The short length of ϕ_{GSN} is a weakness of the model since the record is not long enough to capture possible quasi-periodicities of ~ 400 years to up to ~ 2400 years in solar variations (e.g., Bond et al. 2001; Snowball and Muscheler 2007; Usoskin et al. 2016; Dergachev and Vasiliev 2019). However, we want to avoid including long-term cycles inferred from the radionuclide records in our prior information which would lead to circular reasoning as the model is applied to radionuclides. Moreover, solar variations inferred from radionuclide records on longer time scales are rather ambiguous since it is challenging to completely eliminate influences from geomagnetic field variations, transport and deposition processes for ^{10}Be and carbon cycle effects on ^{14}C (Vonmoos et al. 2006; Snowball and Muscheler 2007). In addition, we lack a longer direct observational record of solar activity that could help us to obtain a better constrained prior for solar variability on timescales longer than the GSN record. Nevertheless, the GSN record still shows indications of the 200-year cycle such as the Maunder minimum, a period with an almost complete lack of sunspots, which can be considered as an expression of the 200-year cycle (Fig. 1). 
The millennial-scale cycles can possibly be characterized as bundling of larger-amplitude centennial-scale variations followed by periods of weaker centennial-scale variability, and, in such cases, the proposed covariance function can reproduce the millennial-scale variations to some extent.
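The effect of summing a short-timescale and a long-timescale SE covariance can be seen by drawing random curves from the resulting Gaussian process. The sketch below is illustrative only: the hyperparameter values are hypothetical rather than the tuned values of the study, the zero mean is a simplification, and the small diagonal "jitter" is a common numerical safeguard added here as an assumption.

```python
import numpy as np

def k_se(r, sigma2, length):
    """Squared-exponential covariance for time difference r."""
    return sigma2 * np.exp(-r**2 / (2.0 * length**2))

def k_solar(r, sigma2_short, l_short, sigma2_long, l_long):
    """Sum of a short-timescale and a long-timescale SE covariance."""
    return k_se(r, sigma2_short, l_short) + k_se(r, sigma2_long, l_long)

t = np.arange(0.0, 400.0)                    # annual time steps [years]
r = t[:, None] - t[None, :]                  # matrix of pairwise time differences

# Hypothetical hyperparameters, NOT the tuned values from the study.
K = k_solar(r, sigma2_short=0.05, l_short=3.0, sigma2_long=0.10, l_long=30.0)
K += 1e-6 * np.eye(t.size)                   # jitter keeps the Cholesky factorization stable

rng = np.random.default_rng(0)
chol = np.linalg.cholesky(K)
# Each row is one zero-mean realization with covariance K.
samples = (chol @ rng.standard_normal((t.size, 5))).T
```

Each realization shows fast wiggles superimposed on slower swings; neither component is tied to a fixed period, which is the flexibility the combined covariance function is meant to provide.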
The histograms in Fig. 5 show that ϕ_{GSN} has a positive distribution (i.e., the adiabatic energy loss of GCR cannot be negative) that is skewed toward higher values. Therefore, the symmetrical Gaussian distribution implied by Eq. 6, which also allows for negative values, is not an appropriate approximation of ϕ. Moreover, the polynomial approximation (Additional file 1: Fig. S2) of the ^{14}C production rate function starts to become unrealistic and produces negative values at ϕ smaller than − 362 MV, which would be problematic for the modeling process. For ϕ within the range of − 362 to 0 MV, which also implies unphysically negative shielding, the approximation produces unrealistically large values of ^{14}C production rate which would then mostly be rejected by the Bayesian sampler. However, the sampling process would be inefficient if the model has to frequently reject negative values of ϕ. Thus, we considered modeling ϕ using a lognormal distribution which is also a skewed positive distribution:
log(ϕ) ~ N(μ_{log(ϕ)}, σ^{2}_{log(ϕ)})   (12)
where μ_{log(ϕ)} represents the mean and σ^{2}_{log(ϕ)} represents the variance of the log transformation of ϕ. A disadvantage of using a lognormal distribution is that the model will occasionally generate extremely high values of ϕ due to the nonsymmetrical characteristic (Fig. 5a). However, the proposed ϕ will be rejected by the model when it is unrealistically high and cannot be explained by the radionuclide production rate. Another problem of the lognormal distribution is a bias toward lower values of ϕ as demonstrated by the histograms. Consequently, values above the mean of ϕ_{GSN} will be generated with a lower probability. A solution for this is to add a constant (c [MV]) to ϕ_{GSN} before fitting with a lognormal distribution:
log(ϕ + c) ~ N(μ_{log(ϕ+c)}, σ^{2}_{log(ϕ+c)})   (13)
After sampling from this distribution, we exponentiate and subtract c to obtain ϕ. This approach allows the model to generate a more flexible distribution agreeing well with the ϕ_{GSN} distribution (Fig. 5). This minimizes the bias toward lower values of ϕ at the cost of allowing for negative values of ϕ (i.e., as low as minus c) with a low prior probability. We assessed the fitted distribution for the case of c equal to 100, 200 and 300 MV (Additional file 1: Fig. S4) and decided to choose 200 MV as this value shows a good balance between the pros and cons. The fitted distribution to log(ϕ + c) with c equal to 200 MV is shown in Fig. 5. Parameters of the fitted lognormal distribution such as mean and variance were assessed using the method of maximum likelihood estimation. In summary, the prior distribution allows for negative values of solar modulation as low as − 200 MV but with a low probability in exchange for a better model performance with less bias toward lower values of ϕ. Negative solar modulation values, while unphysical, could be generated associated with the biases in radionuclide data (e.g., climate impact), extreme spikes/enhancement in the radionuclide records (e.g., SPEs), or simply the data uncertainties. Data uncertainties are included in the model and we assume that climate biases are minor for the ^{14}C production rate. However, SPEs are not included in our model and, therefore, we removed the known SPE production peaks (i.e., the 774/775 and the 993/994 peaks) from the ^{14}C data. In addition, Fig. 5 shows that the prior distribution (green line) still slightly underestimates the probability of solar modulation from around 500 to 1000 MV. This could still lead to a slight bias toward lower values of solar modulation. However, the posterior distribution of solar activity will ultimately be evaluated and selected based on the radionuclide data. 
Therefore, a small inclination toward low solar modulation values of the prior distribution will not bias the results.
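The shifted lognormal prior can be sketched in a few lines: add c, take the logarithm, estimate the mean and variance by maximum likelihood, then sample, exponentiate and subtract c. The data below are synthetic (a gamma-distributed stand-in for ϕ_{GSN}), and the function names are assumptions; only the value c = 200 MV is taken from the text.

```python
import numpy as np

def fit_shifted_lognormal(phi, c):
    """Maximum-likelihood mean and variance of log(phi + c) for a fixed shift c."""
    z = np.log(np.asarray(phi, dtype=float) + c)
    return z.mean(), z.var()          # np.var uses 1/N, i.e. the ML estimate

def sample_phi(mu, var, c, size, rng):
    """Draw from the fitted normal in log space, exponentiate and subtract c."""
    return np.exp(rng.normal(mu, np.sqrt(var), size)) - c

rng = np.random.default_rng(1)
# Synthetic positive, right-skewed stand-in for phi_GSN [MV]; not real data.
phi_obs = rng.gamma(shape=4.0, scale=120.0, size=400)

c = 200.0                             # shift in MV, the value chosen in the text
mu, var = fit_shifted_lognormal(phi_obs, c)
phi_prior = sample_phi(mu, var, c, 10000, rng)
```

By construction every sampled value is larger than −c, so the prior admits slightly negative ϕ with low probability but never values below −200 MV.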
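We then replaced Eq. 6 with:
log(ϕ_{c}(t)) ~ GP(m_{log(ϕc)}(t), k_{log(ϕc)}(t, t'))   (14)
where ϕ_{c}(t) = ϕ(t) + c. The mean function and the covariance function are adjusted accordingly:
m_{log(ϕc)}(t) = E[log(ϕ_{c}(t))]   (15)
k_{log(ϕc)}(t, t') = E[(log(ϕ_{c}(t)) − m_{log(ϕc)}(t)) (log(ϕ_{c}(t')) − m_{log(ϕc)}(t'))]   (16)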
The mean of our proposed prior distribution (m_{log(ϕc)}) for the modeling period was equal to the mean of the fitted lognormal distribution.
The short-term and long-term variations of the Sun were defined in Eq. 11 by the characteristic lengthscale (l_{short}, l_{long}) and the signal variance (σ^{2}_{short}, σ^{2}_{long}). These parameters were determined using ϕ_{GSN}, particularly the power spectrum of log(ϕ_{GSN}) (Fig. 6a). In other words, we investigate how log(ϕ_{GSN}) behaves in the frequency domain and adjust our prior to resemble it. The short-term variations are reflected as a bump and changes in the slope of the power spectrum around the 6- to 16-year period. The centennial variation could be observed for periods longer than 55 years, but the changes in the slope were not as strong as the short-term variation. First, we determined l_{short} and σ^{2}_{short} by tuning our covariance function for the period from 10 to 12 years (highlighted in Fig. 6a). This ensures that our prior captures the short-term variations which are most prominent for period lengths from 10 to 12 years. We then tuned l_{long} and σ^{2}_{long} for the period from 50 to 136 years where the transition in power occurs (highlighted in Fig. 6a). Details of the parameterization process are outlined in section 2 of the Additional file 1.
The spectral density of the combined covariance function with tuned parameters is shown in Fig. 6a. The power decreases stepwise with a constant period between the steps as a result of the combination of our two SE covariance functions. As expected, the combination of the two tuned covariance functions does not fully simulate the bump-like structure in power generated by the short-term variation of the Sun around a 10- to 12-year period. The power was instead raised to a higher level before and after the bump. A fixed periodic signal could be introduced to simulate the bump-like property as a peak in the power spectrum. However, this would also limit the short-term variations to a narrowly defined cycle as, for example, an 11-year cycle. We here chose to apply a more relaxed prior since the short-term variations of the Sun vary around the 11-year timescale rather than being a cycle with constant frequency (Friis-Christensen and Lassen 1991; Solanki et al. 2002; Petrovay 2010; Brehm et al. 2021). Moreover, the short-term variations could have changed further back in time. Overall, despite this drawback, we still chose the SE covariance function because the flexibility in the range of the short-term variations allows for a variable frequency for the “quasi” 11-year cycle. Therefore, the solar cycle can be determined by the data instead of being imposed by the prior.
Figure 6b shows a comparison of the power spectra of ϕ_{GSN} and solar modulation potential realizations that were randomly generated with our tuned covariance function for the same time period. As expected, the covariance function generates short-term variations that are not fixed to, but rather vary around, the 11-year cycle. The transition in the power spectrum of ϕ_{GSN} from 50 to 136 years is well simulated. The covariance function generates curves with higher power than ϕ_{GSN} for periodicities with cycle lengths of 16–40 years. This is a drawback of the SE covariance function that cannot be avoided with this rather simple approach, but on the other hand it is necessary to generate high enough power to capture the short-term variations. Variations on timescales shorter than 4 years can be observed in the power spectrum of ϕ_{GSN}. We interpret these variations as not relevant for our radionuclide data interpretation and decided to exclude them in our parameterization. Therefore, it is not a problem when the power spectrum of our generated solar modulation potential diminishes rapidly for periodicities shorter than 4 years.
We generated 1000 realizations of ϕ from 1 to 1938 CE using Eq. 14 to simulate our prior distribution of solar activity. We found that 1000 realizations were enough to capture the main aspect of our covariance matrix, mainly around the diagonal. The covariance matrix and hence the realizations were connected to the present solar activity of ϕ_{HE16}. Every realization was binned (i.e., taking an average of every 10 years) prior to 1600 CE to have the same time resolution as the processed ^{14}C data. Post 1600 CE the realizations were smoothened with a 9year moving average filter to match the treatment of the ^{14}C data, as discussed in Sect. 3.1. A new adapted covariance matrix for solar variations over the period 1 to 1938 CE was computed from these 1000 processed realizations to account for the lower resolution and smoothening. The final prior distribution of solar activity was estimated via 1000 realizations generated using the adapted covariance matrix (Fig. 7a). Again, we tested and found that 1000 realizations were enough to represent the adapted covariance matrix. The new solar realizations have comparable variations to the 9year moving average version of ϕ_{GSN} during the same period (Fig. 6c). The approach described here allows us to directly adjust/adapt our covariance matrix without having to adjust/reparameterize the covariance function. Thus, it will be helpful for modeling radionuclide data with different temporal resolution or smoothening when applied to longterm radionuclide records.
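The processing of the prior realizations described above can be sketched as follows. This is a minimal illustration, not the code used in this study: the function names, the white-noise toy realizations and the short 1560–1620 CE demonstration window are assumptions made only to keep the example small, and leftover years that do not fill a complete 10-year bin are simply dropped here.

```python
import random


def decadal_bins(values, width=10):
    """Average consecutive, non-overlapping blocks of `width` samples.
    Trailing samples that do not fill a block are dropped (an assumption)."""
    n_full = len(values) // width
    return [sum(values[i * width:(i + 1) * width]) / width for i in range(n_full)]


def moving_average(values, window=9):
    """Centred moving average (odd window); only full windows are returned."""
    half = window // 2
    return [sum(values[i - half:i + half + 1]) / window
            for i in range(half, len(values) - half)]


def process_realization(annual, years, split_year=1600):
    """Decadal means before `split_year`, 9-year smoothing from `split_year` on."""
    early = [v for v, y in zip(annual, years) if y < split_year]
    late = [v for v, y in zip(annual, years) if y >= split_year]
    return decadal_bins(early) + moving_average(late)


def sample_covariance(rows):
    """Sample covariance matrix; each row is one processed realization."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(m)] for i in range(m)]


# Toy demonstration: white-noise "realizations" stand in for draws from Eq. 14.
rng = random.Random(42)
years = list(range(1560, 1621))
realizations = [[rng.gauss(500.0, 100.0) for _ in years] for _ in range(200)]
processed = [process_realization(r, years) for r in realizations]
adapted_cov = sample_covariance(processed)
```

Estimating the adapted covariance matrix from processed realizations, rather than re-deriving it analytically, is what allows the same covariance function to be reused for data with a different resolution or smoothing.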
Prior distribution of geomagnetic field intensity
We set up the prior distribution for GDM following Bouligand et al. (2016). We simplified the approach using just the axial component to approximate the GDM. This was justified because the axial dipole is the component that dominates the geomagnetic shielding of galactic cosmic rays (Masarik and Beer 1999). The covariance function for changes in the axial component is given by:
with ξ^{2} = χ^{2} − ω^{2}. χ and ω are parameters representing frequencies. σ^{2} and r, again, symbolize signal variance and difference in time. The power spectral density of the covariance function is as follows:
The power spectrum (P) has an arc shape with the power decreasing with increasing frequency. It is often divided into several frequency ranges which can then be approximated locally with various spectral indices (p) (Bouligand et al. 2016; Hellio and Gillet 2018):
At very low frequency where p ≅ 0, the spectrum is almost flat, indicating that it has the largest power at long-term periods such as periods longer than 50,000 years as shown in Additional file 1: Fig. S5. The power decreases faster at shorter periods (i.e., larger spectral indices). The decrease in power spectrum is simulated by changes in the slope which are determined by the cutoff frequencies T_{s} and T_{f}. The cutoff frequencies indicate the time periods where transitions into steeper slopes occur and their formulas are given by Bouligand et al. (2016):
χ and ω are then chosen to achieve the desired cutoff frequencies.
A recent geomagnetic field model COVLAKE spanning the last 3000 years has T_{s} ~ 100,000 years and T_{f} ~ 60 years (Hellio and Gillet 2018). The model is based on measurements of the magnetic field directions and intensity stored in archeological artifacts, igneous rocks and sediment records. The power spectral densities for the axial component of COVLAKE are shown in Additional file 1: Fig. S5 and the GDM provided by COVLAKE is shown in Fig. 10. The COV-OBS.x2 model (Additional file 1: Fig. S5) has the same T_{s} but a higher T_{f} (~ 235 years). This earlier transition in the slope (at a longer time period) results in lower power for variations on timescales shorter than 200 years.
We tested generating a prior distribution of GDM using T_{s} and T_{f} from the COVLAKE and COV-OBS.x2 models (results are shown in Additional file 1: Fig. S6). However, comparisons to the reconstructed GDM based on COVLAKE suggest the prior distribution is rather conservative. We also compared our prior distribution with pfm9k.1b, another major reconstruction for the last 2000 years (Nilsson et al. 2014; Muscheler et al. 2016). pfm9k.1b provides a GDM reconstruction with low temporal resolution (300–400 years) over the Holocene based on almost the same underlying dataset as COVLAKE (COVLAKE was extended with additional sediment and archeological intensity data). The GDM reconstructed by pfm9k.1b over the last 2000 years is shown in Fig. 10. The GDM indicated by pfm9k.1b was mostly above and outside the prior distribution (Additional file 1: Fig. S6). An overly conservative prior distribution could result in a bias toward values of GDM lower than the range seen in the previous GDM models. Therefore, we adjusted the parameters to widen our prior distribution (Additional file 1: Fig. S6c) and allow for more variability in the prior for GDM. We used a lower T_{s} of 50,000 years, a higher T_{f} of 433 years and also a larger signal variance. The variations for timescales between 100 and 10,000 years of our prior are larger compared to the prior used for COV-OBS.x2 (Additional file 1: Fig. S5). For variations shorter than 100 years, our prior agrees with the prior used for COV-OBS.x2. The parameters of COV-OBS.x2, COVLAKE and this study for the axial component are shown in Additional file 1: Table S1. The variations of the axial dipole are insignificant at timescales shorter than 20 years. Therefore, the prior distribution of the GDM can be used directly without any adjustment (e.g., smoothing) of the covariance matrix. In addition, the prior distribution of GDM was connected to the present GDM from 1939 to 2020 CE predicted by COV-OBS.x2.
We generated 1000 realizations of GDM simultaneously with ϕ to estimate the prior distribution (Fig. 7).
Results of palaeomagnetic field models could be used to further constrain the prior distribution. This would reduce the reconstruction uncertainty for periods where past GDM variations are well constrained, but would introduce uncertainties associated with the chosen palaeomagnetic field model. Disagreement in GDM reconstructed by different palaeomagnetic field models would lead to discrepancies in solar activity reconstructions. In addition, this also defeats our purpose of being independent of GDM models and providing a radionuclidebased reconstruction for GDM. Therefore, we did not further constrain our prior distribution based on results of palaeomagnetic field models.
Evaluation of the proposed solar activity and GDM
The samples of ϕ and M drawn from the prior distribution were evaluated via the likelihood function:
where Q_{t} is the data vector of the observed global production rates and σ^{2}_{Q,t} is the vector representing uncertainty (i.e., variance) of the observed data. ϕ_{t} and M_{t} are the vectors of the solar modulation potential and GDM, proposed by the Bayesian model as the solution for Q_{t}. The link between Q_{t} and ϕ_{t} and M_{t} [i.e., f(ϕ_{t},M_{t})] was established based on the global production rate function in Kovaltsov et al. (2012) (see Fig. 3 and section 1 in the Additional file 1). In summary, the Bayesian model combines our knowledge about the parameters (i.e., incorporated in the prior distributions) with the additional constraints provided by the observations (here, the global production rate) to yield a new (posterior) distribution of the parameters.
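As an illustration of how a proposed pair (ϕ_{t}, M_{t}) can be scored against the observed production rates, the sketch below evaluates a Gaussian log-likelihood. It assumes independent normal errors with variance σ^{2}_{Q,t} at each time step, which is consistent with the description above but is an assumption about the exact form. The function `toy_production` is a hypothetical stand-in and is not the production rate function of Kovaltsov et al. (2012); it only mimics the qualitative decrease of production with increasing ϕ and M.

```python
import math


def toy_production(phi, gdm):
    """Hypothetical stand-in for f(phi, M); NOT the Kovaltsov et al. (2012) function.
    Production decreases when either phi (MV) or the dipole moment increases."""
    return 1.0 / ((1.0 + phi / 1000.0) * math.sqrt(gdm / 8.0))


def log_likelihood(q_obs, q_var, phi, gdm, production=toy_production):
    """Sum of independent Gaussian log-densities of Q_t given f(phi_t, M_t)."""
    total = 0.0
    for q, var, p, m in zip(q_obs, q_var, phi, gdm):
        resid = q - production(p, m)
        total += -0.5 * math.log(2.0 * math.pi * var) - 0.5 * resid * resid / var
    return total


# Toy check: the curves that generated the data score higher than shifted ones.
phi_true, gdm_true = [400.0, 500.0], [9.0, 8.5]
q_obs = [toy_production(p, m) for p, m in zip(phi_true, gdm_true)]
q_var = [0.01, 0.01]
ll_true = log_likelihood(q_obs, q_var, phi_true, gdm_true)
ll_shifted = log_likelihood(q_obs, q_var, [600.0, 700.0], gdm_true)
```

In a sampler, this quantity would be added to the log prior densities of ϕ and M to obtain the unnormalized log posterior.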
The prior distributions allow for an overlap of geomagnetic field and solar variability, which is the main challenge our method addresses. As a consequence, the reconstruction of geomagnetic field and solar variability on timescales significantly longer than 200 years is to some extent ambiguous, since the prior distribution of solar activity was established based on variations observed in the 400-year-long GSN record. Nevertheless, the prior distribution is based on flexible SE covariance functions so that the choice of timescales does not prevent the model from finding longer periodicities, if the data require them. The power spectrum of the tuned covariance function in Fig. 6 is essentially flat for longer timescales, but presumably the actual power spectrum decreases at very long periods. The model will compare and decide if the variations in the radionuclide record at timescales longer than 200 years can or cannot be explained by geomagnetic field variations (characterized in the prior distribution) and, if not, the model will likely consider those as solar variations. Overall, the model has larger uncertainties for variations on timescales longer than 200 years, but the prior information allows for some separation of solar and geomagnetic field influences also on these timescales. This is an advantage of the model over a simple bandpass frequency filter in disentangling solar and geomagnetic field variations on timescales longer than 200 years.
We also evaluate the correlation coefficient (r_{ϕt,Mt}) between the proposed curves of ϕ and GDM:
We set the mean (μ_{r}) and standard deviation (σ_{r}) of the correlation coefficient equal to 0 and 0.01, respectively. This ensures the independence of the reconstructed solar activity and geomagnetic field strength, which was the initial assumption of the Bayesian approach (Eq. 6). We tested different values of σ_{r} and chose 0.01 as the value providing the best independence constraint. A σ_{r} larger than 0.01 would result in ϕ and GDM correlating more strongly than we expect (we do not expect a link between solar activity and GDM variations), while a σ_{r} lower than 0.01 would be too severe a constraint: the model would then reject the majority of the proposed samples even though some low correlation can occur just by coincidence. Some chance correlation between ϕ and GDM is also more likely if we investigate a short period of time (e.g., shorter than 1000 years), especially if both processes exhibit a long-term trend over the investigated period.
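The independence constraint can be illustrated with a short sketch. It assumes that the correlation coefficient is the Pearson coefficient of the two proposed curves and that it is scored with a normal density of mean μ_{r} = 0 and standard deviation σ_{r} = 0.01; the function names are hypothetical and the exact form used in the model may differ.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_correlation_penalty(phi, gdm, mu_r=0.0, sigma_r=0.01):
    """Normal log-density of r(phi, gdm); assumed form of the constraint."""
    r = pearson_r(phi, gdm)
    return (-0.5 * math.log(2.0 * math.pi * sigma_r ** 2)
            - 0.5 * ((r - mu_r) / sigma_r) ** 2)


# Toy curves: one GDM series is uncorrelated with phi, the other is proportional to it.
phi = [1.0, 2.0, 3.0, 4.0]
gdm_uncorrelated = [1.0, -1.0, -1.0, 1.0]
gdm_correlated = [2.0, 4.0, 6.0, 8.0]
penalty_ok = log_correlation_penalty(phi, gdm_uncorrelated)
penalty_bad = log_correlation_penalty(phi, gdm_correlated)
```

With σ_{r} = 0.01, a proposal with |r| of a few hundredths is already strongly penalized, which illustrates why an even smaller σ_{r} would reject most proposals.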
Testing the model with synthetic data
Generating synthetic data
Here we test how well the model can recover solar activity and GDM from ^{14}C production rates that were calculated from synthetic solar and geomagnetic field records. The synthetic data were generated from the prior distribution of ϕ and GDM and therefore contained only variations that were included in the model. The model was thus tested in a controlled scenario where the data did not contain unknown patterns. This is an important step in model validation before running the model with a real dataset in which the solar and geomagnetic field variations are unknown.
Firstly, we randomly generated a realization of solar activity (ϕ') and GDM (M') using the prior distribution. These realizations were considered as the reference (i.e., “true”) values which, after adding the assumed uncertainties, the Bayesian model was challenged to reconstruct. Figure 7a and b shows the prior distributions (50 random realizations and a 2-sigma envelope of a thousand realizations), and the reference ϕ' and M'. The differences between solar variations with annual resolution and 10-year resolution are visible in the solar realizations in Fig. 7a. Before 1600 CE, only variability on longer timescales such as centennial variations remains; the 11-year variations were largely removed by the low sampling resolution. After 1600 CE, the 11-year solar cycle can be observed. Occasionally, the model generated ϕ values larger than 2000 MV, which is significantly larger than the values of ϕ_{HE16}. This is a consequence of the lognormal distribution as discussed above. However, most of the values were below 1300 MV as indicated by the 2-sigma envelope (black dashed line). The prior distribution of GDM (Fig. 7b) shows insignificant short-term variations and larger millennial variations. A synthetic ^{14}C production rate (Fig. 7c) resulting from ϕ' and M' was computed. We then resampled the synthetic production rate (orange circles with associated 1-sigma errors in Fig. 7c) assuming it followed a normal distribution with a standard deviation similar to the standard error of our inferred ^{14}C production rate from IntCal20. This allows us to test our model with realistic levels of uncertainty. The standard error of the ^{14}C production rate (from 1 to 1950 CE) ranges from 2.1% to 6.7%. We assumed the worst-case scenario and used 6.7% uncertainty to resample the synthetic ^{14}C production rate.
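The resampling step can be sketched as below. The sketch assumes that the 6.7% uncertainty is a relative standard deviation applied to each value of the synthetic production rate; the function name and the constant toy series are hypothetical.

```python
import random


def resample_production(q_true, rel_sigma=0.067, rng=None):
    """Draw one noisy value per time step from N(q, (rel_sigma * q)^2).
    Returns the noisy series and the 1-sigma errors attached to it."""
    rng = rng or random.Random()
    sigma = [rel_sigma * q for q in q_true]
    noisy = [rng.gauss(q, s) for q, s in zip(q_true, sigma)]
    return noisy, sigma


# Toy series: a constant normalized production rate of 1.0 over 2000 steps.
q_reference = [1.0] * 2000
q_noisy, q_sigma = resample_production(q_reference, rng=random.Random(0))
```

Passing a seeded `random.Random` instance makes the synthetic experiment reproducible.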
Assessing model performance
For a given model parameter (θ_{i}), we estimate the posterior mean (\({\widehat{\theta }}_{i}\)) and variance (\({\widehat{\sigma }}_{{\theta }_{i}}^{2}\)) directly from the Markov chain Monte Carlo (MCMC) samples:
with N being the number of samples. In this study, we generated a thousand MCMC samples (N = 1000) to estimate the posterior distribution of ϕ and M. We assess our model performance using two diagnostics following Hellio and Gillet (2018) and Nilsson and Suttie (2021): the normalized dispersion (Φ_{θi}) and the normalized error (Ψ_{θi}). The normalized dispersion equals the standard deviation of the posterior divided by the standard deviation of the prior:
The squared normalized dispersion (Φ^{2}_{θi}), also known as the shrinkage factor, measures the amount of information contributed by the observations to the prior distribution. Φ^{2}_{θi} equal to 1 indicates that no information was added. The second parameter, the normalized error, is equal to the absolute difference between the posterior mean and the reference value (θ_{i}') (i.e., the selected ϕ' and M', blue lines in Fig. 7a and b) normalized by the standard deviation of the posterior:
The normalized error measures the accuracy of the prediction with Ε[Ψ_{θi}] = 1 indicating that the reference values are mainly within the uncertainty of the prediction values. The root mean squared (RMS) values of Φ_{θi} and Ψ_{θi} for ϕ and M are shown in Additional file 1: Table S2.
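The two diagnostics follow directly from their definitions and can be sketched as below. The sketch assumes the unbiased (N − 1) variance estimator and estimates the prior standard deviation from prior samples; both choices, and the function names, are assumptions made for illustration.

```python
import math


def posterior_mean_var(samples):
    """Sample mean and unbiased sample variance of one parameter."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, var


def normalized_dispersion(post_samples, prior_samples):
    """Posterior standard deviation divided by prior standard deviation."""
    _, v_post = posterior_mean_var(post_samples)
    _, v_prior = posterior_mean_var(prior_samples)
    return math.sqrt(v_post / v_prior)


def normalized_error(post_samples, reference):
    """|posterior mean - reference| divided by posterior standard deviation."""
    mean, var = posterior_mean_var(post_samples)
    return abs(mean - reference) / math.sqrt(var)


def rms(values):
    """Root mean squared value, used to summarize a diagnostic over all time steps."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Toy numbers for a single parameter.
prior = [0.0, 2.0, 4.0, 6.0, 8.0]   # variance 10
posterior = [3.0, 4.0, 5.0]         # mean 4, variance 1
dispersion = normalized_dispersion(posterior, prior)
error = normalized_error(posterior, reference=5.0)
```

In this toy case the dispersion is well below 1, meaning the observations narrowed the prior, and the normalized error equals 1, meaning the reference lies one posterior standard deviation from the posterior mean.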
The posterior distribution of ϕ was well estimated and constrained as indicated by a low value of Φ_{RMS} (Additional file 1: Table S2). Figure 8a shows that the posterior distribution of ϕ (i.e., the 2-sigma uncertainty envelope) is significantly narrower than the 2-sigma envelope of the prior. On the other hand, less information was added to the prior distribution of GDM as indicated by Φ_{RMS} equal to 0.73. Figure 8b shows that the posterior distribution of GDM was almost the same as the prior distribution in the recent period, for example after 1500 CE. This suggests that the ^{14}C production rate could not help improve the estimation of GDM for the recent period. The posterior distribution of GDM was narrower than the prior distribution further back in time, indicating that GDM was better constrained during the earlier periods. In addition, the values of Ψ_{RMS} just below 1.0 (Additional file 1: Table S2) demonstrate that the original solar activity and GDM (Fig. 8a and b) were mainly within the reconstruction uncertainty.
Another important validation is the ^{14}C production rate generated by the recovered ϕ and M. If the model performed well, the recovered ^{14}C production rate will be within uncertainty of the reference ^{14}C production rate. Figure 8c compares the reference ^{14}C production rate with the mean of the recovered ^{14}C production rate. The reference values are well within the posterior 2-sigma uncertainty and mostly agree with the recovered production rate. Figure 8d shows the histogram of the model-data residuals (i.e., the differences between individual posterior realizations and the synthetic ^{14}C data, normalized by the data uncertainties). The symmetrical distribution of the model-data residuals resembles a standard normal distribution. This indicates that the model does not lead to a bias toward a low or high value of ^{14}C production rate, and there are no extreme values or outliers in the recovered production rate. Overall, our model performed well with the synthetic data. The solar variations and GDM can be recovered from the ^{14}C production rate corrupted by a realistic level of measurement uncertainty.
We also estimated the upper limit of data uncertainty by corrupting the synthetic ^{14}C production rate with various uncertainty levels from 15 to 70% (Additional file 1: Fig. S7). The reference solar variation (ϕ’) was not fully recovered by the model when the data uncertainty was at 20% and larger. An increase in the data uncertainty also resulted in larger Φ_{RMS} and reconstruction uncertainty which indicates that the model’s ability to constrain past solar activity decreased. In summary, we conclude that data uncertainty of less than 20% is required for a good model performance.
Reconstruction of solar and geomagnetic field activity from ^{14}C data
Bayesian reconstruction
We applied the Bayesian model to recover solar activity and GDM variations from the processed ^{14}C production rate data inferred from IntCal20. Results of the Bayesian ^{14}C-based reconstruction are shown in Fig. 9. Short-term variations (from decadal up to ~ 300 years) in the ^{14}C data were mostly attributed to solar variations. For example, the increase and then decrease in ^{14}C production rate between ~ 1350 CE and 1600 CE was interpreted by the model as solar induced. Meanwhile, the long-term increase in the production rate since 1 CE was attributed to the gradual decrease in GDM. The posterior distribution of past geomagnetic field activity was constrained better (i.e., smaller uncertainty range) by the real ^{14}C data than by the synthetic ^{14}C data. One reason was that the synthetic dataset has larger uncertainties since, for testing the model, we included the maximum value of the realistic uncertainty. Figure 9c compares the ^{14}C production rate generated from the reconstructed ϕ and GDM with the input ^{14}C data. They agree well (within the posterior uncertainty), which indicates that the model is able to find the combinations of ϕ and M that are consistent with the ^{14}C data.
Comparing the Bayesian reconstruction with the conventional reconstruction method
Figure 10b compares our Bayesian ^{14}C-based reconstruction of GDM with published reconstructed GDMs based on pfm9k.1b and COVLAKE. The smaller uncertainty of COVLAKE during the last 2000 years compared to pfm9k.1b is mainly due to the more conservative assumptions made regarding age uncertainties in pfm9k.1b and to a lesser degree related to the different model strategies. In general, all of the reconstructions demonstrate a decreasing trend over the last 2000 years. Disagreements among the reconstructions can be observed especially from around 850–1750 CE. The COVLAKE models indicate a small dip in GDM with an average of 8.85 (± 0.12) × 10^{22} Am^{2} from 850 to 1250 CE. Meanwhile, the pfm9k.1b models suggest a small peak with an average of 9.74 (± 0.38) × 10^{22} Am^{2}. Our Bayesian ^{14}C-based reconstruction shows a gradual decrease in GDM from 9.81 (± 0.33) × 10^{22} Am^{2} to 9.23 (± 0.32) × 10^{22} Am^{2} over the same period. From 1250 to 1750 CE, the Bayesian ^{14}C-based reconstruction indicates a mean GDM of around 8.75 (± 0.26) × 10^{22} Am^{2}, which is 0.33 × 10^{22} Am^{2} and 0.72 × 10^{22} Am^{2} lower than COVLAKE and pfm9k.1b, respectively. In addition, the Bayesian ^{14}C-based reconstruction indicates a mean GDM of 10.50 (± 0.40) × 10^{22} Am^{2} prior to 500 CE, which is 0.50 × 10^{22} Am^{2} higher than the mean GDMs based on pfm9k.1b and COVLAKE during the same period.
We now compare solar activity reconstructed from the ^{14}C data via our Bayesian model (ϕ_{Bayesian}) with the conventional reconstruction method described in Muscheler et al. (2016). This conventional method involves conducting a large number (often a thousand) of Monte Carlo simulations. In each simulation one of the realizations of ^{14}C production rate is randomly selected and combined with one of the randomly selected realizations from a GDM model via the ^{14}C production function. The means of a thousand simulations of past solar activity reconstructed from the ^{14}C data and pfm9k.1b models (ϕ_{pfm9k.1b}) or COVLAKE models (ϕ_{COVLAKE}) are shown in Fig. 10a. The solar reconstructions presented here should be the most up-to-date for the last 2000 years based on ^{14}C since we combined the method of Muscheler et al. (2016) with an updated version of ^{14}C data from IntCal20 and the latest geomagnetic field reconstructions.
The different solar reconstructions agree mostly on short-term variations. Disagreements between the long-term solar activity variations can be observed where the reconstructed GDMs start to deviate from each other (Fig. 10b). Before 500 CE, ϕ_{Bayesian} is on average 496 (± 63) MV, which is about 40 MV lower than the other two reconstructions. From 850 to 1250 CE, ϕ_{Bayesian} is 466 (± 44) MV on average, which is similar to ϕ_{pfm9k.1b} but about 70 MV lower than ϕ_{COVLAKE}. From 1250 to 1750 CE, the average of ϕ_{Bayesian} is 354 (± 30) MV, which is 20 and 50 MV larger than ϕ_{COVLAKE} and ϕ_{pfm9k.1b}, respectively. The opposite trend appears in the reconstructed GDMs, which illustrates the influence of the selected GDM on the long-term reconstruction of solar activity. However, ϕ_{Bayesian} still agrees with ϕ_{pfm9k.1b} and ϕ_{COVLAKE} within the reconstruction uncertainties. Moreover, GDM reconstructions based on pfm9k.1b and COVLAKE are mostly within the reconstruction uncertainty of the Bayesian ^{14}C-based GDM. These results suggest that the reconstructions of solar activity and GDM from ^{14}C data of our Bayesian model are realistic over the last 2000 years.
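The conventional Monte Carlo procedure can be sketched as follows. This is an illustration under stated assumptions, not the implementation of Muscheler et al. (2016): `toy_production` is a hypothetical stand-in for the ^{14}C production function, and the inversion for ϕ uses simple bisection, which is only valid because the toy function decreases monotonically with ϕ.

```python
import math
import random


def toy_production(phi, gdm):
    """Hypothetical stand-in for the 14C production function (not the real one)."""
    return 1.0 / ((1.0 + phi / 1000.0) * math.sqrt(gdm / 8.0))


def invert_for_phi(q, gdm, lo=0.0, hi=3000.0, tol=1e-6):
    """Find phi such that toy_production(phi, gdm) = q by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if toy_production(mid, gdm) > q:
            lo = mid   # production too high: stronger modulation is needed
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def conventional_reconstruction(q_realizations, gdm_realizations, n_sim=1000, rng=None):
    """Pair random production and GDM realizations, invert each pair for phi,
    and return the mean phi curve together with all simulated curves."""
    rng = rng or random.Random()
    n_t = len(q_realizations[0])
    sims = []
    for _ in range(n_sim):
        q = rng.choice(q_realizations)
        m = rng.choice(gdm_realizations)
        sims.append([invert_for_phi(q[t], m[t]) for t in range(n_t)])
    mean = [sum(s[t] for s in sims) / n_sim for t in range(n_t)]
    return mean, sims


# Toy input: a single noise-free production realization and a constant GDM.
q_real = [[toy_production(400.0, 9.0), toy_production(500.0, 9.0)]]
gdm_real = [[9.0, 9.0]]
phi_mean, phi_sims = conventional_reconstruction(q_real, gdm_real, n_sim=5,
                                                 rng=random.Random(1))
```

Because every simulated ϕ curve depends on the drawn GDM realization, the spread of the GDM model propagates directly into the spread of the reconstructed solar activity.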
Figure 11 compares the solar activity reconstructions with the solar modulation potential based on GSN filtered with a 9-year running average filter (ϕ_{GSN,filtered}, orange line). The centennial variations of the recovered solar activity from different reconstructions generally agree well with ϕ_{GSN,filtered}. However, differences outside the inferred uncertainty can be observed around 1700–1800 CE. ϕ_{GSN,filtered} suggests solar activity levels, on average, 110–150 MV higher than all of the reconstructions from the ^{14}C data. These disagreements could be due to the limitation of our Bayesian method, as discussed in Sect. 3, such as the small bias toward lower solar modulation values of our prior distribution. However, since our Bayesian reconstruction agrees well with conventional solar reconstructions (i.e., ϕ_{pfm9k.1b} and ϕ_{COVLAKE}), we argue that the disagreements were more likely due to an underestimation of uncertainties of the ^{14}C data and ϕ_{GSN} inferred from the GSN record. We used the Northern hemisphere ^{14}C curves and there are differences from the Southern hemisphere records (Muscheler et al. 2007). It is also possible that the carbon cycle effects are not fully captured by the ^{14}C production rate calculation with the box-diffusion model. Possible uncertainty in this calculation is hard to quantify but, in general, a good agreement is obtained by calculations with different carbon cycle models (Muscheler et al. 2007). The subtle changes in the carbon cycle that were not fully captured by the box-diffusion model can be explored by adding such a component to future versions of the model and comparing the results from ^{14}C to ^{10}Be data. On the other hand, the uncertainty of the inferred ϕ_{GSN} was likely also underestimated. The standard error of the GSN data represented only the spread among different counting records (Svalgaard and Schatten 2016).
Figure 11 also shows that ϕ_{Bayesian} has a lower uncertainty compared to the two other reconstructions during the period where the model runs with annual resolution (i.e., after 1600 CE). Although the Bayesian ^{14}C-based GDM has similar or even larger uncertainty than pfm9k.1b and COVLAKE (Fig. 10b), the reconstruction uncertainty of GDM did not directly affect the reconstruction uncertainty of solar activity in the Bayesian model. On the other hand, the uncertainties of ϕ_{pfm9k.1b} and ϕ_{COVLAKE} are a direct consequence of the uncertainty in GDM (see also Additional file 1: Fig. S8). This shows that the Bayesian model is able to reduce the solar activity reconstruction uncertainty by utilizing the knowledge of the differences in rates of change between variations of GDM and solar activity. The differences in variations are biggest after 1600 CE since significant short-term solar activity variations are captured by the annual resolution. Consequently, the reconstruction uncertainty was reduced the most.
Comparing the Bayesian reconstruction with reconstruction using frequency filters
In the following, we illustrate the differences between the Bayesian ^{14}Cbased reconstruction and reconstruction by applying various frequency filters to separate solar and geomagnetic field influences.
Figure 12a illustrates solar activity reconstructed from variations of the ^{14}C data shorter than 250 years (ϕ_{HP,1/250}) and 600 years (ϕ_{HP,1/600}). The short-term variations were extracted using two separate high-pass frequency filters with cutoff frequencies of 1/250 years^{−1} and 1/600 years^{−1} to ensure reconstruction of solar activity on timescales of around 200 years and 500 years, respectively (details in section 5, Additional file 1). It should be noted that we applied the filters to the original annual ^{14}C data (blue line in Fig. 2) and therefore differences at timescales shorter than 10 years can be observed between ϕ_{Bayesian} and ϕ_{HP,1/250}, and between ϕ_{Bayesian} and ϕ_{HP,1/600} prior to 1600 CE (Fig. 12a). The deviation of ϕ_{HP,1/250} from ϕ_{HP,1/600} and ϕ_{Bayesian} indicates possible solar variations at timescales between 250 and 600 years. For example, around 630–700 CE and around 1650–1720 CE, ϕ_{HP,1/250} is about 100 to 170 MV larger than mean ϕ_{Bayesian} and ϕ_{HP,1/600}. Possible solar variability at timescales significantly longer than 500 years could have caused ϕ_{HP,1/600} to be around 185 MV higher than mean ϕ_{Bayesian} around 1370–1600 CE. These differences are all outside the reconstruction uncertainty of the Bayesian model. Thus, the Bayesian model indicates long-term solar variations at timescales longer than 200 years and even longer than 500 years that were removed by the frequency filters. This result also supports long-term solar variability patterns inferred from the radionuclide records in previous studies (Wagner et al. 2001; Snowball and Muscheler 2007; Adolphi et al. 2014).
Figure 12b shows GDM reconstructions from variations of the ^{14}C data longer than 600 years (M_{LP,1/600}), 1000 years (M_{LP,1/1000}) and 2000 years (M_{LP,1/2000}). We used three separate low-pass frequency filters with cutoff frequencies of 1/600 years^{−1}, 1/1000 years^{−1} and 1/2000 years^{−1} to extract the possible geomagnetic field signal (details in section 5, Additional file 1). The relatively large variations of M_{LP,1/600} for the last 2000 years suggest an uncorrected solar variability influence at timescales from 600 to 1000 years (Fig. 12b). This is also supported by the disagreement between the solar activity reconstructions by the Bayesian model and the frequency filters from 1250 to 1650 CE (Fig. 12a). In addition, M_{LP,1/1000} suggests a strongly decreasing trend especially after 1250 CE where our Bayesian ^{14}C-based reconstruction as well as pfm9k.1b and COVLAKE suggest a rather constant and higher value of the GDM. This is potentially because M_{LP,1/1000} still contains solar influences at millennial timescales plus influences of the end effects of the frequency filter. It is possible to normalize M_{LP,1/1000} to the average value of the Bayesian ^{14}C-based reconstruction after 1250 CE. However, doing so would increase the value of M_{LP,1/1000} prior to 1250 CE to a higher level not supported by independent geomagnetic field models. Figure 12b also demonstrates another practical problem of the frequency filter, as the method in general exhibits unreliable end effects. For example, variations before 200 CE and after 1750 CE cannot be recovered with M_{LP,1/1000} and M_{LP,1/2000}. For this to be possible, one would need the data to cover much longer periods than the actual reconstruction period. Therefore, it is difficult to connect the reconstructions to the present values.
In summary, Fig. 12 illustrates the advantages of the Bayesian model over simple frequency filters. Frequency filters are useful in general to partially remove GDM influences for studying short-term solar activity variations in radionuclide records and vice versa for studying millennial variations of the GDM. However, they can never completely separate the long-term solar activity from the GDM changes as their variability ranges partly overlap at centennial and possibly millennial timescales. On the other hand, the Bayesian model can separate solar and GDM effects on these overlapping timescales despite the limitation of using the relatively short GSN record to constrain the prior information of solar activity.
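The end effects of a frequency filter can be made concrete with a small sketch. A centred moving average is used here as a simple stand-in for a low-pass filter, and its residual as the corresponding high-pass filter; the filters actually applied in this study are described in section 5 of Additional file 1 and may differ. The toy series (a linear trend plus an 11-year sinusoid) and the function names are assumptions for illustration.

```python
import math


def lowpass_moving_average(series, window):
    """Centred running mean; None marks positions where the window does not fit."""
    half = window // 2
    out = []
    for i in range(len(series)):
        if i < half or i + half >= len(series):
            out.append(None)   # end effect: no reliable estimate here
        else:
            out.append(sum(series[i - half:i + half + 1]) / (2 * half + 1))
    return out


def highpass_residual(series, window):
    """Series minus its running mean; None where the low-pass value is undefined."""
    low = lowpass_moving_average(series, window)
    return [None if l is None else s - l for s, l in zip(series, low)]


# Toy series: slow linear trend (stand-in for GDM) plus an 11-year cycle (stand-in for the Sun).
series = [0.01 * t + math.sin(2.0 * math.pi * t / 11.0) for t in range(110)]
low = lowpass_moving_average(series, window=11)
high = highpass_residual(series, window=11)
n_lost = sum(1 for v in low if v is None)
```

Half a window is lost at each end of the record, so a filter with a 1000-year cutoff leaves several centuries at both ends unconstrained, whereas two signals with overlapping periods cannot be separated by any choice of window.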
Conclusion and outlook
We have introduced a Bayesian model that can separate solar and geomagnetic influences on radionuclide data using prior information on how solar activity and GDM vary through time. Here, we derived prior information on solar variability from a solar modulation reconstruction inferred from the group sunspot number record. The prior distribution of the GDM was adapted from previously proposed priors used in recent geomagnetic field models, i.e., COVLAKE and COV-OBS.x2.
Our model performs well with the synthetic test and can reconstruct the reference solar activity and GDM from a synthetic ^{14}C dataset corrupted with realistic measurement uncertainty. Applying the Bayesian model to the ^{14}C production rate data inferred from the IntCal20 calibration curve resulted in a reconstructed GDM which gradually decreased over the last 2000 years. The Bayesian ^{14}C-based GDM agrees mostly with independent reconstructions using the pfm9k.1b and COVLAKE geomagnetic field models. The solar activity reconstructed by the Bayesian model also agrees with conventional reconstructions where GDM influences were removed using pfm9k.1b and COVLAKE models. The solar activity reconstructed by the Bayesian model shows similar annual short-term variations as the solar activity inferred from the GSN. There were, however, differences in the long-term variations outside the reconstruction uncertainty. This is probably due to underestimation of the uncertainty in the underlying ^{14}C data (e.g., carbon cycle effect) and GSN data. We also showed that our Bayesian model outperforms various simple frequency filters. The Bayesian model is able to disentangle solar and GDM influences on the ^{14}C record on timescales where their variability ranges partly overlap. In addition, a comparison of the Bayesian reconstruction with the reconstructions based on the frequency filters indicates that the Bayesian model can recover solar activity on timescales longer than 200 years.
In summary, the Bayesian model allows us to disentangle solar and GDM influences from the radionuclide data. This reduces the dependency of solar activity reconstructions on an independent GDM record and, therefore, can reduce the uncertainties associated with the independent GDM. Moreover, the Bayesian model can provide radionuclide-based GDM reconstructions which are valuable complements to other GDM reconstructions.
The flexibility of the Bayesian framework outlined in this paper also allows for further improvements in the future. For example, independent GDM reconstructions could be incorporated into the model. This will help with the reconstruction of solar activity during periods where the GDM is well constrained. Moreover, more than one radionuclide dataset can be included in the model, such as several ^{10}Be records from different ice cores in addition to the ^{14}C data, or a global compilation of ^{10}Be records. Including the different geochemical behavior of these radionuclides might help us to estimate the factors leading to the differences in long-term solar activity reconstructions based on ^{10}Be and ^{14}C as seen in Vonmoos et al. (2006). Prior information on systematic influences such as changes in climate and carbon cycle could be incorporated into the model so that their signals can also be separated from the radionuclide data. This could further reduce the solar and geomagnetic field reconstruction uncertainties. In addition, the model can also be extended with ^{10}Be records from sediments. Records from these archives often contain non-production signals caused by local processes and catchment conditions, which can be separated by incorporating these processes into the model.
Availability of data and materials
The datasets used in this study (e.g., solar modulation potential, group sunspot number, global dipole moments) are available in the corresponding references/publications. The datasets generated during this study, including the solar modulation inferred from the group sunspot number, the production rate of ^{14}C inferred from IntCal20, and the synthetic dataset, are available from the corresponding author on reasonable request.
Abbreviations
GCRs: Galactic cosmic rays
GP: Gaussian process
GDM: Geomagnetic dipole moment
GSN: Group sunspot number
HMC: Hamiltonian Monte Carlo
LIS: Local interstellar spectrum
MCMC: Markov chain Monte Carlo
NUTS: No-U-Turn sampler
RMS: Root mean squared
SPE: Solar proton event
SE: Squared exponential
References
Adolphi F, Muscheler R, Svensson A, Aldahan A, Possnert G, Beer J, Sjolte J, Björck S, Matthes K, Thieblemont R (2014) Persistent link between solar activity and Greenland climate during the Last Glacial Maximum. Nat Geosci 7:662–666. https://doi.org/10.1038/NGEO2225
Beer J, Siegenthaler U, Bonani G, Finkel RC, Oeschger H, Suter M, Wölfli W (1988) Information on past solar activity and geomagnetism from 10Be in the Camp Century ice core. Nature 331:675–679. https://doi.org/10.1038/331675a0
Beer J, McCracken K, von Steiger R (2012) The cosmic radiation near earth. Cosmogenic radionuclides. Springer, Berlin, Heidelberg, pp 19–78
Bond G, Kromer B, Beer J, Muscheler R, Evans MN, Showers W, Hoffmann S, Lotti-Bond R, Hajdas I, Bonani G (2001) Persistent solar influence on North Atlantic climate during the Holocene. Science 294(5549):2130–2136. https://doi.org/10.1126/science.106568
Bouligand C, Gillet N, Jault D, Schaeffer N, Fournier A, Aubert J (2016) Frequency spectrum of the geomagnetic field harmonic coefficients from dynamo simulations. Geophys J Int 207:1142–1157. https://doi.org/10.1093/gji/ggw326
Brehm N, Bayliss A, Christl M, Synal HA, Adolphi F, Beer J, Kromer B, Muscheler R, Solanki SK, Usoskin I, Bleicher N, Bollhalder S, Tyers C, Wacker L (2021) Eleven-year solar cycles over the last millennium revealed by radiocarbon in tree rings. Nat Geosci 14:10–15. https://doi.org/10.1038/s41561-020-00674-0
Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Brubaker MA, Guo J, Li P, Riddell A (2017) Stan: A probabilistic programming language. J Stat Softw 76(1):1–32. https://doi.org/10.18637/jss.v076.i01
Dergachev VA, Vasiliev SS (2019) Long-term changes in the concentration of radiocarbon and the nature of the Hallstatt cycle. J Atmos Sol-Terr Phys 182:10–24. https://doi.org/10.1016/J.JASTP.2018.10.005
Friis-Christensen E, Lassen K (1991) Length of the solar cycle: an indicator of solar activity closely associated with climate. Science 254:698–700. https://doi.org/10.1126/SCIENCE.254.5032.698
Gelman A, Carlin JB, Stern HS, Rubin DB (2004) Bayesian data analysis, 2nd edn. Chapman & Hall CRC, London
Heaton TJ, Blaauw M, Blackwell PG, Bronk Ramsey C, Reimer PJ, Scott EM (2020) The IntCal20 approach to radiocarbon calibration curve construction: a new methodology using Bayesian splines and errors-in-variables. Radiocarbon 62:821–863. https://doi.org/10.1017/RDC.2020.46
Hellio G, Gillet N (2018) Time-correlation-based regression of the geomagnetic field from archeological and sediment records. Geophys J Int 214:1585–1607. https://doi.org/10.1093/GJI/GGY214
Herbst K, Muscheler R, Heber B (2017) The new local interstellar spectra and their influence on the production rates of the cosmogenic radionuclides 10Be and 14C. J Geophys Res Sp Phys 122:23–34. https://doi.org/10.1002/2016JA023207
Hoffman MD, Gelman A (2014) The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J Mach Learn Res 15:1593–1623
Huder L, Gillet N, Finlay CC, Hammer MD, Tchoungui H (2020) COVOBS.x2: 180 years of geomagnetic field evolution from ground-based and satellite observations. Earth Planets Sp 72:160. https://doi.org/10.1186/s40623-020-01194-2
Kovaltsov GA, Mishev A, Usoskin IG (2012) A new model of cosmogenic production of radiocarbon 14C in the atmosphere. Earth Planet Sci Lett 337–338:114–120. https://doi.org/10.1016/j.epsl.2012.05.036
Masarik J, Beer J (1999) Simulation of particle fluxes and cosmogenic nuclide production in the Earth’s atmosphere. J Geophys Res 104:99–112
Mekhaldi F, Muscheler R, Adolphi F, Aldahan A, Beer J, McConnell JR, Possnert G, Sigl M, Svensson A, Synal HA, Welten KC, Woodruff TE (2015) Multiradionuclide evidence for the solar origin of the cosmic-ray events of AD 774/5 and 993/4. Nat Commun 6:1–8. https://doi.org/10.1038/ncomms9611
Miyake F, Nagaya K, Masuda K, Nakamura T (2012) A signature of cosmic-ray increase in AD 774–775 from tree rings in Japan. Nature 486:240–242. https://doi.org/10.1038/nature11123
Miyake F, Masuda K, Nakamura T (2013) Another rapid event in the carbon-14 content of tree rings. Nat Commun 4:1–6. https://doi.org/10.1038/ncomms2783
Muscheler R, Beer J, Kubik PW, Synal HA (2005) Geomagnetic field intensity during the last 60,000 years based on 10Be and 36Cl from the Summit ice cores and 14C. Quat Sci Rev 24:1849–1860. https://doi.org/10.1016/j.quascirev.2005.01.012
Muscheler R, Joos F, Beer J, Müller SA, Vonmoos M, Snowball I (2007) Solar activity during the last 1000 yr inferred from radionuclide records. Quat Sci Rev 26:82–97. https://doi.org/10.1016/j.quascirev.2006.07.012
Muscheler R, Adolphi F, Herbst K, Nilsson A (2016) The revised sunspot record in comparison to cosmogenic radionuclide-based solar activity reconstructions. Sol Phys 291:3025–3043. https://doi.org/10.1007/s11207-016-0969-z
Neal RM (2011) MCMC using Hamiltonian dynamics. In: Brooks S, Gelman A, Jones G, Meng XL (eds) Handbook of Markov Chain Monte Carlo. CRC Press, Boca Raton
Nilsson A, Suttie N (2021) Probabilistic approach to geomagnetic field modelling of data with age uncertainties and post-depositional magnetisations. Phys Earth Planet Inter 317:106737. https://doi.org/10.1016/j.pepi.2021.106737
Nilsson A, Holme R, Korte M, Suttie N, Hill M (2014) Reconstructing Holocene geomagnetic field variation: new methods, models and implications. Geophys J Int 198:229–248. https://doi.org/10.1093/gji/ggu120
Petrovay K (2010) Solar cycle prediction. Living Rev Solar Phys 7(1):1–59. https://doi.org/10.12942/lrsp-2010-6
Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press, Massachusetts
Reimer PJ, Austin WEN, Bard E, Bayliss A, Blackwell PG, Bronk Ramsey C, Butzin M, Cheng H, Edwards RL, Friedrich M, Grootes PM, Guilderson TP, Hajdas I, Heaton TJ, Hogg AG, Hughen KA, Kromer B, Manning SW, Muscheler R, Palmer JG, Pearson C, van der Plicht J, Reimer RW, Richards DA, Scott EM, Southon JR, Turney CSM, Wacker L, Adolphi F, Büntgen U, Capano M, Fahrni SM, Fogtmann-Schulz A, Friedrich R, Köhler P, Kudsk S, Miyake F, Olsen J, Reinig F, Sakamoto M, Sookdeo A, Talamo S (2020) The IntCal20 Northern Hemisphere radiocarbon age calibration curve (0–55 cal kBP). Radiocarbon 62:725–757. https://doi.org/10.1017/RDC.2020.41
Schwabe SH (1844) Sonnenbeobachtungen im Jahre 1843. Dessau Astron Nachr 21:223
Siegenthaler U (1983) Uptake of excess CO2 by an outcrop-diffusion model of the ocean. J Geophys Res 88:3599–3608. https://doi.org/10.1029/JC088iC06p03599
Snowball I, Muscheler R (2007) Palaeomagnetic intensity data: an Achilles heel of solar activity reconstructions. The Holocene 17:851–859. https://doi.org/10.1177/0959683607080531
Solanki SK, Schüssler M, Fligge M (2002) Secular variation of the Sun’s magnetic flux. Astron Astrophys 383(2):706–712. https://doi.org/10.1051/0004-6361:20011790
Svalgaard L, Schatten KH (2016) Reconstruction of the sunspot group number: the backbone method. Sol Phys 291:2653–2684. https://doi.org/10.1007/s11207-015-0815-8
Usoskin IG, Mursula K, Solanki SK, Schüssler M, Kovaltsov GA (2002) A physical reconstruction of cosmic ray intensity since 1610. J Geophys Res Sp Phys 107:13–21. https://doi.org/10.1029/2002JA009343
Usoskin IG, Alanko-Huotari K, Kovaltsov GA, Mursula K (2005) Heliospheric modulation of cosmic rays: monthly reconstruction for 1951–2004. J Geophys Res Sp Phys 110:12108. https://doi.org/10.1029/2005JA011250
Usoskin IG, Gallet Y, Lopes F, Kovaltsov GA, Hulot G (2016) Solar activity during the Holocene: the Hallstatt cycle and its consequence for grand minima and maxima. Astron Astrophys 587:A150. https://doi.org/10.1051/0004-6361/201527295
Vonmoos M, Beer J, Muscheler R (2006) Large variations in Holocene solar activity: constraints from 10Be in the Greenland Ice Core Project ice core. J Geophys Res Sp Phys 111:A10105. https://doi.org/10.1029/2005JA011500
Wagner G, Beer J, Masarik J, Muscheler R, Kubik PW, Mende W, Laj C, Raisbeck GM, Yiou F (2001) Presence of the Solar de Vries Cycle (∼205 years) during the Last Ice Age. Geophys Res Lett 28(2):303–306. https://doi.org/10.1029/2000GL006116
Wieler R, Beer J, Leya I (2013) The galactic cosmic ray intensity over the past 10^{6}–10^{9} years as recorded by cosmogenic nuclides in meteorites and terrestrial samples. Sp Sci Rev 176:351–363. https://doi.org/10.1007/s11214-011-9769-9
Zheng M, Sturevik-Storm A, Nilsson A, Adolphi F, Aldahan A, Possnert G, Muscheler R (2021) Geomagnetic dipole moment variations for the last glacial period inferred from cosmogenic radionuclides in Greenland ice cores via disentangling the climate and production signals. Quat Sci Rev 258:106881. https://doi.org/10.1016/j.quascirev.2021.106881
Acknowledgements
We would like to thank Tim Heaton for providing the D14C realisations. We would also like to thank two anonymous reviewers whose constructive comments improved the manuscript, and to thank the editors Takeshi Sagiya and Yuhji Yamamoto for editing and handling the manuscript.
Funding
Open access funding provided by Lund University. This work was supported by the Swedish Research Council (Grant DNR2013-8421 to R. Muscheler and Grant DNR2020-04813 to A. Nilsson).
Author information
Contributions
All authors made substantial contributions to the conception and design of the work. LN constructed the model used in the work and performed the analysis of the data. All authors interpreted the results of the analysis. LN drafted the manuscript, and NS, AN and RM revised it. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Supplementary Information
Additional file 1.
Supplementary document.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Nguyen, L., Suttie, N., Nilsson, A. et al. A novel Bayesian approach for disentangling solar and geomagnetic field influences on the radionuclide production rates. Earth Planets Space 74, 130 (2022). https://doi.org/10.1186/s40623-022-01688-1
DOI: https://doi.org/10.1186/s40623-022-01688-1
Keywords
 Solar activity
 Paleomagnetism
 Cosmogenic radionuclide
 Holocene
 ^{14}C