## Abstract

Harnessing the energy produced by thermonuclear fusion reactions has the potential to provide a clean and inexpensive source of energy on Earth. However, over the past seven decades, physicists have learned that building our own fusion energy source is very difficult. We constructed a component-based, multiscale fusion workflow to model the fusion plasma inside the core of a tokamak device. To ensure the simulation results agree with experimental values, the model needs to undergo the process of verification, validation and uncertainty quantification (VVUQ). This paper describes the VVUQ work carried out on the multiscale fusion workflow (MFW) with the help of the EasyVVUQ software library developed by the VECMA project. In particular, the similarity of distributions from simulation and experiment is explored as a validation metric. Such initial validation measures provide insights into the accuracy of the simulation results.

This article is part of the theme issue ‘Reliability and reproducibility in computational science: implementing verification, validation and uncertainty quantification *in silico*’.

### 1. Introduction

Thermonuclear fusion is a process in which lighter, positively charged nuclei fuse together to produce heavier nuclei. Energy is released during this process, and this is exactly how the Sun and other stars power themselves. Scientists have long desired to bring this power source to Earth as there are two major advantages: fuel availability and process cleanliness. For the former, heavy hydrogen isotopes, such as deuterium and tritium, required for fusion can be extracted from seawater (deuterium) and bred within the fusion process (tritium). As for the latter, fusion does not produce greenhouse gases, nor does it leave behind dangerous, long-lasting radioactive waste that we see from nuclear fission plants.

In order for the nuclei to fuse, a gas of charged particles (known as a plasma) needs to be heated to about a hundred million degrees. The Sun is massive enough to confine its plasma with the force of gravity. On Earth, by contrast, scientists have devised various means to contain the hot plasma. One such device, the tokamak, confines the plasma using a magnetic field.

However, the plasma dynamics is very complex: micro-instabilities develop in the plasma that drive turbulence and can affect the overall transport and ultimately destroy confinement. This is a multiscale problem, in which turbulence spans millimetres in space and microseconds in time, while plasma transport happens at metres in space and seconds in time. Such scale disparity requires numerical models that can bridge those two to fully understand the dynamics of fusion plasmas.

In the plasma physics community, there are various efforts to construct multiscale simulation models for tokamak plasmas [1–9]. Notably, the European Integrated Tokamak Modelling Task Force (EFDA ITM-TF) [10] has put together a software infrastructure framework for building complex workflows and tokamak modelling [11]. It uses a standardized interface and takes a component-based approach, which allows simple substitution of any component with another, as long as they share an identical interface. Another approach, the multiscale modelling and simulation framework (MMSF) [12–14], aims at guiding computational scientists from various domains through the many steps involved in creating a multiscale coupled application, from its conceptual design to its implementation and finally its execution on high-performance computing (HPC) resources. Using the ITM-TF standardized interface together with the MMSF, an MFW was pieced together [15,16]. The MFW has been used to tackle the time-scale disparity between turbulence and transport in a tokamak-like scenario. However, there is a need to ensure the simulation results are reliable when compared to experiments.

Hence, uncertainty quantification (UQ) and validation processes are crucial in order to demonstrate the robustness of the simulation model. Code results can be ‘validated’ by comparison with experiment in a number of ways, ranging from qualitative (subjective) measures to quantitative measures, e.g. by applying a validation metric. In [17], the authors discuss in some detail (64 pages) ‘Verification and validation in computational fluid dynamics’. A later article by the same first author [18], ‘Measures of agreement between computation and experiment: validation metrics’, provides ‘Recommended features of validation metrics’ and makes the argument for providing quantitative measures of agreement. The application to fusion has been made in a number of papers, including [19] ‘Validation in fusion research: towards guidelines and best practices’, [20] ‘Verification and validation for magnetic fusion’ and [21] ‘Validation metrics for turbulent plasma transport’. The goal of this paper is to lay out the progress we have made thus far towards a validated multiscale fusion simulation model. The rest of the paper is structured as follows: we begin by giving a description of the multiscale fusion model. Next, we present a few production runs using the MFW, with details on the time-bridging method we used to couple the turbulence and transport models, and details on methods that determine the quasi-steady state of the plasma. Then, we show the UQ methods we use on the workflow; this includes the technology employed and analysis of the outcome. Afterwards, an introduction to the validation methods along with initial validation of the simulation results is presented, including how experimental data are processed and how distribution similarity measurements are used to quantitatively compare distributions obtained from simulation with those from experiment. Finally, we conclude with a summary and outlook.

### 2. Multiscale fusion workflow

The MFW takes a component-based approach: it couples single-scale models together and allows information to pass from one to another [15,16]. The topology of the workflow is illustrated in figure 1: a transport code (ETS [22]) calculates the plasma profiles, such as densities and temperatures. This information is passed to the equilibrium code (CHEASE [23]), which updates the plasma equilibrium geometry, and from there to the turbulence code (an analytical or a stochastic model), which calculates the fluxes. The stochastic model we use is GEM [24], which is much more expensive than the analytical model. A module (IMP4DV [25]) then converts the fluxes obtained from the turbulence code into the transport coefficients required by the transport code.

Initial data are imported into the system via another component which allows users to select from various data sources such as files and databases. The information passed from one model to another is in the form of data structures that were agreed upon by the ITM community [10]. These data structures are called consistent physical objects (CPOs) [26].

### 3. Time-bridging and quasi-steady state

We take this multiscale workflow and explore different issues such as time-scale bridging between turbulence and transport codes and searching for a way to define a quasi-steady state of plasma [15]. To bridge the temporally disparate models, we make the time step size in the transport code adaptive, with value at each iteration dependent on two parameters Δ*T*_{e,lim} and $\mathrm{\Delta}{\mathrm{\partial}}_{\rho}{T}_{e,\text{lim}}$:

*T*_{e} is the electron temperature, ${\mathrm{\partial}}_{\rho}{T}_{e}$ is the derivative of *T*_{e} with respect to the toroidal flux coordinate *ρ* and *a* is the maximum toroidal flux coordinate value. The time-bridging method is implemented into MFW to simulate a scenario from the ASDEX Upgrade (AUG) tokamak.^{1} With Δ*T*_{e,lim} set to 0.1 and $\mathrm{\Delta}{\mathrm{\partial}}_{\rho}{T}_{e,\text{lim}}$ to 0.05, we obtained the electron and ion heat fluxes, *q*_{e} and *q*_{i}, respectively, and *T*_{e} and ion temperature *T*_{i} at each flux-tube, which are plotted in figures 2 and 3. Each flux-tube has a spatial position described by the normalized toroidal flux coordinates ||*ρ*_{tor}|| and colours of the traces represent the individual flux-tubes.

One way to find a quasi-steady state of the plasma is to calculate the average *T*_{e} (〈*T*_{e}〉) for every *N* time steps [15]. If the previous *G* groups of 〈*T*_{e}〉 (〈*T*_{e}〉_{n−G} to 〈*T*_{e}〉_{n−1}) all lie within the current 〈*T*_{e}〉 (or 〈*T*_{e}〉_{n}) ± its standard deviation (*σ*_{n}), then we have found a quasi-steady state. Let us call this method 1.
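Method 1 reduces to a simple binned-average check. The following is a minimal Python sketch of our reading of that procedure (function and variable names are illustrative, not taken from the MFW code):

```python
import statistics

def quasi_steady_method1(te_trace, N=200, G=5):
    """Bin a single flux-tube's T_e trace into averages over N steps, then
    report the first bin n whose G preceding bin means all lie within
    <T_e>_n +/- sigma_n (the standard deviation of bin n)."""
    nbins = len(te_trace) // N
    means = [statistics.mean(te_trace[i * N:(i + 1) * N]) for i in range(nbins)]
    stdevs = [statistics.stdev(te_trace[i * N:(i + 1) * N]) for i in range(nbins)]
    for n in range(G, nbins):
        lo, hi = means[n] - stdevs[n], means[n] + stdevs[n]
        if all(lo <= means[m] <= hi for m in range(n - G, n)):
            return n, means[n], stdevs[n]
    return None  # no quasi-steady state found in the trace
```

The sketch handles one flux-tube; in the MFW the check is made per flux-tube over the full temperature history.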

Another way to locate a quasi-steady state (method 2) is to take 1 s worth of data at a time (labelled with superscript *k*) and find the maximum and minimum *T*_{e} (*T*_{e,max} and *T*_{e,min}, respectively). We then average those values: ${T}_{e}^{k}=({T}_{e,\text{min}}^{k}+{T}_{e,\text{max}}^{k})/2$.

This value is then compared with the value from each of the previous *M* seconds worth of data, i.e. ${T}_{e}^{k-M}$ to ${T}_{e}^{k-1}$. Specifically, we calculate the percentage difference between ${T}_{e}^{k}$ and ${T}_{e}^{k-m}$ for every second, with 1 ≤ *m* ≤ *M*. If this difference is equal to or below some threshold limit *L*, and this holds true for every flux-tube, then we set this as a quasi-steady state. We can then take the 1 s worth of *T*_{e} data and calculate the mean and standard deviation.

Taking the production run from figures 2 and 3, we apply both quasi-steady state methods 1 and 2. For method 1, we locate a quasi-steady state by setting *N* = 200 steps/bin and *G* = 5 previous steps. For method 2, we locate a quasi-steady state by setting *M* = 2 s and *L* = 10%, or *M* = 2 s and *L* = 5%. 〈*T*_{e}〉 and 〈*T*_{i}〉 of each flux-tube at quasi-steady state obtained from methods 1 and 2, along with their standard deviations, are plotted against ||*ρ*_{tor}|| in figure 4. Each *T*_{e} profile lies within one standard deviation of the other *T*_{e} distribution, and the same applies to the *T*_{i} profiles. This indicates that the quasi-steady state temperature profiles obtained from both methods are very similar. Therefore, we will take the temperature profiles from method 1 and compare them with experimental data in §6(b).
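Method 2 can be sketched similarly. The illustration below handles a single flux-tube and assumes a particular percentage-difference definition (the paper's exact formula is not reproduced here), so it is indicative only:

```python
def quasi_steady_method2(te_windows, M=2, L=0.10):
    """te_windows: list of consecutive windows (e.g. 1 s each) of T_e samples
    for one flux-tube. For each window k, T^k = (T_min + T_max) / 2; a
    quasi-steady state is declared at the first k whose percentage difference
    to each of the previous M windows is at most L (assumes T_e > 0)."""
    mids = [(min(w) + max(w)) / 2 for w in te_windows]
    for k in range(M, len(mids)):
        if all(abs(mids[k] - mids[k - m]) / mids[k] <= L
               for m in range(1, M + 1)):
            return k, mids[k]
    return None  # no quasi-steady state in the available data
```

In the workflow the condition must hold simultaneously for every flux-tube before the state is accepted.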

As shown in figures 2 and 3, while the adaptive time step is effective for *T*_{e}, the time step sizes are too large for the ions: *T*_{i} exhibits large fluctuations. Both the time-bridging and quasi-steady-state detection techniques described above focus only on constraining the time evolution of Δ*T*_{e} and $\mathrm{\Delta}{\mathrm{\partial}}_{\rho}{T}_{e}$. Fortunately, these techniques can easily be extended to include the ions as well. For example, the adaptive time stepping can include two additional parameters, Δ*T*_{i,lim} and $\mathrm{\Delta}{\mathrm{\partial}}_{\rho}{T}_{i,\text{lim}}$, analogous to equations (3.1) and (3.2), respectively. This would adjust the time step size at each step such that *T*_{e}, *T*_{i}, ${\mathrm{\partial}}_{\rho}{T}_{e}$ and ${\mathrm{\partial}}_{\rho}{T}_{i}$ calculated by the transport model do not change so much that the turbulence code has difficulty adjusting.

We simulate the same tokamak scenario as above with a different set of time-bridging parameters: Δ*T*_{e,lim} = Δ*T*_{i,lim} = 0.1 and $\mathrm{\Delta}{\mathrm{\partial}}_{\rho}{T}_{e,\text{lim}}=\mathrm{\Delta}{\mathrm{\partial}}_{\rho}{T}_{i,\text{lim}}=0.025$. *q*_{e}, *q*_{i}, *T*_{e} and *T*_{i} from this production run are shown in figures 5 and 6. With the smaller $\mathrm{\Delta}{\mathrm{\partial}}_{\rho}{T}_{e,\text{lim}}$ value and the additional time-bridging parameters for the ions, the time step sizes are noticeably smaller. Also, the fluctuations observed in *T*_{e} and *T*_{i} are much weaker than in the previous simulation.

Next, we use methods 1 and 2 to find quasi-steady states for this production run. Method 1, which examines 〈*T*_{e}〉 for every *N* time steps, is now expanded to include the ions as well, i.e. 〈*T*_{i}〉. A quasi-steady state for the electrons was found with *N* = 500 steps/bin and *G* = 5 previous steps. However, *σ*_{n} is very small owing to the small fluctuations of *T*_{e} over time (0.001 < *σ*_{n}/〈*T*_{e}〉 < 0.01); therefore, we widen the acceptance interval to 〈*T*_{e}〉_{n} ± 3*σ*_{n} when comparing the previous *G* groups of 〈*T*_{e}〉 values. We did the same for the ions, which reach a quasi-steady state earlier than the electrons. However, we decided to study both temperature profiles obtained at the same time, when the electrons reach quasi-steady state.

As for method 2, since we have only about 1 s worth of data in total from the second production run, we take 0.1 s worth of data at a time to find ${T}_{\text{max}}^{k}$ and ${T}_{\text{min}}^{k}$, rather than the 1 s we used earlier for the first production run. In addition, we set *M* = 0.2 s and *L* = 5%. These settings apply to both ions and electrons when determining a quasi-steady state. The *T*_{e} and *T*_{i} profiles obtained from both methods are shown in figure 7.

However, are these simulation results consistent with experiments? We want to take a scientific, systematic approach to demonstrate whether we can trust these results, i.e. validate them against experimental data and quantify the uncertainties that exist in the inputs and outputs. By performing VVUQ, we put ourselves in a better position to determine whether or not the simulations match the experiments.

### 4. Uncertainty quantification

UQ is the process of characterizing, analysing and, ultimately, reducing the uncertainties in a model. Under the VECMA project, various UQ patterns for multiscale models are defined [27]. Two of these patterns are non-intrusive and semi-intrusive. The non-intrusive pattern treats the entire multiscale model as a black-box, and nothing is altered inside this box. We take into consideration the properties of the uncertain input parameters, and we employ UQ to study the statistical moments of the quantities of interest (QoI). The semi-intrusive pattern, instead of treating the entire workflow as a black-box, studies the inputs and outputs of each single-scale model. This section emphasizes the non-intrusive pattern. To carry out the UQ process, we rely on the EasyVVUQ library [28] from the VECMA toolkit [29].

EasyVVUQ is a Python library that aims to simplify the implementation of UQ in a workflow. Users need to provide parameter descriptions, i.e. specify the model parameters, their probability distribution functions and the ranges of physically acceptable values for sampling, and select a UQ method. The EasyVVUQ library then translates the user-provided information into a sampler that generates a variety of simulation scenarios. The outputs of these runs are collated for analysis. The analysis provides statistical moments of the QoI, such as the expected value, standard deviation, variance and sensitivity indices.

#### (a) Polynomial chaos expansion

The polynomial chaos (PC) expansion method approximates a model *Y* using a polynomial expansion

$$Y \approx \sum_{n=0}^{N_p} c_n P_n(Q),$$

where the polynomials are orthogonal to the probability density function of the inputs; *Q*, *P*_{n}, *c*_{n} and *N*_{p} are the uncertain parameters, polynomials, expansion coefficients and number of expansion factors, respectively [30]. The advantage of PC is that it requires fewer samples than the quasi-Monte Carlo (qMC) or Monte Carlo (MC) methods. However, this advantage only holds true when the number of uncertain parameters is below 20 [31–33]. In PC, when using tensored quadrature, the number of samples is given by ${Ns}_{\text{PC}}={N}_{1}^{d}$, where *N*_{1} is the number of nodes in the one-dimensional quadrature rule and *d* is the number of uncertain parameters. Usually *N*_{1} = *p* + 1, where *p* is the polynomial degree; here we choose *p* = 3. In the qMC case, let ${Ns}_{\text{MC}}$ be the number of samples used for model evaluations using MC to some accuracy (in our model we use ${Ns}_{\text{MC}}={10}^{4}$). To compute the Sobol indices, we used Saltelli's sampling scheme, for which the number of qMC samples is ${Ns}_{\text{qMC}}=(d+2)\times {Ns}_{\text{MC}}$. It is important to note that the PC expansion can only be estimated with a finite number of terms; there is always a truncation error in the estimates of uncertainty. In particular, the variance, standard deviation and sensitivity indices tend to be underestimated since their computation involves the summation of the squares of the coefficients.
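The sample-count formulas above are easy to evaluate. The following sketch reproduces the numbers implied by the text (including the 1296 and roughly 1.7 million samples quoted in §6(a)) and makes the curse of dimensionality explicit:

```python
def n_samples_pc(p, d):
    """Tensored-quadrature PC sampling: N1 = p + 1 nodes per dimension,
    so N1**d samples for d uncertain parameters."""
    return (p + 1) ** d

def n_samples_qmc(d, ns_mc):
    """Saltelli's sampling scheme for Sobol indices: (d + 2) * Ns_MC runs."""
    return (d + 2) * ns_mc

print(n_samples_pc(3, 4))       # p = 3 as chosen here, 4 parameters: 256
print(n_samples_pc(5, 4))       # 6 nodes/dimension, 4 parameters: 1296
print(n_samples_pc(5, 8))       # 6 nodes/dimension, 8 parameters: 1679616
print(n_samples_qmc(4, 10**4))  # qMC with Ns_MC = 10^4, 4 parameters: 60000
```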

#### (b) Sensitivity analysis

Sensitivity analysis (SA) quantifies how much of the variance in a model output is contributed by individual uncertain parameters. In particular, we select Sobol SA, which expresses the sensitivity in terms of indices: the first-order Sobol sensitivity index quantifies the direct effect of an individual input parameter on the output variance, while higher-order sensitivity indices quantify the effect on the output variance due to interactions between inputs [34]. A low sensitivity index means that alterations to a particular uncertain parameter result in comparatively small variations in the final model output. Conversely, a high sensitivity index means that changes to a particular uncertain parameter can lead to a strong response in the model output.
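To make the first-order indices concrete, consider a toy additive model *Y* = Σ_i *a*_i *X*_i with independent inputs, for which the indices have a closed form (this example is ours, not taken from the paper):

```python
def first_order_sobol(coeffs, variances):
    """First-order Sobol indices for the additive model Y = sum_i a_i * X_i
    with independent inputs X_i: S_i = a_i**2 * Var(X_i) / Var(Y).
    For a purely additive model the indices sum to 1 (no interactions)."""
    var_y = sum(a * a * v for a, v in zip(coeffs, variances))
    return [a * a * v / var_y for a, v in zip(coeffs, variances)]

# Two inputs with equal variance; the first coefficient is 3x larger,
# so it accounts for 9x more of the output variance:
print(first_order_sobol([3.0, 1.0], [1.0, 1.0]))  # [0.9, 0.1]
```

For non-additive models the first-order indices sum to less than 1, and the gap is carried by the higher-order (interaction) indices.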

### 5. Validation: similarity between code and experiment

The validation part of VVUQ consists of comparing simulation results with experimental data and determining whether they are in quantitative agreement. We take the distribution obtained from simulation and compare it with the distribution from the experiment. It should be kept in mind that the results of both modelling and experiment are given by probability distributions which hold in most cases only approximately: in the experiment especially, the use of a normal distribution commonly only indicates some knowledge about the mean and the variance of the result, but does not imply that the uncertainty distribution indeed follows a Gaussian (i.e. that there are no outliers). Nevertheless, for comparison a metric is needed which allows a quantitative assessment, matches the expectations of the users and (in the best case) is in accordance with physical intuition. The EasyVVUQ library can take the distributions and compare them using a number of metrics, including the Hellinger distance [35], the Jensen–Shannon distance (a symmetrized and smoothed version of the Kullback–Leibler (KL) divergence [36]) and the Wasserstein metric [37]. The computed distance gives us an idea of the similarity between the simulation and experiment distributions. The first two metrics yield values between 0 (zero distance: identical distributions) and 1 (very different). The Wasserstein metric is instead unbounded above, with a lower limit of zero.
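For two normal distributions the Hellinger distance has a closed form, which makes it easy to see how such metrics behave. This is a sketch of ours for illustration, not EasyVVUQ's implementation (which works on the supplied distributions directly):

```python
import math

def hellinger_gaussians(mu1, s1, mu2, s2):
    """Closed-form Hellinger distance between N(mu1, s1^2) and N(mu2, s2^2):
    0 for identical distributions, approaching 1 for non-overlapping ones."""
    # bc is the Bhattacharyya coefficient of the two Gaussians
    bc = math.sqrt(2 * s1 * s2 / (s1**2 + s2**2)) \
         * math.exp(-(mu1 - mu2)**2 / (4 * (s1**2 + s2**2)))
    return math.sqrt(1 - bc)

print(hellinger_gaussians(0.0, 1.0, 0.0, 1.0))   # identical -> 0.0
print(hellinger_gaussians(0.0, 1.0, 0.0, 0.1))   # same mean, smaller variance -> ~0.745
print(hellinger_gaussians(0.0, 1.0, 10.0, 1.0))  # well-separated means -> ~1
```

The second call illustrates the sensitivity discussed in §6(b): two distributions with identical means but different variances already register as quite dissimilar.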

Although these metrics allow in principle a quantitative ranking, it turned out (cf. figure 10) that even ‘small’ differences in shape (which are considered well within the uncertainties) typically yielded the verdict ‘very different’, i.e. the standard measures are too sensitive, given that the PDFs involved are themselves approximations. For that reason, we suggest using a measure which puts its emphasis on the lower moments (which are commonly the moments most reliably inferred in experiments) and less on the higher moments. The balance between the differences in the lower moments and the higher moments is set by an adjustable weighting factor *w*, which ranges between 0 and 1.

Loosely following [38], where a refined bound for the KL-divergence has been suggested, we thus propose a measure for the compatibility of probability distributions which focuses on the first three statistical moments, i.e.

In this equation, *w*_{tot} represents the compatibility distance; *μ*, *σ* and *γ* are the mean, standard deviation and skewness of the dataset. The symmetric distance between the two distributions is given by the first term, while the second term incorporates the asymmetry of the distributions but is of negligible importance if the means do not coincide. *w*_{tot} ranges from 0 to 1, with 0 representing identical distributions and 1 completely different distributions. The difference between this metric and the ones mentioned earlier is that it takes into consideration a higher statistical moment, namely *γ*, and removes some deficiencies by taking asymmetry into account. A study based on comparisons of a variety of different distributions (split-Gaussians, Gaussians of different means and variances) found a relatively small value of *w* = 0.05 to be in best agreement with intuition. This will be demonstrated in the next section.

If one chooses to neglect the higher moments entirely and adopts the simple assumption that both distributions are normal distributions, a suitable standard measure is given by the *Z*-test [39]. Here, the aim is to determine how close the mean value of one distribution, e.g. *T*_{e} obtained in simulation, is to that of the other, e.g. *T*_{e} obtained from experiment, with respect to the provided standard errors. It takes the mean (*μ*) and standard error ($\sigma /\sqrt{n}$, with *n* representing the number of data points) to determine the *z*-score (*Z*).

*Z* ranges from 0 (for identical distributions) to infinity. A comparison of the properties of the different measures is provided in the following section.
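Assuming the standard two-sample form of the *z*-score built from the means and standard errors just described (the paper's exact expression is not reproduced here), the computation is a one-liner:

```python
import math

def z_score(mu1, sigma1, n1, mu2, sigma2, n2):
    """Two-sample z-score from the means and standard errors sigma/sqrt(n):
    0 for identical means, unbounded as the means separate."""
    se1 = sigma1 / math.sqrt(n1)
    se2 = sigma2 / math.sqrt(n2)
    return abs(mu1 - mu2) / math.sqrt(se1**2 + se2**2)

print(z_score(5.0, 1.0, 100, 5.0, 1.0, 100))  # identical distributions -> 0.0
```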

### 6. Uncertainty quantification and validation results

#### (a) UQ example

Uncertainties in the fusion workflow come from the applied heating sources and boundary conditions (epistemic) and from the stochastic behaviour of the turbulence (aleatoric). We only include the epistemic uncertainties: boundary conditions are defined by *T*_{e} and *T*_{i} at the plasma edge (||*ρ*_{tor}|| = 1), and characteristics of the applied heating sources (e.g. amplitude, width and position for both ion and electron heating power distributions). It is easiest to understand the notion of heating source through the heat transport equation or the continuity equation of energy flow

$$\frac{\partial u}{\partial t}+\nabla \cdot q=S,$$

where *u*, *q* and *S* represent the energy density, energy flux and heat source, respectively. The heating source is a Gaussian function characterized by its position (*b*), amplitude (*A*) and width (*c*):

$$S(\rho )=A\,\exp\left(-\frac{(\rho -b)^{2}}{2{c}^{2}}\right).$$

We selected a normal distribution to describe each uncertain input, with ±20% around the original value as the physically acceptable limits. Using the PC method with six sample points along each uncertain dimension, and four uncertain inputs (*b*, *A*, *c* from the electron heating source and *T*_{e} at the plasma edge), we end up with 6^{4} = 1296 samples. However, if all eight uncertain parameters are considered (four for the electrons and four for the ions), then 6^{8}, or about 1.7 million, samples are generated. The number of PC samples grows exponentially with the number of uncertain parameters, and processing all of these samples becomes cumbersome. Therefore, qMC [40] or PC with a sparse grid [41] become better options for generating samples.
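As a hedged illustration, the Gaussian heating source and the ±20% input perturbation can be sketched as follows; the nominal values and the choice of spread for the normal distribution are our own assumptions (the text fixes only the ±20% acceptable limits, not the distribution width):

```python
import math
import random

def heating_source(rho, b, A, c):
    """Gaussian heating source profile with position b, amplitude A, width c."""
    return A * math.exp(-(rho - b)**2 / (2 * c**2))

def sample_inputs(nominal, rel=0.20, rng=random):
    """Draw one perturbed set of uncertain inputs: each (positive) parameter
    is normally distributed around its nominal value and clipped to the
    +/-20% physically acceptable limits. The spread rel/2 is an assumption."""
    out = {}
    for name, val in nominal.items():
        draw = rng.gauss(val, rel * val / 2)
        out[name] = min(max(draw, val * (1 - rel)), val * (1 + rel))
    return out

# Hypothetical nominal values for the electron heating source:
nominal = {"b": 0.5, "A": 1.0e6, "c": 0.2}
perturbed = sample_inputs(nominal, rng=random.Random(42))
```

In the actual workflow, EasyVVUQ generates the samples at the PC quadrature nodes rather than by random draws; the sketch only shows the shape of the input space.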

Once the uncertain input parameters are defined, the MFW runs as a black-box for multiple time iterations until the plasma reaches a steady state. The QoI from this simulation are the output *T*_{e} and *T*_{i} from the macroscopic transport model. The EasyVVUQ library carries out the UQ analysis and provides the mean, standard deviation and variance of the QoI.

Based on the work by Lakhlili *et al.* [42], SA shows that the variance in *T*_{e} is mainly due to uncertainty in the position of the heating source at the core (||*ρ*_{tor}|| = 0.0), the boundary condition set at the edge, and the amplitude of the heating source between the core and the edge of the plasma. In contrast, the width does not have a direct effect on the variance of *T*_{e}. This tells us that in the future we can eliminate the width of the heating source function as an uncertain parameter without affecting the variance of the QoI. Likewise, SA shows that the variance in *T*_{i} is mainly due to uncertainty in the position and amplitude of the heating source in the core region and the boundary condition at the edge, while the width of the heating source function has no direct effect on the variance of *T*_{i}. Again, this means we can reduce the number of uncertain parameters next time by eliminating the widths of the heating source functions, and the temperature distributions would remain the same in terms of variance behaviour. The first-order sensitivity indices of *T*_{e} and *T*_{i} are illustrated in figure 8.

#### (b) Comparison with experiment

Now that we have *T*_{e} and *T*_{i} profiles from the production simulations (see §3) and from the simulations equipped with UQ (see §6(a)), we want to see how they compare to reality. To validate the temperature profiles obtained at quasi-steady state, measurements from two AUG experimental shots are taken into consideration. The profile data are provided by the integrated data analysis (IDA) scheme [43] at ASDEX Upgrade: essentially, the profile estimation is based on the joint and simultaneous evaluation (Bayesian data fusion [44]) of a set of different diagnostics (Thomson scattering, lithium beam diagnostic, magnetic measurements), taking into account the respective uncertainties. The standard evaluation approximates the high-dimensional, asymmetric and spatially correlated posterior distribution by the mean and one- or two-sided standard deviations of the profile quantities (e.g. *T*_{e}, *n*_{e}), either at a set of discrete radial locations or by the uncertainty of fitted profile functions. The reliability of the assigned uncertainties can in part be assessed by comparison with subsequent measurements using improved diagnostics with a higher accuracy [45]: the estimated overall uncertainty level is correct, but smaller deviations do exist. The ion temperature data and uncertainties are provided by measurements from the ASDEX Upgrade CXRS-spectroscopy diagnostic [46].

The electron (a) and ion (b) temperature profiles obtained from the experiment, the non-intrusive UQ of the workflow and the production simulations are plotted together in figure 9. The experimental data come from AUG shots no. 36266 and no. 36297. The two sets of UQ results presented here have different uncertain inputs: one set has uncertain inputs from the applied electron heating source and the boundary condition for *T*_{e}, while the second set has uncertain inputs from the applied ion heating source and the boundary condition for *T*_{i}. One production simulation has adaptive time step parameters Δ*T*_{e,lim} = 0.1 and $\mathrm{\Delta}{\mathrm{\partial}}_{\rho}{T}_{e,\text{lim}}=0.05$, while the other has Δ*T*_{e,lim} = Δ*T*_{i,lim} = 0.1 and $\mathrm{\Delta}{\mathrm{\partial}}_{\rho}{T}_{e,\text{lim}}=\mathrm{\Delta}{\mathrm{\partial}}_{\rho}{T}_{i,\text{lim}}=0.025$. We label them simulations *A* and *B*, respectively. The quasi-steady state profiles here are obtained with method 1 of the two methods described in §3.

In the *T*_{e} profiles, though employing a simpler analytical turbulence model, the UQ simulations provide an estimate of the uncertainties coming from the heating source and the boundary condition. The production runs use the stochastic turbulence model GEM, which includes more of the physics describing the turbulent behaviour of the plasma, as mentioned in §2. The production runs yield better agreement with the experiment; in particular, *T*_{e} at the inner two flux-tubes from simulation *A* is closer to the experiments than *T*_{e} from simulation *B*. Similar observations hold for the ion temperature profile (right). To give a quantitative answer, we turn to the similarity metrics mentioned in §5.

The quasi-steady state profiles from the production runs, as shown in figure 9, have eight spatial points, one for each flux-tube. Each profile from the UQ results has 100 ||*ρ*_{tor}|| values. The *T*_{e} measured from experiment shots no. 36266 and no. 36297 have 161 and 160 ||*ρ*_{tor}|| values, respectively. As for the *T*_{i} measurements, both experiment shots have 96 ||*ρ*_{tor}|| values. Before we can use any of the distribution metrics, the number of ||*ρ*_{tor}|| values must match between simulation and experiment. Therefore, we take the simulated quasi-steady state profile obtained at a certain time bin and look at the full profile from the transport model, which spans 100 spatial grid points. From there, we interpolate so that the number of grid points in the simulation matches that of the experimental measurements.
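The grid-matching step is plain linear interpolation. A small stdlib-only sketch (in practice a library routine such as `numpy.interp` would be used):

```python
def interp_profile(rho_exp, rho_sim, te_sim):
    """Linearly interpolate the simulated profile te_sim, defined on the
    ascending grid rho_sim, onto the experimental grid rho_exp.
    Values outside the simulation grid are clamped to the end points."""
    out = []
    for r in rho_exp:
        if r <= rho_sim[0]:
            out.append(te_sim[0])
        elif r >= rho_sim[-1]:
            out.append(te_sim[-1])
        else:
            # find the first simulation grid point at or beyond r
            j = next(i for i in range(1, len(rho_sim)) if rho_sim[i] >= r)
            t = (r - rho_sim[j - 1]) / (rho_sim[j] - rho_sim[j - 1])
            out.append(te_sim[j - 1] + t * (te_sim[j] - te_sim[j - 1]))
    return out
```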

Three distribution similarity metrics are tested on the production simulation and experiment distributions: Hellinger distance, Jensen–Shannon distance and Wasserstein metric. The *T*_{e} and *T*_{i} validation results are shown in figure 10. The top panel of each figure shows the temperature profiles from simulation *A* and experiment shot no. 36266, and the bottom panel shows the distances measured using the similarity metrics. The probability distributions for simulation and experimental data are estimated as Gaussian and split-Gaussian, respectively.

Based on the top left panel of figure 10, *T*_{e} obtained from simulation *A* appears comparable to the experiment, especially in the region between $||{\rho}_{\text{tor}}||=0.14$ (where the inner-most flux-tube is located) and the edge, if we only focus on the overlaps between the two distributions (with mean ± standard deviation considered). However, when we use the distribution similarity metrics to compare them quantitatively, the results show that they are not as similar as they seem. For example, even though the *T*_{e} distributions at ||*ρ*_{tor}|| = 0.23 have the same mean, the measured distances are not near 0. At ||*ρ*_{tor}|| = 0, the average *T*_{i} from simulation and experiment differ by less than 1%, and both distributions overlap; however, the similarity distances are not near 0 either. This is because these similarity metrics are sensitive to the variances of both distributions. Similar issues are observed in the validation exercise for simulation *B*, in both the *T*_{e} and *T*_{i} similarity measures. Overall, these three similarity metrics do not reflect well what we see in the simulation and experiment distributions. This is why we decided to turn to the compatibility measure and the *Z*-test.

The compatibility measure and *Z*-test between one of the two production runs (simulations *A* and *B*) and experiment shot no. 36266 are shown in figures 11 and 12. Similar to figure 10, the top panel of each figure shows the temperature profiles we want to compare. The compatibility measure is applied three times, each with a different *w*: 0.05, 0.5, 0.95. The value of *w* decides the weight placed on the asymmetry characteristic of the distribution in the compatibility measure. For example, with *w* = 0.05, 5% of the emphasis is applied to the asymmetry of the distributions.

First, we study the compatibility between *T*_{e} from the simulations and from experiment shot no. 36266, as shown in figure 11. The compatibility distance with *w* = 0.05 at each flux-tube has the best agreement with intuition among the three weighting factors we have tested. In the lower left panel, the compatibility distance between simulation *A* and experiment is below 0.5 at every flux-tube except for the one at ||*ρ*_{tor}|| = 0.561, which has *w*_{tot} = 0.61. For simulation *B*, the flux-tube at ||*ρ*_{tor}|| = 0.668 has the highest compatibility distance of 0.66. The z-scores have a similar trend as the compatibility distance with *w* = 0.05.

Next, we study the compatibility between *T*_{i} from the simulations and from experiment shot no. 36266. Again a compatibility measure with *w* = 0.05 provides results that are most agreeable with intuition. As shown in the bottom left panel of figure 12, the distance between simulation *A* and experiment is always below 0.53. On the other hand, the distance between simulation *B* and experiment, except for the outermost flux-tube, is relatively high, with values ranging from 0.56 to 0.91 (bottom right panel). The *z*-scores also present a similar trend to that observed in compatibility measure with *w* = 0.05.

Overall, the comparison between the probability distributions of temperature from the simulations and from experiment, e.g. AUG shot no. 36266, shows that simulation *B* tends to have a higher compatibility distance and *z*-score than simulation *A*. This is mainly due to the magnitude of the temperature fluctuations, which is smaller in simulation *B* than in simulation *A*. The standard deviation of the probability distribution is therefore also lower in simulation *B*, and according to equation (5.1), a lower standard deviation leads to a higher compatibility distance. The distributions from both production runs were also compared to the experimental data from shot no. 36297, and we draw the same conclusion regarding the compatibility distances as for shot no. 36266: the compatibility distance calculated between simulation *B* and experiment tends to be greater. This validation exercise highlights the importance of the shape of the distribution functions. That said, the distribution function we used for both simulation setups is a Gaussian with mean and standard deviation calculated from *N* temperature values at quasi-steady state. However, no epistemic or aleatoric uncertainties were taken into account in these production runs. We gained intuition on the effect of epistemic uncertainties, i.e. the heating source and the boundary condition, on the temperature distribution functions when UQ was applied to the MFW (with an analogue turbulence model) as a black-box, as we saw in figure 9. To provide a more comprehensive validation of the production simulations against experiment, we will take epistemic and aleatoric uncertainties into consideration in the future.
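The role of the standard deviation can be made concrete with the *z*-score. Assuming the standard form for comparing two independent Gaussian estimates (the temperature values below are made up for illustration and are not taken from the simulations):

```python
import math

def z_score(mu_sim, sigma_sim, mu_exp, sigma_exp):
    """z-score between two independent Gaussian estimates (assumed standard form)."""
    return abs(mu_sim - mu_exp) / math.sqrt(sigma_sim ** 2 + sigma_exp ** 2)

# Hypothetical experimental mean and spread (keV); not from the paper.
mu_exp, sigma_exp = 1.00, 0.08

z_a = z_score(1.05, 0.10, mu_exp, sigma_exp)  # larger fluctuations (simulation A-like)
z_b = z_score(1.05, 0.02, mu_exp, sigma_exp)  # smaller fluctuations (simulation B-like)

# The same mean offset scores worse (larger z) when the spread is smaller.
print(z_a < z_b)  # prints True
```

This mirrors the behaviour reported above: a simulation with smaller temperature fluctuations, and hence a lower standard deviation, tends to score a higher *z* even for a comparable mean offset.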

### 7. Conclusion

In summary, we conducted a validation study between simulation and experimental data taken from the ASDEX Upgrade tokamak by employing quantitative distance measures between distributions. The simulation results come from the MFW, which aims to simulate both the effects of turbulence and transport by coupling multiple single-scale models together. An issue of time-scale mismatch arises between the turbulence and transport models. In previous work, this issue was tackled by implementing an adaptive time step for every transport model iteration. In this work, we expanded the adaptive time step parameters by including two parameters that impose a maximum on the change in ion temperature Δ*T*_{i,lim} and the change in ion temperature gradient $\mathrm{\Delta}{\mathrm{\partial}}_{\rho}{T}_{i,\text{lim}}$, besides the already-existing limits for electron temperature and electron temperature gradient. In addition, two methods that we used to determine the quasi-steady state of the plasma are also presented: one was previously developed and the other is newly developed. These two methods yield profiles whose mean temperature lies well within one standard deviation of the mean temperature of the other distribution. However, to make sure the simulation is reliable for modelling experiments, UQ and validation are applied to the workflow. To assist us with this task, the VECMA toolkit was used to perform UQ and SA on the entire workflow as a black-box. In this paper, we are interested in comparing results from (i) the production runs with adaptive time steps and quasi-steady state methods implemented, (ii) the simulation equipped with UQ and (iii) experiments. While the temperature profiles look visually similar, especially in the region where ||*ρ*_{tor}|| ranges from 0.144 (where the inner-most flux-tube is located) to 1.0, the quantitative similarity measurements of the distributions show otherwise.
In particular, we used measures such as the Hellinger distance, Jensen-Shannon divergence and Wasserstein metric to quantitatively compare the results from the two production runs and two different shots of the AUG tokamak experiment. This initial validation work shows that even though the mean temperature from simulation lies within one standard deviation of the experimental measurement, the similarity distance can still be close to 1 (two very different distributions) due to the metrics’ sensitivity to the variances of both distributions. This led us to other similarity measures that are more consistent with our intuition. In particular, the compatibility measure includes a weighting factor *w* that determines how much emphasis is placed on the asymmetry of a distribution. Another measure is the *Z*-test, which treats both distributions as Gaussian. We learned that the compatibility measure with a low *w* agrees best with intuition, and it shows a similar trend to the *z*-score. The comparison between simulations and experiment shows that the simulation with adaptive time step parameters Δ*T*_{e,lim} = Δ*T*_{i,lim} = 0.1 and $\mathrm{\Delta}{\mathrm{\partial}}_{\rho}{T}_{e,\text{lim}}=\mathrm{\Delta}{\mathrm{\partial}}_{\rho}{T}_{i,\text{lim}}=0.025$ tends to have a higher compatibility distance and *z*-score than the simulation with Δ*T*_{e,lim} = 0.1 and $\mathrm{\Delta}{\mathrm{\partial}}_{\rho}{T}_{e,\text{lim}}=0.05$. This is because the magnitude of the temperature fluctuations in the former simulation is smaller.

One of the paradoxical outcomes of this work has been a regression in the agreement between the code and the experiment as methods were found to reduce the uncertainty arising from the turbulence code—since the distance between the results is weighted by the uncertainties. However, we know there are other uncertainties in the problem (in the position and characteristics of the source, in the values at the boundaries, in the assumed density profile) and methods of incorporating these uncertainties into our validation metric remain to be developed.

In the future, we will continue exploring different methods for validating multiscale fusion simulations, including improvements to the adaptive time step method and implementation of UQ in the production simulation, by quantifying epistemic uncertainties in the workflow and aleatoric uncertainties in the stochastic turbulence model. Ultimately, the distribution similarity metrics studied in this article will be compared to the validation metrics suggested in [17–21].

### Data accessibility

To access the VECMA toolkit and tutorials, please visit https://www.vecma-toolkit.eu/.

### Authors' contributions

O.O.L. drafted the majority of the content in this paper and carried out the simulations discussed herein. J.L. created and provided his expertise on the programming scripts necessary for the uncertainty quantification and validation work. O.H. provided his expertise in the multiscale modelling framework. U.v.T. provided his expertise in uncertainty quantification and validation. B.D.S. provided his expertise on the turbulence model and physics-related problems. D.P.C. provided his expertise in the single-scale models within the fusion multiscale framework, uncertainty quantification and validation.

### Competing interests

We declare we have no competing interests.

### Funding

This project has received funding from the European Union’s Horizon 2020 research and innovation programme for the VECMA project, under grant agreement no. 800925. This project is part of the Future and Emerging Technologies (FET) funding scheme.

## Acknowledgements

The authors thank all the collaborators from the VECMA project. The authors also want to thank EUROfusion and the EUROfusion High Performance Computer (Marconi-Fusion) for the computing allocation, which made part of the results presented in this paper possible.


### References

1. LoDestro L, Cohen B, Cohen R. 1991 Comparison of simulations and theory of low-frequency plasma turbulence. In *Plasma physics and controlled nuclear fusion research 1990*, vol. 2.
2. Shestakov A, Cohen R, Crotinger J, LoDestro L, Tarditi A, Xu X. 2003 Self-consistent modeling of turbulence and transport. *J. Comput. Phys.* **185**, 399–426. (doi:10.1016/S0021-9991(02)00063-3)
3. Nishimura Y, Coster D, Scott B. 2004 Characterization of electrostatic turbulent fluxes in tokamak edge plasmas. *Phys. Plasmas* **11**, 115–124. (doi:10.1063/1.1631811)
4. Reiser D, Scott B. 2005 Electromagnetic fluid drift turbulence in static ergodic magnetic fields. *Phys. Plasmas* **12**, 122308. (doi:10.1063/1.2141928)
5. Highcock E, Mandell N, Barnes M, Dorland W. 2018 Optimisation of confinement in a fusion reactor using a nonlinear turbulence model. *J. Plasma Phys.* **84**, 905840208. (doi:10.1017/S002237781800034X)
6. Parker JB, LoDestro LL, Told D, Merlo G, Ricketson LF, Campos A, Jenko F, Hittinger JA. 2018 Bringing global gyrokinetic turbulence simulations to the transport timescale using a multiscale approach. *Nucl. Fusion* **58**, 054004. (doi:10.1088/1741-4326/aab5c8)
7. Meneghini O *et al.* 2015 Integrated modeling applications for tokamak experiments with OMFIT. *Nucl. Fusion* **55**, 083008. (doi:10.1088/0029-5515/55/8/083008)
8. Barnes M, Abel I, Dorland W, Görler T, Hammett G, Jenko F. 2010 Direct multiscale coupling of a transport code to gyrokinetic turbulence codes. *Phys. Plasmas* **17**, 056109. (doi:10.1063/1.3323082)
9. Candy J, Holland C, Waltz R, Fahey MR, Belli E. 2009 Tokamak profile prediction using direct gyrokinetic and neoclassical simulation. *Phys. Plasmas* **16**, 060704. (doi:10.1063/1.3167820)
10. Becoulet A, Strand P, Wilson H, Romanelli M, Eriksson L-G. 2007 The way towards thermonuclear fusion simulators. *Comput. Phys. Commun.* **177**, 55–59. (doi:10.1016/j.cpc.2007.02.051)
11. Falchetto GL. 2014 The European integrated tokamak modelling (ITM) effort: achievements and first physics results. *Nucl. Fusion* **54**, 043018. (doi:10.1088/0029-5515/54/4/043018)
12. Chopard B, Borgdorff J, Hoekstra AG. 2014 A framework for multi-scale modelling. *Phil. Trans. R. Soc. A* **372**, 20130378. (doi:10.1098/rsta.2013.0378)
13. Borgdorff J, Falcone J-L, Lorenz E, Bona-Casas C, Chopard B, Hoekstra AG. 2013 Foundations of distributed multiscale computing: formalization, specification, and analysis. *J. Parallel Distrib. Comput.* **73**, 465–483. (doi:10.1016/j.jpdc.2012.12.011)
14. Chopard B, Falcone J-L, Kunzli P, Veen L, Hoekstra A. 2018 Multiscale modeling: recent progress and open questions. *Multiscale Multidiscip. Model. Exp. Des.* **1**, 57–68. (doi:10.1007/s41939-017-0006-4)
15. Luk OO, Hoenen O, Bottino A, Scott BD, Coster D. 2019 ComPat framework for multiscale simulations applied to fusion plasmas. *Comput. Phys. Commun.* **239**, 126–133. (doi:10.1016/j.cpc.2018.12.021)
16. Luk OO *et al.* 2019 Application of the extreme scaling computing pattern on multiscale fusion plasma modelling. *Phil. Trans. R. Soc. A* **377**, 20180152. (doi:10.1098/rsta.2018.0152)
17. Oberkampf WL, Trucano TG. 2002 Verification and validation in computational fluid dynamics. *Prog. Aerosp. Sci.* **38**, 209–272. (doi:10.1016/S0376-0421(02)00005-2)
18. Oberkampf WL, Barone MF. 2006 Measures of agreement between computation and experiment: validation metrics. *J. Comput. Phys.* **217**, 5–36. (doi:10.1016/j.jcp.2006.03.037)
19. Terry PW, Greenwald M, Leboeuf J-N, McKee GR, Mikkelsen DR, Nevins WM, Newman DE, Stotler DP, Task Group on Verification and Validation, U.S. Burning Plasma Organization, U.S. Transport Task Force. 2008 Validation in fusion research: towards guidelines and best practices. *Phys. Plasmas* **15**, 062503. (doi:10.1063/1.2928909)
20. Greenwald M. 2010 Verification and validation for magnetic fusion. *Phys. Plasmas* **17**, 058101. (doi:10.1063/1.3298884)
21. Holland C. 2016 Validation metrics for turbulent plasma transport. *Phys. Plasmas* **23**, 060901. (doi:10.1063/1.4954151)
22. Coster DP *et al.* 2010 The European transport solver. *IEEE Trans. Plasma Sci.* **38**, 2085–2092. (doi:10.1109/TPS.2010.2056707)
23. Lütjens H, Bondeson A, Sauter O. 1996 The CHEASE code for toroidal MHD equilibria. *Comput. Phys. Commun.* **97**, 219–260. (doi:10.1016/0010-4655(96)00046-X)
24. Scott BD. 2005 Free-energy conservation in local gyrofluid models. *Phys. Plasmas* **12**, 102307. (doi:10.1063/1.2064968)
25. Scott BD. 2005 Dynamical alignment in three species tokamak edge turbulence. *Phys. Plasmas* **12**, 082305. (doi:10.1063/1.1993507)
26. Imbeaux F *et al.* 2010 A generic data structure for integrated modelling of tokamak physics and subsystems. *Comput. Phys. Commun.* **181**, 987–998. (doi:10.1016/j.cpc.2010.02.001)
27. Ye D, Veen L, Nikishova A, Lakhlili J, Edeling W, Luk OO, Krzhizhanovskaya VV, Hoekstra AG. 2021 Uncertainty quantification patterns for multiscale models. *Phil. Trans. R. Soc. A* **379**, 20200072. (doi:10.1098/rsta.2020.0072)
28. Richardson RA, Wright DW, Edeling W, Jancauskas V, Lakhlili J, Coveney PV. 2020 EasyVVUQ: a library for verification, validation and uncertainty quantification in high performance computing. *J. Open Res. Softw.* **8**, 11. (doi:10.5334/jors.303)
29. Groen D *et al.* 2019 Introducing VECMAtk - verification, validation and uncertainty quantification for multiscale and HPC simulations. In *International Conference on Computational Science*, pp. 479–492. Springer.
30. Preuss R, von Toussaint U. 2016 Investigations on Bayesian uncertainty quantification with two examples. In *AIP Conf. Proc.*, vol. 1757, p. 060001. AIP Publishing LLC.
31. Xiu D, Hesthaven JS. 2005 High-order collocation methods for differential equations with random inputs. *SIAM J. Sci. Comput.* **27**, 1118–1139. (doi:10.1137/040615201)
32. Crestaux T, Le Maître O, Martinez J-M. 2009 Polynomial chaos expansion for sensitivity analysis. *Reliab. Eng. Syst. Saf.* **94**, 1161–1172. (doi:10.1016/j.ress.2008.10.008)
33. Eck VG, Donders WP, Sturdy J, Feinberg J, Delhaas T, Hellevik LR, Huberts W. 2016 A guide to uncertainty quantification and sensitivity analysis for cardiovascular applications. *Int. J. Numer. Methods Biomed. Eng.* **32**, e02755. (doi:10.1002/cnm.2755)
34. Saltelli A, Annoni P, Azzini I, Campolongo F, Ratto M, Tarantola S. 2010 Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. *Comput. Phys. Commun.* **181**, 259–270. (doi:10.1016/j.cpc.2009.09.018)
35. Nikulin MS. Hellinger distance. *Encyclopedia of Mathematics* 78.
36. Kullback S, Leibler RA. 1951 On information and sufficiency. *Ann. Math. Stat.* **22**, 79–86. (doi:10.1214/aoms/1177729694)
37. Villani C. 2016 *Optimal transport: old and new*, Grundlehren der mathematischen Wissenschaften. Berlin, Germany: Springer.
38. Nishiyama T. 2019 A new lower bound for Kullback-Leibler divergence based on Hammersley-Chapman-Robbins bound. (http://arxiv.org/abs/1907.00288)
39. Montgomery DC, Runger GC. 2010 *Applied statistics and probability for engineers*. Hoboken, NJ: John Wiley & Sons.
40. Sobol IM. 1998 On quasi-Monte Carlo integrations. *Math. Comput. Simul.* **47**, 103–112. (doi:10.1016/S0378-4754(98)00096-2)
41. Smolyak SA. 1963 Quadrature and interpolation formulas for tensor products of certain classes of functions. In *Doklady Akademii Nauk*, vol. 148, pp. 1042–1045. Russian Academy of Sciences.
42. Lakhlili J, Hoenen O, Luk OO, Coster DP. 2020 Uncertainty quantification for multiscale fusion plasma simulations with VECMA toolkit. In *Computational Science – ICCS 2020*, pp. 719–730. Springer International Publishing.
43. Fischer R, Fuchs C, Kurzan B, Suttrop W, Wolfrum E, ASDEX Upgrade Team. 2010 Integrated data analysis of profile diagnostics at ASDEX Upgrade. *Fusion Sci. Technol.* **58**, 675–684. (doi:10.13182/FST10-110)
44. von der Linden W, Dose V, von Toussaint U. 2014 *Bayesian probability theory: applications in the physical sciences*. Cambridge, UK: Cambridge University Press.
45. Willensdorfer M *et al.* 2014 Characterization of the Li-BES at ASDEX Upgrade. *Plasma Phys. Control. Fusion* **56**, 025008. (doi:10.1088/0741-3335/56/2/025008)
46. McDermott R *et al.* 2017 Extensions to the charge exchange recombination spectroscopy diagnostic suite at ASDEX Upgrade. *Rev. Sci. Instrum.* **88**, 073508. (doi:10.1063/1.4993131)