Observation-based correction of dynamical models using thermostats.

Models used in simulation may give accurate short-term trajectories but distort long-term (statistical) properties. In this work, we augment a given approximate model with a control law (a 'thermostat') that gently perturbs the dynamical system to target a thermodynamic state consistent with a set of prescribed (possibly evolving) observations. As proof of concept, we provide an example involving a point vortex fluid model on the sphere, for which we show convergence of equilibrium quantities (in the stationary case) and the ability of the thermostat to dynamically track a transient state.


Introduction
In applications of modern computational science, the underpinning physical laws (and equations of motion) are often well established, yet the detailed behaviour is unpredictable on long time scales due to the presence of deterministic chaos. Examples of this arise in molecular dynamics [1,2], in polymer simulation [3] and in the modelling of turbulent fluids in the atmosphere and ocean [4,5], where long simulations are used to extract statistical information (e.g. the statistics of rare transitions between basins in molecular dynamics, or slow relaxation processes in fluids).   . . In this type of dynamical sampling, the common requirement is that simulated paths are sufficiently accurate to calculate measures of dynamical mixing such as 2-point temporal correlation functions [6].
In Hamiltonian systems such as molecular dynamics, it is common to run ensembles of microcanonical (i.e. constant energy) simulations. For such simulations, backward error analysis [7,8] suggests that a long simulation should not be viewed as the approximation of a particular trajectory of the system but rather as an exact solution of a perturbed continuum process described by modified equations. In the case of dynamical sampling of complex systems, the statistics of simulation data are biased because they sample an invariant measure of the modified equations (rather than those of the target system), i.e. the time-discretization error induces an effective statistical bias.
Statistical bias may also arise due to spatial discretization. For example, in the setting of geophysical fluid dynamics, a comparison of discretizations of the quasi-geostrophic equations reveals that the long-time mean potential vorticity field and pointwise fluctuation statistics are heavily dependent on discrete conservation laws such as energy, enstrophy and material conservation of vorticity [9][10][11]. The discretization bias may be controlled by reducing timestep or by incorporating a Metropolis condition [12,13], possibly increasing the computational cost. The use of a Metropolis condition can also increase the difficulty of computing accurate dynamical properties. The need for computations to be accurate with respect to the dynamical process as well as to the long-term statistics thus poses difficult challenges to the simulator.
In this paper, we consider a method of perturbing the dynamics of the system to correct for statistical bias. Our method is based on the concept of a thermostat, by which we mean a scheme based on a control law designed to facilitate sampling of a target probability density function (pdf). Although originally proposed in molecular dynamics (see, e.g. the Nosé-Hoover method [14,15]), thermostats can be extended to handle a wide variety of systems and have been used in a wide variety of applications. For example, in Dubinkina et al. [16], a thermostat was used as a model reduction technique for a vortex model of a fluid (suppressing the detailed interactions of a few strong vortices with a weak vortex field), an approach we draw on in the numerical experiments of this article. In another recent article [17], thermostats have been suggested as a means of sampling incompletely specified systems (with noisy gradients), with applications in machine learning for Bayesian inference in the 'big data' context.
In the general modelling scenario, thermostats are significantly hampered by the need to specify a functional form of the pdf a priori. In this article, we assume that, instead of the pdf, what is actually available is a partial set of expectations of observables with respect to the unknown probability measure, which may arise from experiment or other types of modelling. In this setting, information theory (in particular entropy maximization [18,19]) offers tools for constructing densities consistent with observations that are close to a given prior distribution (see [20][21][22] for examples in geophysical modelling, [23,24] for discussion in molecular modelling). Our algorithm for computing the parameters finds Lagrange multipliers (one for each observable) that modify the probability density, using iterative techniques like those in Agmon et al. [25], Haken [26], Davis & Gutiérrez [27], where each step of iteration involves ensemble simulation that reweights samples from the prior distribution appropriately. The density resulting from such an entropy maximization is then used as the basis for designing a thermostat that ensures the extended dynamical system samples the distribution and consequently exhibits the correct (long-term) statistical averages, while mildly perturbing the short-term dynamical behaviour of the system. The complexity and generality of our iterative procedure (sampling→parametrization→sampling. . .) means that convergence has to be verified through computer experiment.
Entropy maximization also forms the basis for closure approximations in the works of Ilg et al. [3] and Samaey et al. [28], where the result from entropy maximization is used in determining evolution equations for resolved macroscopic quantities. This approach is distinct in philosophy from the current work in which the result of entropy maximization is used to define a statistical ensemble and evolution equations for microscopic dynamics that are consistent with macroscopic observations.
While we motivate the method in the setting of equilibrium sampling of a fixed target distribution, we can easily apply our scheme to the case of a more general model with unknown or even undefined steady state. For this purpose, an ensemble of simulations are run in finite duration bursts, approximating the Lagrange multipliers via an iterative procedure based on the simulation history. This procedure allows entropy maximization to be performed with ensemble averages that evolve in time, providing a flexible method of nonlinear data assimilation.
The remainder of this article is organized as follows. In §2, we review the idea of a thermostat which is the key tool we employ to control the invariant distribution. Section 3 describes the combination of thermostats with the maximum entropy framework for correcting the density to reflect thermodynamic constraints. Section 4 describes the adaptive procedure. In §5, we apply and evaluate the method in the setting of a system of point vortices on the surface of a sphere, which represents a simple geophysical model with multiple statistically relevant first integrals.

Thermostats
We introduce the thermostat technique for generic Hamiltonian dynamical systems of the form possessing a divergence-free vector field 1 ∇ · f ≡ 0. Invariance of the Hamiltonian H along solutions of (2.1) follows from (d/dt)H(y(t)) = ∇H · dy/dt = ∇H · B∇H = 0, due to skew-symmetry of B(y). Additional first integrals may exist, denoted by I (y) : Thermostats were introduced in molecular dynamics to model the trajectories of molecules in a fluid at constant temperature. In the common statistical mechanical framework, it is assumed that the trajectories of a system of particles in thermal equilibrium with a reservoir at constant temperature sample the canonical distribution, with smooth invariant density proportional to exp(−βH) where β is a parameter reciprocal to temperature scaled by Boltzmann's constant. The trajectories of the Hamiltonian system (2.1) are restricted to a level set of energy. Hence, to model a system at constant temperature, it is necessary to perturb the vector field to make trajectories ergodic with respect to the canonical distribution. The most common way of achieving this is by adding stochastic and dissipative terms satisfying a fluctuation-dissipation relation (Langevin dynamics), which is rigorously ergodic [29].
However, DelSole [30] points out that additive stochastic forcing of trajectories leads to inaccurate dynamical quantities as autocorrelation functions are strongly perturbed. For smooth deterministic Hamiltonian dynamics, normalized velocity autocorrelation functions are of the form 1 − cτ 2 , c > 0 in the zero-lag limit τ → 0, whereas the autocorrelation of a variable that is directly forced by white noise must take the form exp(−κτ ), κ > 0 in the same limit. This implies that direct stochastic perturbation leads to autocorrelation functions that have non-zero slope and opposite curvature at zero lag [31].
An alternative approach, due to Nosé [14,15] and Hoover [32], augments the phase space with an additional thermostat variable ξ (t) (driven by a differential equation) in order to ensure that the extended dynamics on R d × R preserves an equilibrium density whose marginal on R d is the target (e.g. Gibbs) density. The deterministic thermostats of Nosé and Hoover are not ergodic, but they can be combined with stochastic forcing of the thermostat variable ξ , leading to the so-called Nosé-Hoover-Langevin method [33]. For generic divergence-free dynamical systems, the generalized Bulgac-Kusnezov (GBK) [34,35] thermostat provides the augmented system: dy = f (y) dt + ξεg(y) dt (2.2a) and dξ = εh(y) dt − γ ξ dt + 2γ dw, (2.2b) where ε > 0 and γ > 0 are parameters, w(t) is a scalar Wiener process, and g and h are vector fields which we discuss in more detail below. Given a target density ρ(y) ∝ exp(−A(y)), A : D → R, denote the augmented product density byρ(y, ξ ) = ρ(y) · μ(ξ ), with μ a univariate normal distribution with mean zero and standard deviation one. It is easily checked thatρ is stationary under the Fokker-Planck operator associated with (2.2) provided and provided the 'potential' A is only a function of y through the first integrals of f . The (extended) target measureρ is ergodic if the vector fields f and g satisfy a Hörmander condition [36]. Up to verification of the Hörmander condition, the choice of g is free. In the example of §5, we consider a specific form for g.
The parameter ε controls the strength of the thermostat relative to that of the vector field f . This affects the rate at which the invariant measure is sampled. It has been demonstrated in Bajars et al. [36,37] and Leimkuhler et al. [31] that GBK/NHL thermostating gives a weak perturbation of microcanonical dynamics: typical autocorrelation functions match the exact ones to leading order, i.e. have the form 1 − cτ 2 Designing a thermostat relies on knowing the functional form of the target distribution, which may not always be available. In this article, we relax this requirement by constructing the target distribution iteratively and adaptively, based on Jaynes' principle of least-biased ensemble prediction.

Bias correction method
Suppose that we are given a simplified dynamical model for the evolution of some projection, i.e. a 'coarse graining' of the phase variables (coarse-grained variables y(t) ∈ R d ) which is in the form (2.1). Although the original system is complex and its details unknown, we assume that we can obtain in some way (e.g. through measurement) a collection of observations of mean values of functions of the reduced variables. That is, there are functions C k : R d → R, k = 1, 2 . . . , K and given values c k , k = 1, 2, . . . , K, such that where C k (y) represents averaging with respect to the true, empirical invariant measure of the dynamical system. The use of a GBK thermostat (2.2) limits our implementation to observables that are functions of the conserved quantities {H, I , = 1, . . . , L}, that is, C k (y) = C k (H(y), I 1 (y), . . . , I L (y)), k = 1, . . . , K. This restriction is only due to the choice of thermostat, the entropy maximization has no such restriction. Our goal is to find a perturbed dynamical model for the reduced variables which (i) is compatible with the thermodynamic constraints (3.1), and (ii) mildly perturbs the dynamics compared to those of the native model.

(a) Entropy maximization
Empirical information theory generalizes the principle of insufficient reason, by proposing the least-biased probability density consistent with a set of observations. This idea was first proposed by Jaynes [18,19] and is treated in the monographs of Haken [26], Dewar et al. [38] and in the context of geophysical fluids by Majda & Wang [39]. The least-biased density is defined as the probability density ρ(y) that maximizes the information entropy functional S[ρ] = − D ρ(y) log ρ(y) dy, subject to a set of constraints given by observations. When D is a compact set and there are no observations, the minimizer is the uniform density ρ ≡ |D| −1 . The entropy S is the unique measure of uncertainty that is positive valued, monotonically increasing as a function of uncertainty, and additive for independent random variables. With observable denote expectation in the (as yet undetermined) density ρ. Defining Lagrange multipliers λ k , k = 1, . . . , K, associated with the observables C k , we obtain the solution ρ from a stationarity condition for the Euler-Lagrange functional Conditions for a unique solution to exist are discussed in Balian [40]. When it exists, the maximum entropy solution satisfies where λ 0 is chosen to satisfy D ρ dy = 1, and λ k is chosen such that E ρ C k (y) = c k . In some cases, besides the observations, we may be given prior statistical information on the process y(t). The Kullback-Leibler divergence, or relative entropy, represents a (non-symmetric) distance between measures. Note the change of sign, which makes the relative entropy more like a distance measure; we stick to this convention. It quantifies the information lost in approximating ρ(y) by π (y). Suppose y is a random variable with distribution (law) y ∼ ρ, where ρ is unknown. Suppose further, that we are given a prior distribution π , presumed to be close to ρ, and a set of K observations (3.1). Following Jaynes [18,19], the least-biased distribution ρ consistent with the observations c k and prior π solves the constrained minimization problem where the λ k are Lagrange multipliers to enforce the condition that the expectations (3.2) agree with the observations (3.1). The solution to the variational problem is where the Lagrange multipliers λ k are chosen consistently with the observations (3.1) and λ 0 is a normalization constant so that ρ is a probability density function. Calculation of the Lagrange multipliers is discussed in [25][26][27]. We use the following algorithm based on re-weighting. Assume we are given a sequence of samples y n , n = 1, . . . , N, distributed according to a known prior distribution π (y), i.e. y n ∼ π . The expectation under π (y) of a function Φ(y) has the consistent and unbiased estimator Given the posterior distribution ρ(y) of the form (3.3), compute the expectation E ρ Φ by reweighting of the integral yielding an unbiased estimator for E ρ Φ given bŷ We wish to ensure that the estimatorsĈ ρ k for observables C k match their empirical values c k , i.e.

Adaptive determination of Lagrange multipliers
When the observations deviate far from the prior, the Lagrange multipliers take larger values and the weights of the samples diverge, i.e. some weights become very large and others become very small. This results in a larger variance on the estimates and consequently in poorer results for the Lagrange multipliers. Moreover, the approach of the previous section excludes applications where (i) the statistical knowledge is expected to improve as the simulation progresses, (ii) the average observables are known to vary slowly with time or (iii) it is unfeasible to construct a large enough ensemble distributed in the prior. For these cases we consider using the simulation data of a small ensemble (propagated in short bursts of M timesteps) for updating the Lagrange multipliers for mean observation data. This results in an adaptive algorithm for obtaining the Lagrange multipliers on-the-fly during simulation. In this situation, the sample set used is drawn from a distribution closer to the target, resulting in better estimations for the residuals and Jacobian.
Let us assume that we have a (moderately sized) ensemble of initial conditions according to some distribution In our numerical experiments, we assume λ 0 k = 0, ∀ k and hence ρ 0 = π , but we present the general form here nevertheless. Rather than first computing the Lagrange multipliers corresponding to observations and then running the simulation over the time interval of interest, we perform a short burst of the thermostatted simulation for P ensemble members, using the Lagrange multipliers λ 0 k in the thermostat. The system states at the end of this burst, denoted by y p (M t) for ensemble member p at time M t, are used for an estimator of the observables in the distribution ρ 0 : where the superscript (1) indicates that it is an estimator for a distribution with Lagrange multipliers λ 1 k . Note that the samples being distributed in ρ 0 and not in π leads to the appearance of the Lagrange multipliers λ 0 k , this is a simple importance sampling procedure. A Newton-Raphson procedure similar to that in §3 is then used to find λ 1 k using the estimator (4.2). The resulting multipliers are used in the second burst of thermostatted simulations, sampling ρ 1 (y) = π (y) exp(− K k=1 λ 1 k C k (y)). Now, crucially, this estimator from the ensemble after two simulation bursts is combined with that at the end of the first burst. As both estimators are unbiased, we may use any weighted sum of the two (with the sum of the weights equal to one) to find another estimator. For simplicity, we weight both estimators equally. We write the combination of estimators for the general case after L bursts of the simulation, using the system states at the end of each previous burst, that is y p ( M t) for = 1, 2, . . . , L, as an estimator for the densities ρ −1 with the Lagrange multipliers λ −1 k . This estimator takes the form As with the non-adaptive scheme, a Newton-Raphson procedure is used to find Lagrange multipliers such thatĈ 4) and the gradients By performing this iteration after each burst, and using the result for the subsequent burst, the Lagrange multipliers are computed 'on-the-fly' without the need for constructing a large ensemble in the prior. Let us consider two limiting cases to justify the approach. (i) By increasing the number of ensemble members P, one recovers the non-adaptive scheme, where the Lagrange multipliers are found based on the initial ensemble after a single burst. On the other hand, (ii) if we have a modest ensemble P, it is necessary that the burst is long enough for the thermostatted dynamics with the adaptive parameters to equilibrate on this time scale. The error due to the transient of the thermostat is removed in the limit of many timesteps per burst, i.e. as M → ∞. We present numerical verification of the practical use of the scheme in a specific application setting in §6c.

(a) Adaptive algorithm
There are two important modifications to the algorithm described above that are included in the numerical implementation of this method:  (4.5), all previous values λ k are included. In a long simulation, this leads to a growing computational demand. By taking only a fixed number (q) of recent steps the cost is reduced. In the case that the initial samples cannot accurately be drawn from the prior, this has the further advantage that these inaccuracies are eventually forgotten.
The resulting method is summarized in algorithm 1.
Given initial conditions according to prior π (y) Set initial Lagrange multipliers to zero for m ← 1 to n do for j ← 1 to M do advance simulation one timestep using current value for the Lagrange multipliers end store relevant simulation observables for timestep im. . . , mM compute residual gradient using the same data update Lagrange multiplier estimation end limit the change in Lagrange muliplier (if necessary) end Algorithm 1: Adaptive determination of Lagrange multipliers 'on-the-fly'

Application to reduced modelling of point vortices
Here we apply the least-biased correction method to a system of point vortices on the sphere, which has a Poisson structure and multiple conserved quantities, including total energy and angular momentum and a set of Casimirs.

(a) Point vortex system
Point vortices represent singular solutions to two-dimensional, incompressible fluid flow in which each vortex, positive or negative, is centred at a specified position. Point vortices have been studied extensively in [41][42][43]. We work on a spherical domain [44][45][46][47][48], viewing the vortex position as a vector in R 3 with unit norm. The equations of motion have a Lie-Poisson structure where the Hamiltonian is given by and each Γ i represents the circulation of a single point vortex. By introducing y = (x T 1 , x T 2 , . . . , x T M ) T , equation (5.1) has the form (2.1) with the block-diagonal structure matrix wherex i is the 3 × 3 skew-matrix satisfyingx i a := x i × a, for all a ∈ R 3 . The Poisson bracket for the system is given equivalently by This Poisson structure generalizes the rigid body Poisson structure and also occurs in ferromagnetic spin lattices [49][50][51] and elastic rods (e.g. [52]).
The vortex positions are defined in Cartesian coordinates, but initial positions x i (0) are chosen on the sphere. Because each |x i | is a Casimir of the Poisson bracket, the vortices remain on the sphere. Furthermore, the rotational symmetry of the sphere gives rise to three Noether momenta, which are expressed by the angular momentum vector The GBK thermostat (2.2) is only applicable to divergence free systems ∇ · f ≡ 0. It is straightforward to check that this condition holds for the spherical point vortex model.
A numerical integrator for the point vortex system is constructed as in Myerscough & Frank [48] based on splitting the differential equations into integrable subproblems (see related ideas in [53,54]). The backward error analysis of symplectic integrators has an analogous development for Poisson systems, implying approximate conservation of the Hamiltonian [8]. The Casimirs (|x i | ≡ 1) and the momentum J = Γ i x i are exactly preserved by the integrator.

(b) Thermostat perturbation vector
The choice of the perturbation vector field g in (2.2) is flexible, but the vector fields f and g should satisfy a Hörmander condition [36]. It is vital that the Casimirs should be respected by the perturbation vector field. Double-bracket dissipation [55] preserves Casimirs of the original system and is thus an ideal candidate for g: This 'double-bracket thermostat' also has the advantage that it can be split into pairwise interactions along with the original dynamics f , simplifying computation. The denominator in (5.3) leads to a stiff differential equation when like-signed vortices approach one another, restricting the step size of an explicit splitting method. For this reason, we use a slightly modified scheme (but still Casimir-preserving) defined by Details of the numerical integration of these dynamics may be found in appendix A. The thermostat (2.2) is designed to sample a target density ρ(y) ∝ e −A(y) on the phase space of y. The thermostat variable ξ is normally distributed, yielding the extended distribution ρ ∝ e −A(y)− 1 2 ξ 2 . The perturbation vector field g must additionally ensure that the thermostatted system is ergodic in the target density. Because the target measure is positive for all open sets on the phase space, hypoellipticity of the Fokker-Planck equation associated with (2.2) is sufficient to prove uniqueness of the invariant measure [56]. Hypoellipticity follows from Hörmander's controllability condition [57]. The condition has been tailored to GBK thermostats in Bajars et al. [36], but it is difficult to verify in practice. Here we verify using simulation that single trajectories have statistics that agree with the target distribution.

(c) Maximum entropy model
To apply the methods proposed in §3 in the setting of a reduced model for point vortices, we use point vortices distributed evenly over the surface of the sphere as the prior π . The use of a thermostat for the bias correction limits us to using observables that are function of the first integrals. Furthermore, we assume the angular momentum has no directional preference, a choice we motivate in the numerical comparisons of §6. In that case, only the magnitude |J| of the momentum vector needs to be considered.
To accurately reproduce statistics from a full model with a moderate number of point vortices, it is necessary to modify the canonical density with a term quadratic in the Hamiltonian, that is, a density of the form ρ(y) ∝ exp(−βH(y) − γ H(y) 2 ) [16]. Motivated by Dubinkina et al. [16], we choose observations that include linear and quadratic functions of H and |J| 2 .
We consider the following set of observables: and denote the corresponding Lagrange multipliers by β H , β J , γ H , γ J and γ HJ . The least-biased density consistent with observations of the EC k is given bỹ where the Lagrange multipliers are to be found via the procedure detailed in §3a or that of §4.

Numerical comparison
Here we apply algorithm 1 to a reduced model of point vortices similar to the configuration used in Bühler [58] and Dubinkina et al. [16]. We distinguish between three models. The full model consists of a system (5.1) of M full = 288 point vortices, of which eight strong vortices of circulation Γ j = ±1 and 280 weak vortices of circulation Γ j = ± 1 5 . Both strong and weak classes comprised equal numbers of positively and negatively oriented point vortices. The reduced model consists of (5.1) with just M = 8 strong vortices. Finally, the corrected model consists of a thermostatted system (2.2) with unperturbed vector field f given by (5.1) for M = 8 strong vortices, perturbation vector field g given by (5.4), and equilibrium measure defined by the least-biased density (5.6). Additionally, we compare with Metropolis-Hastings samples from the least-biased density (5.6) to help distinguish between errors incurred due to the maximum-entropy model and those due to sampling bias of the (discretized) thermostatted dynamics.
We run five long simulations of the full model with total energies chosen from the set H full ∈ {−2, −1, 0, 1, 2}. The total angular momentum vector is fixed at J full = 0 such that there is no directional preference for the momentum of the reduced model embedded in the full model. For each run, we determine the time averages of the observables (5.5) for the subset of strong vortices. When computing the Hamiltonian H of this subsystem, we include only the internal coupling between strong vortices. The time averages are tabulated in table 1. Using these averages, we compute the Lagrange multipliers using the algorithm described in §4 with prior distribution π taken to be uniform on the sphere. The Lagrange multipliers are also recorded in table 2. The magnitude of γ {H,J,HJ} indicates that including these observables impacts the resulting density. In other words, all observables add information to the least-biased density, with the possible exception of the lowest energy case with H full = −2.
We then run simulations of the corrected model using the computed parameters. Table 1 also records expectations from the thermostat-corrected model.          By analogy with canonical statistical mechanics, we may think of the weak vortices that are ignored in the reduced model as forming a reservoir with which our reduced model exchanges energy and angular momentum. Experience with canonical statistical mechanics of point vortices in the plane [16,58] suggests that for small reservoir sizes, the canonical distribution must be modified with higher order terms to agree with the full system statistics. Table 3 gives the multipliers for different numbers M full of vortices in the full system, confirming that those corresponding to γ H , γ J and γ HJ are more significant for smaller M full ; as the number of weak vortices grows, the strong vortex system approaches the canonical density.
Remark 6.1. The form of the distribution (5.6) implies there is a non-zero probability for any configuration of the strong vortex system with finite energy and momentum. When considering the strong vortex system in contact with a reservoir of weak vortices, its energy is bounded by the energy in the total system plus or minus how much energy can be stored in the reservoir and in the interactions between strong and weak vortices. For the momentum, a similar condition holds, but here there are no interaction terms. If it is not possible for the reservoir to add or subtract sufficient energy from the reservoir, it will not be possible for the subsystem of strong vortices to attain all possible configurations with non-zero probability.
For the energy, it is readily verified that as long as the reservoir contains three vortices not all of equal sign, the reservoir may supply or remove an arbitrary amount of energy, as close of   approaches of two like-signed (respectively, opposite-signed) vortices produces arbitrarily largepositive (respectively, negative) energies. For the momentum, the result is more interesting. The system of M strong vortices, all with strength ±Γ strong , may exist in configurations with an angular momentum sup |J red. | = MΓ strong . Note the supremum, as the configuration with exactly sup |J red. | = MΓ strong will have infinite energy. For the reservoir, it holds that sup |J weak | ≤ (N − M)Γ weak . For the the reservoir to supply sufficient angular momentum, it is necessary that In the thermal bath simulations discussed in this section M = 8 and Γ strong /Γ weak = 5, this means M full should satisfy M full ≥ 48. The smallest system considered (M full = 36) does not, explaining its eccentric parameter values in table 3.

(a) Equilibrium results
In this section, we compare statistical properties of the corrected model with those of the full and reduced models. In figures 1-3, we show histograms of a number of solution features for the eight vortex model: the distributions of H and |J|, as well as typical distances between likeand opposite-signed vortices, a metric also used by Bühler [58]. In each histogram, the statistics corresponding to the strong vortices in the full model, the reduced model, thermostat-corrected reduced model and Metropolis-Hastings samples are displayed. Figures 1-3     we use 10 6 samples. The same figures also show results from the thermostatted system (dasheddotted lines), run with a timestep of 10 −3 up to T = 10 6 , taking 10 6 samples. The parameters in (2.2) were set to be = 10 and γ = 0. 1. These results confirm that the thermostatted system samples the least-biased density closely. The complete simulation trajectories are available in the electronic supplementary material. The reduced model is Hamiltonian and the Poisson integrator ensures that the energy is conserved with a standard deviation of order 10 −3 and the angular momentum constant to machine precision. Both cases correspond to approximate delta-distributions in the upper histograms in figures 1-3. Note that due to the high skewness of the distribution for |J|, the observed mean differs significantly from the median and mode, implying some ambiguity in choosing the angular momentum for an appropriate initial condition for the reduced model.
A simple Hamiltonian-reduced model is incapable of sampling the energy and angular momentum spectra, as these are first integrals, hence the reduced model shows significant bias in statistics such as vortex separation. On the other hand, the thermostat-corrected model faithfully samples the least-biased probability density, as indicated by the good agreement in the histograms of the corrected model and Metropolis-Hastings samples. The least-biased density approximates the strong-vortex statistics well, particularly in the negative to moderate total energy regime. At large positive total energies, the strong vortex energy and angular momentum distributions are still well represented by the least-biased PDF, but some bias in the vortex separations can be observed. The closeness of the thermostat results to those from the Metropolis-Hastings sampling indicate the error lies in the choice of least-biased density, not in the thermostat sampling.

(b) Dynamic consistency
The results in the previous section confirm that the thermostatted simulations lead to equilibrium distributions of observables H and |J| similar to those of the full system. In this section, we address the issue of the degree to which our equilibrium correction mechanism disturbs dynamics, as encoded in autocorrelation functions and diffusivity. Diffusivity was considered by Chavanis [59] for a system of identical point vortices and by Cotter & Pavliotis [60] for a wide array of problems with scale separation. We emphasize that the values of the thermostat parameters ε and γ have no impact on the equilibrium statistics presented in the previous section, and only affect the rate at which the least-biased PDF is sampled. Parameters with a larger deviation from the unperturbed dynamics lead to faster equilibration of the distribution, giving rise to a modelling choice.

(i) Autocorrelation functions
Given a sequence of L equally spaced observation times t i ∈ [0, T] for i ∈ [0, L], and the values of the relevant observable (in our case vortex position) u i = u(t i ) at those times, the discrete autocorrelation function is defined by A normalized autocorrelation functionν u is given by dividing each ν u i by ν u 0 , i.e.ν u i = ν u i /ν u 0 . We average the autocorrelation function over all 3M (strong) vortex coordinates {x m , y m , z m } m=1. ..M . Three symmetries in the problem justify this averaging: the vortex numbering is arbitrary; the choice of reference frame is arbitrary and the sign of the vortices appears in the dynamics as a reversal of time, to which the autocorrelation is insensitive. Additionally, the observables H and |J| are isotropic.
Furthermore, we ensure that the phase space is well sampled by averaging the autocorrelation functions over an ensemble of P solutions. We then find the average autocorrelation function   where a superscript p represents the solution from ensemble member p. The normalized average autocorrelation function is given bŷ where we have used that the Casimirs C i = x i (t) · x i (t) = 1 ∀ i, t are conserved exactly, also in the discretized equations.
In figure 4, we compare autocorrelation functions for the strong vortices in the full and reduced models as well as for the thermostat-corrected model over a range of parameters ε and γ . The thick solid black line represents the result for an (unthermostatted) system in contact with 280 weak (Γ B = ± 1 5 ) vortices, with a total energy H full = 0. The results present the average over an ensemble of 1000 runs. For each simulation, the initial placement of each strong vortex was taken uniformly over the sphere and the weak vortices were placed such that the full system satisfied H full = 0 and J full = 0. The thick dashed black line represents the results for an ensemble of simulations of the isolated system, with everything else unchanged. The other lines represent results for thermostatted simulations using the Lagrange multipliers as given in table 2 for the case of H = 0. As expected, when the thermostat is weak the autocorrelation functions are similar to those of the isolated system. The stronger thermostat parameters presented here match the autocorrelation well for a short-time period, but all result in excessive decorrelation for longer lag times. In the following section on the diffusivity constant, we discuss the performance of different parameter values in more detail.

(ii) Diffusivity
For general multiscale dynamical systems with a separation of slow and fast dynamics, it is often desirable to model fast forces by a diffusion process, resulting in stochastic differential equation of the form [61,62] where f represents the slow dynamics, W is a Wiener process and K is the diffusivity. The value of the diffusivity K can be estimated by sampling solutions to the original, multiscale, problem and rspa.royalsocietypublishing.org Proc. R. Soc applying Kubo's formula where X represents displacement during the sampling interval τ . Choosing the correct sampling interval is a notorious problem; for a comparison, see [61].
If we take the average diffusivity for each vortex coordinate, we find We assume the observations are given at the same times t i as before and that the sampling time is an integer multiple of the observation interval, i.e. τ = k(T/L). With an ensemble of P simulations the diffusivity estimator would then be where again a superscript p denotes the solution from ensemble member p. Averaging over all time series data yields the estimator: with τ = k(T/L) = t k . This shift-averaged estimator incorporates all possible time intervals of length τ available from the data; it is shown by Cotter & Pavliotis [60] to improve the quality of the estimator. Figure 5 presents the diffusivity constant as a function of the interval time τ for different parameter values for ε and γ . The results are taken from the same simulations used for the autocorrelation functions, as described in the section.
For ε small, the thermostat perturbation is weak, and both autocorrelation functions and diffusivity approach those of the reduced model with constant H, J. Also, the autocorrelations are insensitive to the parameter γ in this regime. For values of ε < 1, the autocorrelations and diffusivities become indistinguishable from those of the isolated model. Hence even though the dynamics samples the least-biased density on long time scales, its short-time dynamics is similar to an unperturbed model. For moderate ε, dependence on γ becomes more pronounced, and a diffusivity closer to that of the full model can be achieved. For even larger values of ε, the diffusivity becomes much more sensitive to the value of γ , as indicated in figure 5c. Figure 5d has been included to illustrate two important properties by plotting the diffusion parameter against the sampling interval on a log-log scale. First, for large sampling interval the estimator shows an inverse linear tendency. This corresponds simply to the decorrelation of the vortex dynamics. Second, as the sampling interval goes to zero, the estimator of the diffusivity constant shows polynomial behaviour. This is in agreement with known results for the GBK thermostat [63] and is an improvement on Langevin thermostats, which tend to a constant value for short sampling intervals, in disagreement with a deterministic reference [30]. See also the discussion in §2.

(c) Adaptive determination of multipliers
In this section, we apply the scheme of §4 to the point vortex system. Consider the same reduced point vortex model of eight vortices with Γ = ±1. We perform a thermostatted simulation of this system, which we feed with observation data taken from a simulation of the 288-vortex system including the weak vortices that form the thermal bath. The observation from simulation with the thermal bath may be replaced by observation from another source-if available-such as empirical observations or a prediction of the trend for the observables. In particular, we use the observed data in three different ways for computing the Lagrange multipliers that determine the equilibrium distribution used by the thermostat: (i) The first approach mirrors the non-adaptive method of §3. We use the long time averages from equilibrated thermal bath simulation. The equilibrium distribution varies during the thermostatted simulation, but during the whole simulation it converges to the same equilibrium density as before (like those denoted in table 2). (ii) The second approach does not assume long time averages are available a priori. Instead, the running mean from the thermal bath simulation is used. This means the iterative procedure (4.4) drives the system towards a varying target during simulation. Eventually, the target approaches the long time average used above. (iii) In the final implementation, we relax the notion of statistical equilibrium for the strong system and consider the statistical state to be slowly varying. This is implemented by The change in the distance between the two vortices can be represented by the change of their inner product The solution to this differential equation is given by In the antisymmetric part, we introduce α A = (Γ i − Γ j )/4π to writė By rearranging the order of the cross product, we achievė The vector a A implicity defined by (A 4) and (A 5) can simply be shown to be constant under the antisymmetric flow. This means the dynamics of the antisymmetric part may be integrated by using Rodigues's formula.