Infinite product expansion of the Fokker–Planck equation with steady-state solution

We present an analytical technique for solving Fokker–Planck equations that have a steady-state solution by representing the solution as an infinite product rather than, as usual, an infinite sum. This method has many advantages: automatically ensuring positivity of the resulting approximation, and by design exactly matching both the short- and long-term behaviour. The efficacy of the technique is demonstrated via comparisons with computations of typical examples.


Introduction (a) Objectives
In this paper, we propose a novel analytical procedure for solving diffusion equations of the form where the steady-state solution is well defined, i.e. a normalizable probability density (we call this the 'stable' case): . example, if it governs the concentration of diffusing charged ions in an electrostatic potential well it is generally referred to as the Nernst-Planck equation [1]. If the potential is not quadratic then one has (1.1) with A nonlinear and the equation is analytically intractable; techniques are therefore required to deal with it so that one does not have to fall back on full numerical solutions of the PDE. More often, it arises as the forward (Fokker-Planck, FP) equation associated with the meanreverting diffusion process in dimensionless form. It is perhaps worth clarifying what we mean by 'nonlinear diffusion' as the term is ambiguous. A diffusion process such as (1.2) is said to be nonlinear if the drift term is nonlinear or the volatility term is non-constant; hence whenever A is nonlinear we have a nonlinear diffusion process. But the FP equation (1.1) that arises from it is always a linear diffusion equation (linear in f ). That does not mean, however, that it is easy to solve. What is unusual about this paper is that we will be modifying (1.1) through a nonlinear change of the dependent variable f so that the diffusion equation does become nonlinear, and our thesis is that this nonlinear PDE is actually easier to handle than its linear counterpart.
In general, (1.2) is an important model of physical phenomena such as electronic noise and kinetics [2], electronic circuits with nonlinear resistance [3], and systems with overdamped Langevin dynamics [4] or from nonlinear friction [5,6]. Another example of application has been a model of variability of chemical concentration in ice-core data as a proxy of climate variability [7], where it is necessary to model large excursions. If A(y) = −y, we are left with the familiar Ornstein-Uhlenbeck (OU) process [8], or Langevin equation [9], but the nonlinear examples have A(y) = −y and these require attention.
An analytical approximation to a PDE has considerable practical value aside from providing an immediate insight into the form of the PDE's solution as there is a considerable improvement in computation speed over the standard numerical grid-based methods. This is particularly true in the context of statistical signal processing techniques such as Markov chain Monte Carlo [10], where one wishes to sample from the distribution of Y t 2 given Y t 1 , for t 2 > t 1 ; knowing the density, one can sample using the rejection method [11]. Then an approximate solution that captures the short-and long-term behaviour is considerably more practical than having to regenerate the solution by numerically solving the PDE at each step. The time saved in that calculation can then be spent on running more Monte Carlo simulations. In fact, we suggest that even our leadingorder term (3.23) is sufficiently accurate for the vast majority of practical applications, whatever the discipline.
Although there are physics applications of (1.2), as we have said, in fact it was mathematical finance that provided the main impetus for this research. The process (1.2) is a model of mean reversion and in the case when A(y) = −y it is the OU process which in dimensional coordinates is But this has an important disadvantage, in that its steady-state distribution is Normal, so that large excursions are very unlikely in the model. Despite the fact that this has been known for years, the Normal distribution is still used in risk management in areas where it should not be. For example, investigation of JP Morgan Chase's 'London whale' trading losses, estimated at at least $5bn, shows that the Normal distribution function was used to transform a number of standard deviations into a loss probability even at high levels of confidence. 1 Before that, the demise of Long Term Capital Management (LTCM) in 1998 can be directly attributed to overleveraged 'convergence trades' [13]. In reality, the following two mechanisms occur when the market is far from equilibrium, and these cause such excursions to occur much more often than in the simple OU model.  The first mechanism is that the volatility is likely to be higher. This is seen in the so-called basis risks, in which the price difference between two closely related financial instruments should be zero: in times of market stress, when the distance from equilibrium is large, market liquidity is lower and the volatility higher, so large excursions become likely. One recent example is the socalled credit default swap (CDS) index basis, which is the difference between the index level of a CDS index contract and the level implied by its constituents, which should theoretically be nearly zero (see [14] for a general discussion on this). Another is the CDS-bond basis during the 2007-2009 financial crisis, where the credit spread of credit-risky cash bonds deviated violently from the level implied by the CDS market (e.g. [15, fig. 1], [16, fig. 1]). The effect is clear in figure 1a,b, with pronounced departure from the mean during the 2007-2009 financial crisis. A convenient formulation increasing the volatility away from equilibrium is Models such as this, in which the volatility depends deterministically on the spot price X t and/or time, are called local volatility models (as distinct from stochastic volatility models; e.g. [17]). In fact (1.4) is precisely the same recipe as suggested in the aforementioned climate change paper [7]. It is also encountered as one of the Pearson-Wong diffusions [18,19] but has received less attention than the more commonly encountered OU and square-root processes also in that class, because it is less analytically tractable. The second mechanism is that the reversion speed may decline. This is seen in examples where there is no strict requirement that prices must converge, and arises if the market volume directed towards a convergence bet declines, even if the total liquidity remains the same. An example is i.e. ν = 5. (Using the same excitation process W t each time; σ = 2, κ = 2, ν = 1 + 2κ/γ 2 σ 2 .) Note the visual similarity between (b) and figure 1a,b; also between (d) and figure 1c.
figure 1c, which shows the deviation between interest swap rates of three different tenors; this is the so-called 'butterfly' trade (e.g. [20]). We calculate the 5Y rate minus the weighted average, weights 0.3 and 0.7, respectively, of the 2Y and 10Y rates, after which the 10-year exponentially weighted moving average is subtracted so as to take out the long-term average; not the length of the excursion from 2009 (post-crisis) onwards. This second mechanism is captured by the following models: and also by (1.4) after transformation by γ X = sinhγ Y,γ = γ σ/ √ 2κ, giving The same effects can also easily be seen in simulation (figure 2a-d).
We have distinguished two mechanisms, but although the distinction is important for motivating an underlying model, it is unimportant to the issue of solving the forward equation. Indeed, as we just did with (1. require a trivial rescaling Y = X/ σ 2 /2κ; also τ = κt. In general, the transformation Y = η(X) with η(x) = dx/σ X (x) does this, provided σ X is non-zero. The canonical forms are therefore (where we have non-dimensionalized (1.5) and (1.6) to use Y = X/ σ 2 /2κ rather than X) (B and K ν denoting as usual the Beta function and the modified Bessel function of the second kind; see [21]) and throughoutγ = γ σ 2 2κ ; ν =γ −2 + 1 so that the non-dimensional parameterγ or ν measures the deviation from the OU model (γ = 0, ν = ∞).
In each case, the density is fatter tailed than Gaussian. In (1.6), the density is fatter tailed than in (1.5) because the reversion disappears completely as the diffusion process moves far from equilibrium, and so it is left to wander around aimlessly.
There is a reasonable amount of similarity, at least visually, between figures 2b and 1a,b. We therefore tested the model (1.4), using the standard likelihood-ratio test, with the null hypothesis H 0 :γ = 0 against the alternative H 1 :γ > 0. The method yields a maximum-likelihood estimate for the parameter in question and an estimate of the relative likelihood of H 0 versus H 1 given the data (the p-value). For figure 1a, we found ν = 3.8 with p-value 1 × 10 −8 , and for figure 1b we found ν = 3.2 with p-value 1 × 10 −32 . The rejection of the basic OU model is unsurprising, given the huge excursions from equilibrium in both datasets. Similarly, figure 2d bears a resemblance to figure 1c. Testing (1.6) in the same way on figure 1c, we found ν = 4.4 with p-value 5 × 10 −6 . This suggests that (1.6) is preferable to the simple OU model. (For the purposes of risk management, it is clearly prudent to use any of (1.5)-(1.7) in place of the basic OU, even if there is no firm statistical evidence.) We round off this section by returning to physics, with an example that is considerably further from the OU process, qualitatively, at least in the sense of not being in a one-parameter deformation of the OU model. This is the double-well potential, where the invariant density is bimodal. This has been studied by Caroli et al. [22], who used the Wentzel-Kramers-Brillouin (WKB) approximation to study the behaviour in different time regimes; here we shall demonstrate a method of solution valid on all time scales simultaneously. For the sake of concreteness, we confine ourselves to a particular example   where we require c 1 > 1 (so that there are two wells), c 2 = 0 (for asymptotic stability). Figure 3 shows typical realizations of the corresponding Ito process, for σ = 2, κ = 2, c 1 = 4, c 2 2 = 2, τ = κt. In some realizations such as figure 3a the process hops from one well to the other (these are centred at X = ± 3 2 ); but it can spend a long time in one well (figure 3b). In non-dimensional form, this is in general, f Y (∞, y) = Gaussian × polynomial, with the polynomial factorizable as a product of strictly positive quadratics, is a form that will construct multi-modal examples of arbitrary complexity.

(b) Infinite products
The main thrust of this paper is that one benefits significantly from studying the logarithmic derivative, in the spatial direction, of the solution. An immediate advantage is that upon integration and re-exponentiation a positive solution must be obtained (there is still a timedependent normalizing factor to obtain, but as we show later it is easy to ensure that this is positive). Next, the logarithmic derivative is an algebraically simpler construction. Indeed, for the OU model one has (In the case of an arithmetic Brownian motion with no reversion and starting from the origin, this would simply read x/σ 2 t.) We wish to replicate this exactly when A is linear, and this solution is a starting point in our analysis. Note that the form of (1.10) and the stochastic representation of the OU process as a scaled time-transformed Wiener process [23] X t = X 0 e −κt + ξ e −κt W e 2κt −1 suggest a substitution q = e −2κt , and indeed we use this as an ansatz for mean-reverting processes in general.
The next issue is that we want a technique in which short-and long-time behaviour can be specified explicitly, and that these be replicated. This is because the former must be a Gaussian distribution as t → 0, while the latter is the already-identified invariant density. Thus arrives the notion of an infinite product expansion, in the sense of writing the logarithmic derivative as the sum of a term that replicates the short-time behaviour, another that replicates the long-time behaviour, and a series that corrects the middle.
The infinite product is a significant departure from 'standard' techniques for solving parabolic PDEs; these have in common that they are all infinite sums. Three such are: spectral methods (Fourier transformation in the spatial coordinate [24]); orthogonal expansions (expand the spatial dependence as a time-weighted sum of eigenfunctions of the infinitesimal generator); and Laplace transformation in time. All suffer from the problem that it is difficult to represent the initial condition effectively. As a case in point, take the Mehler series expansion [25], in terms of Hermite polynomials He r (x), to the OU PDE: For κt 1, it is clear from the sum that the convergence is very slow, because the basis functions, which are oscillatory, are required as t → 0 to sum to give a delta function; truncation therefore generates oscillatory artefacts. Figure 4 shows the situation for t = 0; indeed, the truncated Mehler series (taking only terms 0 ≤ r < N) can be approximated around the spike x = X 0 by the sinc function sin( This behaviour is not specific to the OU process, and the slowness of convergence and the concomitant Gibbs phenomenon are a consequence of the Riemann-Lebesgue lemma. Yet for an infinite product, it is easy to make the short-time solution zero at all x = X 0 , and hence like a delta function. One simply arranges for one of the terms in the product to be zero for t = 0, x = X 0 , and then it does not matter what the rest of the terms are: so another can be responsible for modelling the long-term behaviour, and the others for the middle-time region-as we intimated above. Even if one does not bother to correct the 'middle-time' behaviour, the results are remarkably good: this is seen later in equation (3.23) and in §3e and accompanying figures. Probabilistically, infinite products have the attractive interpretation that each term in the product is the Radon-Nikodym derivative of the successive partial products. In statistical physics, the entropy is an important concept, and an infinite product represents this much more effectively than does a sum; from the point of view of an approximation, it is clearly disastrous to have a negative probability density anywhere. Note also that under a change of 'spatial coordinate' X, to Y = η(X) say, a multiplicative factor of |η (X)| enters, so multiplicative representations are preserved under change of variable (because the Jacobian is just another term in the product) in a way that additive ones are not. Accordingly, it is quite incorrect to view an infinite product as a minor variant on an infinite sum: rather, we argue, the infinite product is fundamentally different and also a clearer way of thinking about the problem. On the other hand, the introduction of a logarithm causes a quadratic nonlinearity to appear in the FP equation, making the algebra considerably more difficult. In fact, the series expansion of its solution generates a quadratic recurrence of self-convolutive type somewhat analogous to that discussed in [26].

Preliminaries
and working with g instead, we arrive at the adjoint FP equation. This is similar to (1.1): and is also known as the backward equation (a term that we do not use here, because the equation arises as a forward equation: τ represents calendar time). This development is useful because we are going to ensure that, when we approximate g Y (and hence g X also), its long-term asymptote is unity. By so doing, we ensure that the approximated f tends to the correct invariant density.
As motivated in the Introduction, we deal with the logarithmic derivative of g Y rather than with g Y itself. We therefore introduce a function h Y , so that h Y satisfies an equation analogous to Burgers': Conventionally, the Hopf-Cole transformation [27] is used to convert the (nonlinear) Burgers equation into the (linear) diffusion equation, which is regarded as being more easily analysed. We are doing it backwards here, which may seem perverse: but note that this is also how the WKB approximation works [28], so the logarithmic transformation to a nonlinear equation has a well-established precedent.
The apparent similarity to the WKB expansion is worth discussing briefly. The similarity in form of the recursion is similar, as the logarithmic-derivative transformation also gives rise to what is, in essence, a Riccati equation. However, WKB is most naturally applied to problems in which the coefficient of the second derivative-here, the volatility (noise) term-is small. It gives rise to a singular expansion, because for zero noise the order of the differential equation reduces from two to one. That is not what we do: our approach is to take the OU solution for h Y (τ , y) and add terms to correct it. Thus whereas WKB expands the solution around a deterministic system, our techniques expand it around a known stochastic one. If the noise is very small, WKB may be more effective, but we have not observed a general failure of our methods in that limit.

Infinite product expansion (a) Solution of nonlinear partial differential equation by series expansion
We now seek a series expansion as a solution of (2.2). In the regular OU case, A(y) = −θy, we have Now for general A, we still have a diffusion starting from Y 0 , in which the distribution's variance initially grows with time as 2τ , so if we expand around τ = 0, i.e. q = 1, in powers of (1 − q), we are led to consider a development where θ is an arbitrary constant, and will be set later. This can also be established by dominant balance in (2.2), as on the RHS the h 2 Y term predominates as τ → 0. The basic OU solution is corrected by a series expansion thus: with R N denoting the remainder. Note that N = 0, which we call the leading-order approximation, means that no (b r ) terms enter. Many of the manipulations in the first part of this section serve to exchange terms of the form 1, √ q, q, the point simply being that any two of these differ by o (1 − q) in the vicinity of q = 1. The form of (3.1) is very important. Any truncation of the series must vanish as τ → ∞, which is what we need, because all terms contain a factor of q or √ q. Note also that the truncated error is uniformly bounded in τ , because the expansion is in q(1 − q) r and q lies in the compact interval [0, 1]. Thus, for each N and each y the error in h Y (τ , y) is bounded in 0 ≤ τ ≤ ∞. Such uniformity is obviously not shared by, for example, a series expansion in powers of τ , which is why we dismissed that idea out of hand.
The functions (b r ), and the remainder, are dependent on Y 0 , and should therefore be thought of as b r (y | Y 0 ): for reasons of conciseness we just write b r (y) or just b r , and when we write b r it means that the y-dependence, not the Y 0 -dependence, is being differentiated.
We are shortly going to equate coefficients of powers of (1 − q), and for that reason the √ q term in (3.1) is unwelcome; we therefore exchange √ q for q plus a Taylor series around q = 1. Writing √ q = (1 − (1 − q)) 1/2 and using the binomial theorem, we find Note also that δ 0 = 1 2 and (r + 2)δ r+1 = (r + 3 2 )δ r , which is needed later. Thus, the revised series is with a modified remainder term (the term R N now contains all of R N , plus the r ≥ N part of the infinite sum in (3.2)). The next step is to compute the (b r ) by comparing coefficients in q(1 − q) r . Note that the recurrence we are about to derive for the (b r ) will be obtained from (3.3), but for computational purposes we use (3.1), of course omitting the remainder term. It is convenient to write b −1 (y) ≡ θ (y − Y 0 ) so that the first term can be absorbed into the summations, and also δ −1 = 0. We then substitute (3.3) into and equate terms in q(1 − q) r . (Note that terms of the form q 2 (1 − q) r , which arise from the quadratic term in (3.4), have to be written as q(1 − q) r − q(1 − q) r+1 .) This gives For r = −1, we deduce with the last part being discarded as it is singular at y = Y 0 . Note that b 0 happens not to depend on Y 0 , but all the others generally will. Thus, for r ≥ 0, When y = Y 0 , this is just 1/θ (r + 2) × r.h.s.(3.6). The lower limit in the integral must be Y 0 as otherwise b r+1 becomes singular at y = Y 0 in a similar manner to (3.5). Successive terms may be extracted recursively; and fortunately, just as with the WKB expansion, the (differential) equation for b r+1 is first-order linear, even though it is nonlinear in the 'known' terms (b j ) r j=0 . Thus, the recurrence involves no more than a succession of integrals.

(b) The parameter θ
The parameter θ allows us to expand the FP equation at hand around that of any of a oneparameter family of OU equations in which θ governs the reversion speed. We want to know what is the best θ to use for any given A, and intuitively it is clear that one should use a θ that best approximates, in some sense, the (y-dependent) rate of mean reversion that A gives. In so doing, we will have matched not just the short-and long-time behaviour of the function h, but also the rate of transition from one regime to the other.
The bound states of (1.1) are the normalizable eigenfunctions of L † . In the OU case with Y 0 = 0, it is plain from the Mehler expansion (1.11) that the eigenfunctions ψ r (y) and associated eigenvalues λ r are Importantly, up to normalization one has ψ r+1 = ψ r . Now in the general case we know ψ 0 , which is the invariant density, and λ 0 = 0, but we do not know any of the other eigenfunctions or eigenvalues. However, let us assume that the first eigenfunction is approximately ψ 0 , and writeψ 1 = ψ 0 for this approximated eigenfunction. Then Let us assume this to be roughly λ 1ψ1 , i.e. λ 1 ψ 0 . Then integrating once, we have A ψ 0 ≈ λ 1 ψ 0 , and then integration from −∞ to ∞ gives given that in the OU case λ 1 = −θ , we use This has considerable intuitive appeal, as −A is, in a sense, the average rate of mean reversion, and by construction is necessarily positive. Of the 'OU deformations', it is easily calculated in two of the four cases, but (1.5) requires an integral, to which we provide the first term in a Padé approximation: In the double-well example, with Φ the cumulative Normal distribution function. The above discussion gives an unambiguous prescription for θ, but it should not be thought that the value is critical. For example, one can expand the OU model A(y) = −θ * y using any θ > 0, thereby picking up a series of correction terms that arise from the inequality of θ and θ * . It is easily verified that the resulting series is convergent for |1 − q| < 1, i.e. all τ ≥ 0.

(c) Initial results; the remainder term
For a numerical demonstration, we setγ = 1 2 , i.e. ν = 5. Figure 5 shows the functions b r (y) for r = 0, 1, 2, . . . for each of the three models ((1.5)-(1.7)), and with Y 0 = 0, −2 in each. (From (3.10) we have θ = 0.727, 0.50, 1.04, respectively.) Apparently, the partial sums converge at power law in r, i.e. b r (y) ∝ r −λ , as is corroborated by figure 5c. This is consistent with h Y having a singularity of the form q λ at q = 0, which in turn implies exponential decay in t as t → ∞ (which seems reasonable, given our previous discussion on spectral theory). There is no universal scaling law, so λ depends on the problem at hand.
An estimate of λ, even if rudimentary, can allow us to estimate the discarded part of the summation, at least to the extent that we get a better estimate than assuming it to be zero. Using the result given in appendix A, the assumption b r (y) ∝ r −λ leads to the approximation In the examples we have investigated, we have found λ to lie between 0.2 and 0.6. One can use an empirical estimate, but we suggest using λ = 1 2 throughout-a principle to which we will have recourse later-and applying (3.12) when N 5.
We have also computed the partial sums h Y with the solution obtained from a PDE solver to satisfy ourselves that they converge to the correct result. Wherever we refer to a PDE solver, we have used the method of lines with finite differences in space and DASSL (see [29]) as the time solver; this is a well-established, highly effective, and often used approach for numerically solving potentially highly nonlinear and stiff diffusion-like equations in fluid mechanics and elsewhere [30]. To compute h Y , we compute g Y and then take the logarithmic derivative, rather than solving the nonlinear PDE (2.2) directly.

(d) Derivation of the normalizing factor
We now know h Y (τ , y), but this only allows us to reconstruct g Y (τ , y) up to an arbitrary multiplicative time-dependent factor n Y (τ ) say, which we must now obtain:  (The lower limit of the integral is arbitrary, and taken as Y 0 for convenience.) Inserting this into (2.1) gives a first-order linear differential equation for n Y : and its solution must be, as n Y (∞) = 1, Note that the r.h.s. seems to depend on y, but does not actually do so, because h Y obeys (2.2). Thus any y can be chosen, and using the same value as the lower limit of the y-integral causes the dy term to vanish: Substituting (3.1) into (3.14) and performing the time integral (note dτ = −dq/2θq) gives where the coefficients (β r ), (β r ) are defined by Remember, as emphasized by the · | Y 0 notation in (3.16), that the functions (b r ), for r ≥ 1, depend parametrically on Y 0 . Thus, if Y 0 is altered then one must recalculate all the (b r ) from r = 1 upwards, using (3. whereas B( 3 2 , r; q) requires a recurrence: In the OU case, A(y) ≡ −θy, the only effective terms in (3.15) are the (1 − q) −1/2 prefactor and the first exponential term, with the Y 2 0 in it; this is as it should be. In the special case where A is an odd function, as it is with all of our examples, and Y 0 = 0 also, this simplifies to The * symbol signifies the following: when r = 0 the Beta function B(a, r) is undefined, but B(2, r) − B( 3 2 , r) is taken to mean the value of B(2, r; q) − B( 3 2 , r; q) in the limit q 1, which is well defined and in fact equals 1 − 2 ln 2. But by direct analysis of the FP equation, we find By comparison of these two results, we are able to establish two things, or, rather, one thing that can be used in two ways. First, we have an infinite product representation of the invariant density at the point Y 0 : As we said above, altering Y 0 requires a complete recalculation of the (β r ). But in fact only one point is needed to determine f Y (∞, y) for all y, since In the symmetrical case where A is an odd function, it is obvious to choose Y 0 = 0, and then (3.18) simplifies to However, f Y (∞, y) is often known, as we said earlier with (1.5)-(1.7), or else can be calculated as an exponential integral followed by normalization to unit probability mass. The second idea, therefore, is to reverse the previous logic and use f Y (∞, y) to obtain ρ N (0): (3.20) or, when A is odd, The first term in either of the above is the logarithm of the quotient of the Normal distribution (mean 0, variance 1/θ ) and the true invariant density. We can use this (exact) expression for ρ N (0) to sharpen the approximation in (3.15), thereby making it match at τ = 0; recall that it already does so at τ = ∞. Judging from (3.17) and the discussion surrounding  r=1 r −λ−1 (1 − q) r defines an analytic function of q that is of order q λ as q 0, by the Tauberian theorem. Accordingly, we are motivated to write ρ N (τ ) = ρ N (τ ) + q λ ρ N (0). (3.22) Thus, ρ N (0) = ρ N (∞) = 0 for all N. As we can easily compute ρ N (0), we reduce the truncation error in (3.15) from ρ N (τ ) to ρ N (τ ). As any λ > 0 will exactly remove the truncation error at τ = 0, we are free to choose λ = 1 2 throughout, as previously intimated.
(e) Leading-order expansion We are now ready to put everything together, and in the first instance we use no correction terms, thereby ignoring all the (b r ). We have in this approximation, by (3.1), and by (3.15), (3.20) and (3.22), Multiplying these two to get g Y (τ , y), and then by f Y (∞, y), gives us what we are after: In effect, this is the invariant density multiplied by a Gaussian of time-dependent width and centre; we recall that θ comes from (3.9) and (3.10), and we are standardizing on λ = 1 2 . We are now in a position to explore the efficacy of the expansion scheme versus full numerical solutions and we proceed to do so in figures 6-8. Each shows a numerical simulation of the full PDE, using the same parameters as in §3c, i.e.γ = 1 2 , for Y 0 = 0, −2; the shift in the source position illustrates that, in each case, the solution drifts back to the origin as it simultaneously diffuses outward. In each case, we illustrate for f , on log axes to accentuate any error, the numerical solution of the PDE for τ = 0.1, 1, 5 versus the N = 0 (just the leading-order) solution (3.23). The solutions are virtually indistinguishable. As a more stringent, and demanding, test upon the methodology the double-well example (1.9) is also evaluated both numerically and via the expansion; the results shown in figure 9 are again remarkably accurate, particularly considering that it is just the N = 0 approximation that is shown.

(f) Higher order development
We can, however, pursue the approximation to higher order in an attempt to squeeze the error down further. This we illustrate by plotting the difference between the numerical solution for the density and the approximation (summarized by equations (3.1), (3.13), (3.15), (3.20) and (3.22)) in figure 10. The numerical solution is evaluated using a highly accurate implicit time solver and uses central differences that are accurate to O(δy 3 ) ∼ O(10 −6 ), where δy is the gridspacing that is 10 −2 in the simulations shown; thus the observed difference of O(10 −6 ) is to be expected. Figure 10 shows the N = 0 approximation together with increasing values of N (2, 5, 10, 20, 50) and it appears that the error converges to zero as N → ∞. For reasons of space, we just show the results for model (1.5), but the picture is similar for the other cases.  The integrals in (3.7) are algebraically intractable in general, and by way of numerical techniques we suggest the use of Chebyshev approximation. Set up an interval [y min , y max ], approximate b r at the Chebyshev nodes (y k ), which allows its derivatives to be evaluated at any point as a function of the expansion coefficients [11]; then evaluate at each k the integral (3.7) by Gauss-Legendre quadrature (which, apart from the term containing A, will be exact provided the quadrature order is ≥ 1 2 r * + m − 1 2 , where m is the degree of the Chebyshev approximation, 2 and r * is the highest value of r desired in (3.7)), and use these values b r (y k ) as the samples of an interpolating Chebyshev approximation. The recurrence is initialized by approximating b 0 . Incidentally, this method was used to generate the results in figure 5 using degree 21 on the interval [−10, 10]. Note that y → h Y (τ , y) is approximated as a polynomial, for each τ ; the integral in (3.13) can be done by Gaussian quadratures again or directly from the Chebyshev expansion of the (b r ).

(g) Divergent (unstable) diffusions
An intriguing question is whether the methods described here also work for 'unstable' (nonreverting) diffusions, i.e. those with no invariant density. Hongler & Desai [31] point out that the so-called repulsive Wong model admits a closed-form solution in one isolated case. This is whose FP equation solves for α = 1 as Now the above diffusion is not reconcilable with (1.7), on account of the sign in the drift being wrong. Inasmuch as it is related to the OU process, one has to take (1.4) with negative mean reversion (κ = −1, σ = 1, γ = 1), and then transform by X = sinh 1 2 Y. There is, of course, no invariant density. There is an unphysical steady-state solution to the FP equation, which is ∝ cosh 2 y. However, division of f Y by this to get the solution to the adjoint forward equation does not give a solution that tends to 1 as t → ∞, so the methods given in this paper, and in particular (3.23), fall apart. Note that the form of the solution is two diverging blobs of probability mass. This is important: whereas mean-reverting diffusions do look like the standard OU process, allowing the solution to the FP equation to be expanded around the solution of the OU process, divergent ones may exhibit structural features particular to themselves and cannot be regarded in the same way.
That said, the quantity (−∂/∂y) ln f Y is simple,  and it does have a large-time limit, − tanh y. But this limit is not obtainable from the logarithmic derivative of the unphysical steady-state solution, as that is − ∂ ∂y ln cosh 2 y = −2 tanh y.
We may fairly state the following conclusions: (i) the expansion shown here is inapplicable to divergent diffusions; (ii) an alternative form of expansion of the logarithmic derivative of the density may well prove useful; and (iii) it is an interesting avenue for further research.

(h) Convergence
The series (3.1) is derived as an expansion in the limit t → 0. Thus, it may or may not be convergent in the classical sense, i.e. lim N→∞ R N (τ , y) = 0 for each τ , y.
For the examples we have considered, the empirical evidence is that b r (y) asymptotically decays at power order in r as r → ∞, and this is sufficient for the purpose. Clearly, some conditions must be obeyed by A for convergence to hold. While the N = 0 approximation (3.23) is generally applicable, the usefulness of higher order terms is predicated on differentiability of A, because, as the order of the approximation (N) is increased, successively higher derivatives of A are invoked, on account of the b r term. It seems likely that a necessary condition on A is that it be analytically continuable to the open strip |Im y| < η, for some η > 0. What further conditions are required for h Y to have the posited Laurent expansion (3.1) convergent in the punctured disc |q − 1| < 1 is a matter for further research.  We reiterate that, whether the series is classically convergent or not, the error is uniformly bounded in τ , by contrast with an expansion in powers of τ .

(i) Fusion with existing techniques
We argue that it is possible to combine the best features of the series expansion shown here with those of classical methods of solving PDEs. If (3.1) is non-convergent, it will be impossible to obtain arbitrary accuracy (in general, this is a perennial problem with asymptotic expansions: they give excellent accuracy in certain regions but cannot achieve arbitrary accuracy at any given point)-and, even if it does converge, its convergence may be inconveniently slow. Classical techniques and the existence theorems that relate to them are not reliant on analyticity, but can potentially obtain arbitrary accuracy. (From a more general context: on a compact interval any continuous function may be approximated to arbitrary accuracy by a polynomial or a Fourier series-but neither constitutes any sort of power series expansion of the function, as it may well not be analytic. In approximation theory, one does not want to be forced to assume analyticity of the function being approximated.) Consider an approximate solution f (τ , y), such as the N = 0 approximation though any other N would do, and define the relative error ψ Y via the substitution f Y (τ , y) = f Y (τ , y) e ψ Y (τ ,y) , then writing down the PDE obeyed by ψ Y . As we have extracted the τ = 0 and τ → ∞ behaviour correctly in f Y , the initial condition is ψ Y ≡ 0 (and we expect ψ Y to vanish as τ → ∞). The resulting equation for ψ Y , though nonlinear, can still be solved by grid-based methods or by spectral methods. But whereas, before, we said that spectral methods have difficulty coping with a singular initial condition, they are now being applied to the slowly varying function ψ Y , and so their performance will be much better. In essence, as far as numerics are concerned, all the hard work has been done by extracting f Y . This idea has been used in different guises for many decades. For example in [21], it is common to see special functions approximated similarly. After studying the expansion's behaviour and/or singularities, apply an appropriate transformation to take these out and map the domain to [−1, 1]. The transformed function, which by construction is slowly varying and properly behaved at the endpoints, is then ideally approximated as a Chebyshev series-and this is the kind of problem for which spectral methods are ideal.

Conclusion
We have shown how to expand the solution of the forward and backward equations of modified OU processes 'around' the OU case in a way that respects the characteristics of the problem. The method generalizes to other processes, and our conclusions are summarized as follows.
-It is more convenient to work in normalized coordinates, hence the transformation from X to Y and the time change from t to τ . By this transformation, we can assume that the volatility term is constant. Throughout, q = e −2θτ so that q ∈ [0, 1] and we expand in q(1 − q) r . The suggested value of θ is that given by ( -The infinite product expansion of g Y is given by (3.13) and (3.1). If only the logarithmic derivative (w.r.t. y) is needed, then this is just −h Y , so n Y can be ignored. -The functions b r are obtained by the recurrence (3.5) and (3.7), using also (3.2). -The function n Y is given by (3.15), and the remainder term ρ N (τ ) is decomposed by (3.22) into a part √ qρ N (0) that can be evaluated by (3.20) and a pared-down remainder term ρ N which is dropped. -The case with no correction terms, N = 0, is given by (3.23).
In terms of computation, we have found that this method improves upon the use of a standard PDE solver. One reason for this is that, in common with spectral methods, it gives arbitrary spatial resolution, whereas to get a spatial resolution of O(δx) using a PDE solver with, say, standard finite differences typically requires a time step of O(δx) 2 . We have found that the approximation, with a handful of correction terms, typically represents an improvement in speed of a couple of orders of magnitude over the PDE solver; using N = 0 is obviously even faster.

Further developments
This section discusses possible avenues for further research, which we think are numerous and varied.
-Multi-dimensional analogues. These would require the expansion of a multi-dimensional problem around its associated multi-dimensional OU ansatz. -Time-varying analogues. In principle, the logarithmic derivative of the associated FP equation has a useful structure even when the coefficients of the underlying stochastic differential equation are time-varying, but it is less clear how to proceed as a steady-state solution can no longer be identified: thus one might have to work directly with the FP equation rather than its adjoint. -If the process is strictly positive, as happens in certain financial models, then the natural ansatz is no longer the OU process, but instead the square-root process for which the FP equation has a known solution (e.g. [32]). This would allow such processes as dX t = (a − κX t ) dt + σ X t (1 + γ X t ) dW t to be analysed. A further development is when the process is bounded on both sides, such as dX t = (a − κX t ) dt + σ 1 − X 2 t dW t . -The representation of the solution to the forward equation arising from a Lévy process. -Barrier problems. These fall into two broad categories. One possibility is to solve the FP equation with a delta-function initial condition, in the presence of one or more absorbing barriers. In that case, an infinite product solution can be thought of as a term corresponding to the initial condition, a set of terms that vanish on the boundary(ies), and the exponential of a remainder series. An alternative is to solve the backward equation, giving the density of the stopping-time conditionally on starting from some point X 0 . This would allow insight to be gained into such problems as the first passage time of an OU process through a barrier: no simple solution is available, but approximate analytical forms are known, and an infinite product expansion should allow the errors in these to be quantified and corrected in a sensible way.
Ethics. This work did not involve any research on humans or animals. Data accessibility. The datasets supporting this article have been uploaded as part of the electronic supplementary material.