Bayesian inference for diffusion processes: using higher-order approximations for transition densities

Modelling random dynamical systems in continuous time, diffusion processes are a powerful tool in many areas of science. Model parameters can be estimated from time-discretely observed processes using Markov chain Monte Carlo (MCMC) methods that introduce auxiliary data. These methods typically approximate the transition densities of the process numerically, both for calculating the posterior densities and proposing auxiliary data. Here, the Euler–Maruyama scheme is the standard approximation technique. However, the MCMC method is computationally expensive. Using higher-order approximations may accelerate it, but the specific implementation and benefit remain unclear. Hence, we investigate the utilization and usefulness of higher-order approximations in the example of the Milstein scheme. Our study demonstrates that the MCMC methods based on the Milstein approximation yield good estimation results. However, they are computationally more expensive and can be applied to multidimensional processes only with impractical restrictions. Moreover, the combination of the Milstein approximation and the well-known modified bridge proposal introduces additional numerical challenges.


Introduction
Many areas of science employ diffusion processes as a powerful tool to model continuous-time dynamical systems that are subject to random fluctuations.A diffusion process can be equivalently described by a stochastic differential equation (SDE).
If the SDE yields an analytical solution, the transition densities of the corresponding diffusion process are explicitly known and the parameter estimation can be performed easily through a maximum likelihood approach as shown in [1].However, in the majority of applications, this is not the case and the transition densities are intractable.
When the transition densities are unknown, another issue for the parameter estimation is the type of data that is available.In practice, a process can only be observed at discrete points in time.A comprehensive overview of methods for parameter inference from high-frequency data, i.e., that inter-observation times are small, can be found in [2, chapter 6].For the parameter estimation from low-frequency observations, Markov chain Monte Carlo (MCMC) techniques have been developed that introduce imputed data points in order to reduce the time steps between the data points.This concept of Bayesian data imputation for the inference of diffusions has been utilised and further developed by many authors such as [3], [4], [5], and [6].
The idea of these MCMC algorithms is to construct a Markov chain whose elements are samples from the joint posterior density of the parameter and the imputed data points conditioned on the observations.This construction is achieved via a Gibbs sampling approach by alternately executing the following two steps: (1) to draw the parameter conditional on the augmented path that consists of the observed data points and the imputed data points and (2) to draw the imputed data points conditional on the current parameter and the observed data points.
In both steps, direct sampling from the corresponding conditional distribution is usually not possible, therefore, a Metropolis-Hastings algorithm is applied.The (full conditional) posterior densities are reformulated as the product of the transition densities of the process in both steps and the prior density of the parameter in the first step.
Because the transition densities are intractable, they can only be approximated numerically.
The numerical approximation of the transition densities of the process is not only necessary for the calculation of the posterior densities, but also for the proposal of the imputed data points.In both contexts, the Euler-Maruyama scheme is the standard approximation technique in literature, including all of the references hitherto mentioned.In order to reduce the amount of imputed data and the number of necessary iterations for the computationally expensive estimation method, one idea is to employ higher-order approximation schemes.
Therefore, we investigate the utilisation and usefulness of such higher-order approximations on the example of the Milstein scheme.A closed form of the transition density based on the Milstein scheme was derived in [7].In [8], this closed form was used to estimate the parameters of a hyperbolic diffusion process from high-frequency financial data, but not in the context of Bayesian data augmentation.For the latter, [3] claim the possible use of the Milstein scheme.However, the specific implementation and benefit in this framework, in particular when using sophisticated proposal methods, remain unclear and therefore are at the focus of this work.
This article is organised as follows: In Section 2, we define diffusion processes and explain how their paths can be numerically approximated and how the transition densities of the process can be derived based on these approximations.The parameter estimation methods for diffusion processes using Bayesian data augmentation and the approximated transition densities are elaborated in Section 3 and we give some comments about our implementation of these methods in Section 4. Afterwards, we explain the set up of our simulation study in Section 5 which is followed by the results and a discussion in Sections 6 and 7.

Approximation of the transition density of a diffusion process
We consider a d-dimensional time-homogeneous Itô diffusion process (X t ) t≥0 which is a stochastic process that fulfils the SDE with state space X ⊆ R d , starting value x 0 ∈ X and an m-dimensional Brownian motion (B t ) t≥0 .The model parameter θ ∈ Θ is from an open set Θ ⊆ R p .Moreover, we assume that the drift function µ : X ×Θ → R d and diffusion function σ : X ×Θ → R d×m fulfil the Lipschitz condition and growth bound to ensure that Equation (1) has a unique solution [see e.g. 9, chapter 5].
One example of such a diffusion process is the geometric Brownian motion (GBM) that is described by the SDE with state space X = R + , starting value x 0 ∈ X and the two-dimensional parameter θ = (α, σ) T , where α ∈ R and σ ∈ R + and R + is the set of all strictly positive real numbers.In this work, we use the GBM as a benchmark model because of its convenient property of having an explicit solution.The stochastic process fulfils Equation ( 2) for all t ≥ 0. Hence, the multiplicative increments of the GBM are log-normally distributed for t ≥ s ≥ 0 and the transition density is explicitly known to be A derivation of the solution of the GBM and its transition density can be found e.g. in [10].

Approximation of the solution of an SDE
Unlike the GBM, most SDEs do not posses an analytical solution and, thus, their transition densities are not explicitly known.Instead, numerical approximation schemes for the solution of SDEs are used.Kloeden und Platen [11] give a detailed description of these methods.The most commonly used approximation is the Euler(-Maruyama) scheme that approximates the d-dimensional solution (X t ) t≥0 of an SDE by setting Y 0 = x 0 and then successively calculating where ∆t k = t k+1 − t k , ∆B k = B tk+1 − B tk , and Y k is the approximation of X tk , for k = 0, 1, 2, . . . .The approximation improves as the time step ∆t k decreases.The Euler scheme contains only the time component and the stochastic integral of multiplicity one from the stochastic Taylor expansion of the process (X t ) t≥0 and has the order of strong convergence of 0.5.By adding another term of the stochastic Taylor expansion, one obtains the Milstein scheme that approximates the d-dimensional process (X t ) t≥0 by setting Y 0 = x 0 and then successively calculating for the i th component for k = 0, 1, . . .and i = 1, . . ., d. Whenever σ (Y k , θ) is constant in Y k , the last term vanishes and the Milstein scheme reduces to the Euler scheme.If µ (Y k , θ) is once and σ (Y k , θ) is twice continuously differentiable w.r.t.Y k , then the Milstein scheme is strongly convergent of order 1.0 which is higher than that of the Euler scheme.However, there is a severe restriction on the practical applicability of the Milstein scheme due to the fact that the stochastic double integral in the last term only yields an analytical solution for j = l.Although approximation techniques for the double integral exist [see e.g.11], these are unsuitable for our purposes.On the one hand, we want to avoid adding yet another layer of approximation and thus additional computational effort.On the other hand, we need to find the distribution of Y k+1 based on the approximation schemes ( 5) and ( 6) which is also impossible explicitly when adding another approximation.For this reason, we focus on models where the double integral appears exclusively w.r.t. to the same components of the Brownian motion.This is the case when the process is driven by a one-dimensional Brownian motion, i.e. that the diffusion function σ (Y k , θ) is of dimension d × 1.Hence, the diffusion model includes only one source of noise which may affect each of the components of the process.More generally, we require that  ∂σ il ∂y (r) (Y k , θ) ≡ 0, for r = i.Then the i th component of the approximated process reads for k = 0, 1, . . .and where j is the column index of the one non-zero entry of the i th row of the diffusion function.This in turn means that each of the components of the process (X t ) t≥0 is only affected by one of the components of the Brownian motion and the size of the effect is either constant or depends only on the respective component of the diffusion process.In many applications this is not a realistic assumption.
Since our example, the GBM, is a one-dimensional process, the double integral vanishes and the Milstein scheme for the GBM yields for k = 0, 1, . . ., where the first three summands also correspond to the Euler scheme.Figure 1 illustrates the two approximation schemes.It shows three trajectories of the GBM which are represented by the red points and were simulated by setting a seed for the random number generator and then sampling from the explicit transition density (4).The same seed was used to sample the increments of the Brownian motion from the normal density and then transform them by equation ( 5) and ( 6) to obtain the Euler (black) and the Milstein (blue) approximation of the trajectories.We observe that the Milstein approximation is in almost all cases closer or at least as close to the points of the trajectories as the Euler approximation.

Transition densities based on the approximation schemes
While sampling diffusion paths is quite straightforward for both approximation schemes as described above, writing down the corresponding transition density is less apparent for the Milstein scheme.Since the Euler scheme is a linear transformation of ∆B k ∼ N 0, √ ∆t k I m , where I m denotes the m-dimensional identity matrix, the transition density derived from the Euler scheme is also a multivariate Gaussian density: where φ (y|a, b) denotes the multivariate Gaussian density with mean a ∈ R d and covariance matrix b ∈ R d×d .
For the Milstein scheme, deriving the transition density is more complicated, even in case of a one-dimensional diffusion process which we will consider here.Elerian [7] derives the transition density by first rearranging the Milstein scheme to obtain a transformation of a non-central chi-squared distributed variable for which the density is known and then applying the random variable transformation theorem.We show an alternative derivation that directly applies the random variable transformation theorem to ∆B k in Appendix A. Both approaches give the same result.For simplicity of notation, we set µ k := µ (Y k , θ), σ k := σ (Y k , θ), and σ ′ k := ∂σ (y, θ) /∂y | y=Yk .Then, the transition density based on the Milstein approximation for a one-dimensional diffusion process reads and Due to the square root, A k (Y k+1 ) must be non-negative, otherwise the transition density is equal to zero.Hence, there is a bound on the support of π M il .Whether this is a lower or an upper bound depends on the sign of the diffusion function and its derivative.Moreover, one can show that the value of the transition density tends to infinity as Y k+1 approaches the bound.However, the interval for which the density increases towards infinity may be arbitrarily narrow depending on the parameter setting.
For the GBM, we have σ (X t , θ) = σX t with the parameter σ > 0 and the process takes values in R + .Therefore, we obtain a lower bound for the possible values of Y k+1 : Depending on the parameter combination θ = (α, σ) T , this lower bound may be negative and in that case the support of the transition density includes the whole state space of the GBM.
In Figure 2, we illustrate the transition densities based on the solution of the GBM, on the Euler scheme, and on the Milstein scheme for two different parameter settings.We observe that the Milstein transition density better approximates the mode of the transition density of the solution then the Euler transition density does.On the other hand, while the support of the Euler transition density is the set of all real numbers, the Milstein transition density puts zero weight on the values of Y k+1 that are below the lower bound even though some of those would be feasible according to the transition density of the solution process.

Baysian Data Augmentation for the Parameter Estimation of Diffusions
Having low-frequency observations X obs = (X τ0 , . . ., X τM ) of the process (X t ) t≥0 that is described by the SDE (1), we want to estimate the parameter θ.In this work, we assume that all observations are complete, i.e. there are no latent or unobserved components for all observations, and that there is no measurement error.The approximation schemes for the solution of the SDE as introduced in the previous section are only appropriate for small time steps.Thus, we introduce additional data points X imp at intermediate time points and estimate the parameter θ from the augmented path X obs , X imp .To that end, a two-step MCMC approach is used to construct the , of which the elements are samples from the joint posterior distribution π θ, X imp | X obs : Step (1) Parameter update: For a general introduction to MCMC methods see e.g.[12].The mean of the resulting MCMC chain θ (i) i=l+1,...,L after discarding the first l elements as burn-in is then used as a point estimate for the parameter θ.The two steps of the algorithm are described in detail in the next two subsections.We use π to denote the exact densities of the process, i.e. the (full conditional) posterior densities as well as the transition densities.The meaning will be clear from the arguments.Approximated densities are indicated by a corresponding superscript.

Parameter Update
In Step (1), a parameter proposal θ * is drawn from a proposal density q θ * | θ, X obs , X imp which may or may not depend on the imputed and observed data.If it depends only on the current parameter value θ, this is called a random walk proposal.The proposal θ * is accepted with probability and otherwise the previous value θ is kept.Due to Bayes' theorem and the fact that a diffusion process has the Markov property, the (full conditional) posterior distribution can be represented as where π X tk+1 | X tk , θ denotes the transition density of the process (X t ) t≥0 , n the total number of data points in the augmented path, and p the prior density of the parameter.We choose a random walk proposal where the r components of θ * that take values on the real line are drawn from the normal distribution N (θ j , γ 2 j ), for j = 1, . . ., r and some predefined γ j ∈ R + , and the (remaining) strictly positive components are drawn from a log-normal distribution LN (log θ j , γ 2 j ), for j = r + 1, . . ., p.In this case, the acceptance probability reduces to: [see 2, Chapter 7.1.3].
The transition density π X tk+1 | X tk , θ is usually not explicitly known, but can be approximated by the Euler or Milstein scheme as described in Section 2.

Path Update
Since a diffusion process has the Markov property, the likelihood function of the parameter θ factorises as ? ????Hence, it is sufficient to consider the imputation problem in Step (2) only for one path segment between two consecutive observations X τi and X τi+1 .As illustrated in Figure 3, the time interval between the two observations is divided into m subintervals, such that the end points of these intervals are and the time steps are ∆t k = t k+1 − t k , for k = 0, . . ., m − 1.We denote the observations by X obs {τi,τi+1} = {X τi , X τi+1 } and the imputed data points by X imp (τi,τi+1) = {X t1 , . . ., X tm−1 },.After initializing the imputed data by linear interpolation, the path is updated using the Metropolis-Hastings algorithm.A proposal X imp * (τi,τi+1) is drawn from a distribution with density q which may depend on the observed data, the current imputed data, and the parameter θ and accepted with probability , θ q X imp * (τi,τi+1) X imp (τi,τi+1) , X obs {τi,τi+1} , θ .
(10) Otherwise, the proposal is discarded and the previously imputed data X imp (τi,τi+1) is kept.Again, due to the Markov property, we have where X * t0 = X t0 = X τi and X * tm = X tm = X τi+1 and π X tk+1 | X tk , θ denotes the transition density of the process (X t ) t≥0 .
The challenging part of the path update step is the question of how to propose new points.The simplest approach uses the (approximated) transition density to propose a new point by conditioning only on the point to the left of it.We call this proposal method the left-conditioned proposal (LC) and illustrate it in Figure 4a.The proposal density of an entire path segment is simply the product (e) Modified bridge Euler proposal where X * t0 = X τi and thus, the acceptance probability reduces to where X * tm = X tm = X τi+1 .Here, the transition density again is approximated by the Euler or Milstein scheme from Section 2.
This proposal strategy considers the information from the observation X τi on the left, but the proposed path segment is independent of the observation X τi+1 on the right.This may lead to a large jump in the last step from X tm−1 to X τi+1 (as can be seen in Figures 4c and 4d) and hence, to an improbable transition.Therefore, the acceptance probability for the left-conditioned proposal X imp * (τi,τi+1) and hence the acceptance rate of the MCMC sampler is usually low.
Another proposal strategy is the modified bridge proposal (MB) which conditions on both the previous data point and the next observation on the right, as visualised in Figure 4b.This strategy was first proposed by [13].Again, the proposal density of an entire path segment factorises as where X * t0 = X τi , and we apply Bayes' theorem and the Markov property to rewrite the left-and right-conditioned proposal density of one point as for k = 0, . . ., m − 2.
[13] suggest to approximate the two transition densities on the right hand side by the Euler scheme and further approximate µ X * tk+1 , θ and σ X * tk+1 , θ by µ X * tk , θ and σ X * tk , θ .This way, we obtain again a Gaussian density: where Σ X * tk , θ = σ 2 X * tk , θ and φ was defined in Section 2.2.Now, we consider the Milstein approximation for the two factors on the right hand side of (11).The first factor looks just like the Milstein transition density stated in Section 2.2.With the same notation , ∆ + = t m − t k+1 , and t m = τ i+1 , the second factor reads for E m (X * tk+1 ) ≥ 0 (which cannot be rearranged for X * tk+1 in general), otherwise the density is equal to zero.µ * k+1 and σ * k+1 are like µ k+1 and σ k+1 with X tk+1 replaced by X * tk+1 .Here, approximating µ k+1 and σ k+1 by µ k and σ k does not lead to any simplification; therefore, we refrain from doing so.Moreover, there is no closed formula for the normalisation constant needed to scale the product of the two transition densities to a proper density.
For the GBM, we have X t > 0 and σ * k+1 = σX * tk+1 > 0 and thus, obtain the following bounds for π M il X tm |X * tk+1 , θ : and From (7), we obtain the following lower bound for π M il X * tk+1 | X * tk , θ : At the same time, proposals X * tk+1 for the GBM should always be strictly positive to be in the state space.Let l := max{0, l 1st }.Together, Equation (11) leads to three cases for the set D of feasible points of X * tk+1 for the GBM (assuming X tm > 0): if (Case II) or (Case III) apply.
Since the modified bridge proposal does not only take into account the information from the left data point but also from the observation on the right, there is no such large jump in the last step as for the only left-conditioned proposal.This is also apparent in the simulations in Figures 4e and 4f.Therefore, the acceptance probability and thus the acceptance rate are usually higher for the modified bridge proposal than for the left-conditioned proposal.As we show in Appendix B, the acceptance probability is even equal to 1 for the modified bridge proposal if only one data point is imputed between two observations (i.e. the number of inter-observation intervals is m = 2).This holds when using the Milstein scheme in the approximation of the transition density for the likelihood density and the proposal density, but also when using the Euler scheme without the approximation of µ k+1 and σ k+1 by µ k and σ k .
So far, our path update was only applied to imputed points between two observations.It can easily be extended to the case with several observations along the path by simply decomposing the path into independent path proposals, multiplying the respective acceptance probabilities and collectively accepting or rejecting the proposals.Moreover, the whole path does not have to be updated all at once, but can be divided into several path segments that are successively updated.Different algorithms for the choice of the update interval are summarised in [2] and we describe one of them in Appendix C.
In total, we have described eight parameter estimation methods, as we have three choices with two options each: (1) approximate the transition densities in the likelihood function based on the Euler or the Milstein scheme, (2) use the left-conditioned or the modified bridge proposal, and (3) use the Euler or the Milstein scheme for the proposal densities.
To our knowledge, we are the first to utilise the Milstein scheme in the above described MCMC context.

Implementation
The implementation is rather straightforward for most of the estimation procedures.Only the combination of the modified bridge proposal and the Milstein approximation requires some more detailed explanation.As already mentioned, when approximating the two factors on the right hand side of (11) by the transition density based on the Milstein scheme, there is no closed formula for the normalisation constant to obtain a proper density.The normalisation is necessary because the proposal density for a path segment is the product of several of these terms where the condition of the left point X tk differs between a newly proposed segment and the last accepted segment if several consecutive points are imputed.Therefore, the normalisation constants differ and do not cancel out in the acceptance probability.Only for the case where just one point is imputed between two observations (i.e.m = 2 subintervals), the normalisation is not necessary because then the left point X tk is always a (fixed) observed point which is not updated and thus the normalisation constants cancel out in the acceptance probability.For m > 2, we integrate the product (11) over X tk+1 numerically to obtain the normalisation constant.The product (11) may be very small (but not zero everywhere in a non-empty feasible set D) and thus may numerically integrate to zero especially when the upper interval boundary of the feasible set is infinite.To overcome this issue, we take two measures.Firstly, we do not integrate over the whole set of feasible points but determine the maximum of the product numerically and then integrate over the interval that includes all points with a function value of at least 10 −20 times this maximum.Secondly, we rescale the product (11) by dividing by this maximum before integrating.
In order to sample from the Milstein modified bridge proposal density, we employ rejection sampling.For this, normalisation of the product ( 11) is not necessary.We again determine the maximum d max of the product numerically and also the interval I that includes all points with a function value of at least 10 −20 times this maximum.Then, we sample (u 1 , u 2 ) uniformly from the rectangle I × (0, d max ) and accept u 1 as a proposal X * tk+1 if the unnormalised density value of (11) at u 1 is at most u 2 .For the combination of the modified bridge proposal and the Milstein approximation, it may occur that the set of feasible proposal points is empty.In that case, our implementation switches to the Euler approximation for this one point.Besides, for all methods it may happen that a negative point is proposed which is not feasible for a GBM.Therefore, we propose a new point in that case.For both cases, we count how many times this occurs during the estimation procedure.In the following simulation study none of these cases occurred.
We implemented the described estimation procedures in R version 3.4.1 [14].The source code of our implementation and the following simulation study is available online. 1

Simulation study
For the simulation study, we generated 100 paths of the GBM using the solution (4) with the parameter combination θ = α, σ 2 T = (1, 2) T and the initial value x 0 = 100.Some of these paths are illustrated in Figure 5. From each path, we took 50 equidistant points and applied each of the eight described estimation methods once.For the prior distribution of the parameters, we assumed that they are independently distributed with α ∼ N (0, 10) and σ 2 ∼ IG(κ 0 = 2, ν 0 = 2), where IG denotes the inverse gamma distribution with shape parameter κ 0 and scale parameter ν 0 .The a priori expectations of the parameters are thus E(α) = 0 and E σ 2 = 2.Each of the estimation procedures performs the following steps: (1) Draw initial values for the parameters α and σ 2 from the prior distributions.
(3) Repeat the following steps 10 Output from one estimation procedure on the example of the combination of the modified bridge proposal and the Milstein approximation for the likelihood and the proposal density is shown in Figures 6 and 7. From each estimation procedure, we obtain a two-dimensional Markov chain for the parameters α and σ 2 which we use to calculate the mean and the mode of the chain after cutting off a 10% burn-in phase as point estimates for the parameters.As a benchmark, we also calculate the maximum likelihood (ML) estimate and the maximum a posteriori (MAP) estimate based on the solution of the GBM.
The estimation procedures and time measurements were performed on a cluster of machines with the following specifications: AMD Opteron(TM) Processor 6272 (2.10GHz), 128GB DDR3-RAM, 1333MHz.   .Estimated posterior densities for α and σ 2 from one parameter estimation run using the combination of the modified bridge proposal and the Milstein approximation for the transition and the proposal density.Moreover, true values of the parameters, the mean of the MCMC chains after 10% burn-in, the maximum likelihood estimate and the MAP estimate for the sample path based on the solution of the GBM are shown.

Results
Results of the parameter estimates for each of the 100 trajectories using each of the eight method combinations once are summarised in Figures 8 and 9 and Table 1.
For the parameter α, all of the estimation procedures yield similar results.It even seems that for the estimation of α, the data augmentation would not be necessary because also the results for m = 1, i.e. no points are imputed and only the parameter update step is performed in each iteration, already look very similar to the ML and the MAP estimates.Moreover, there is only little difference between the mean and the mode estimates.The mean estimates look slightly more similar to the ML and MAP estimate than the mode estimates do.
For the parameter σ 2 in Figure 9, the comparison between the estimation procedures also looks similar.However, here, the introduction of imputed points does seem to improve the estimation results as the shapes of the violin plots for m = 2 and m = 5 look more similar to those of the ML and the MAP estimates than the shapes of the violin plots for m = 1 do.Besides, the estimates using the modified bridge proposal and m = 5 are lower than the rest, independently of what approximation is used.
Table 1 displays some characteristics to evaluate the estimation procedures: • The multivariate effective sample size (ESS) of the parameters as defined in [15] gives the size of an independent and identically distributed sample equivalent to our MCMC sample in terms of variance.For m = 2, the multivariate ESS is higher than for m = 5 for each of the considered estimation procedures.It is also higher when comparing the procedures using the modified bridge proposal to those using the left-conditioned proposal.However, there is only little difference between the multivariate ESS when we compare procedures that use different approximation schemes, but the same proposal method and number m.The multivariate ESS seems to be only slightly higher when the same approximation scheme is used for both the proposal and the likelihood density.• For the acceptance rate of the parameters, the proposal method as well as the approximation scheme for the proposal density do not make a difference.On the other hand, the acceptance rate of the parameters is slightly lower when the Milstein scheme for the approximation of the likelihood density is used.Moreover, the acceptance rate of the parameters decreases as the number m of imputed points is increased.• The acceptance rate of the path hardly differs for the different approximation schemes when using the left-conditioned proposal method and the same number m of imputed points.It is substantially higher for the procedures that use the modified bridge proposal method.Unlike for the left-conditioned proposal, the acceptance rate of the path for the modified bridge proposal method increases as the number of imputed points increases.Besides, the acceptance rate of the path is higher for the modified bridge proposal when the same approximation scheme is used for both the proposal and the likelihood density.• The computation times vary substantially between the different estimation procedures.The procedures that use the Euler approximation are always faster.
Especially the combination of the modified bridge proposal method with the Milstein scheme approximating the proposal and the likelihood density takes very long.
We conducted another simulation study on the example of the Cox-Ingersoll-Ross (CIR) process as can be found in Appendix D. The results look very similar as for      Table 1.Empirical characteristics to evaluate the parameter estimation procedures for different numbers m of subintervals in between two observations aggregated over the 100 trajectories of the GBM.The total sample size used to calculate the ESS was 90000.Acceptance rates always take values between 0 and 1. Specifications for the computing power are stated in the main text.the GBM.There are hardly any differences between the estimates resulting from the different estimation procedures, and the computation time for the procedures which use the Milstein scheme for the proposal density are substantially higher.

Summary and Discussion
We have shown how to implement an algorithm for the parameter estimation of SDEs from low-frequency data using the Milstein scheme to approximate the transition density of the underlying process.Our motivation was to improve numerical accuracy and thus reduce the amount of imputed data and computational overhead.However, our results are rather discouraging: We found that this method can be applied to multidimensional processes only with impractical restrictions.Moreover, we showed that the combination of the modified bridge proposal with the Milstein scheme for the proposal density may lead to an empty set of possible proposal points in which case we would be forced to switch again to the Euler scheme in order to proceed.Our simulation study showed that the use of the Milstein scheme does not improve the estimation results.Besides, the estimation procedures using the Milstein scheme are computationally more expensive, especially the combination of the modified bridge proposal with the Milstein scheme for the proposal density.Our implementation may be improved to make it more efficient, i.e. to decrease the computation time.However, we doubt that the computation times can be reduced to that of the methods using the Euler scheme.Seeing that the estimation results do not substantially improve in the two investigated examples, this measure does not seem to be worth the effort.So we conclude with the insight that the current methods using the Euler scheme are, at least with respect to numerical analysis, already a reasonable choice.
Milstein scheme reads: The set D of feasible points of X tk+1 for the CIR process when combining the modified bridge proposal with the Milstein scheme is thus D = [l, ∞] with l := max (0, l lef t , l right ).
For the simulation study, we generated 100 paths of the CIR process on the time interval [0, 1] with the parameter combination θ = α, β, σ 2 T = (1, 1, 0.25) T and the initial value x 0 = 3 using the Euler scheme with step size ∆t = 10 −6 .From each path, we took 50 equidistant points and applied each of the eight described estimation methods once to estimate the parameters β and σ 2 and assuming α to be known.For the prior distribution of the parameters, we assumed that they are independently distributed with β ∼ IG (κ b = 3, ν b = 3) and σ 2 ∼ IG(κ s = 3, ν s = 4).The a priori expectations of the parameters are thus E(β) = 3  2 and E σ 2 = 2.For each estimation procedure, the steps as outlined in Section (5) are taken with 10 5 iterations .As proposal densities for the parameters in Steps (3a) and (3b), we used β * ∼ LN (log β i−1 , 0.25) and σ 2 * ∼ LN (log σ 2 i−1 , 0.25).The estimation results are summarised in Figures (D1) and (D2) and Table (D1).As for the GBM, there are hardly any differences between the estimates resulting from the different estimation procedures, and the computation time for the procedures which use the Milstein scheme for the proposal density are substantially higher.
Table D1.Empirical characteristics to evaluate the parameter estimation procedures for different numbers m of subintervals in between two observations aggregated over the 100 trajectories of the CIR process.The total sample size used to calculate the ESS was 90000.Acceptance rates always take values between 0 and 1. Specifications for the computing power are stated in the main text.

Proposal method
Proposal

Figure 1 .
Figure 1.Three trajectories of a GBM (2) with α = 1 and σ 2 = 0.25 and their approximations by the Euler and the Milstein scheme.

Figure 2 .
Figure 2. Transition densities for a transition from Y k to Y k+1 with a time step of ∆t k = 0.1 for two different parameter settings based on the solution of the GBM, on the Euler scheme, and on the Miltstein scheme, respectively.

Figure 3 .
Figure 3. Illustration of an augmented path segment: • represent observed data points and • represent imputed points

Figure 5 .
Figure 5. Illustration of the trajectories used in the simulation study: The solid black line shows the expected value of the solution of the GBM E [Xt] = X 0 exp(αt) = 100 exp(t).The colored lines are 10 examples of the 100 trajectories used in the simulation study.The grey-shaded area shows the range of the 100 trajectories.Each trajectory consists of 50 points that were used as observations.

Figure 6 .
Figure 6.Trace plots of the MCMC chains for the parameters α and σ 2 of the GBM (2) and of the logposterior density values for one parameter estimation run using the combination of the modified bridge proposal with m = 2 and the Milstein approximation for the likelihood and the proposal density.The red lines show the true values of the parameters α = 1 and σ 2 = 2, the blue solid lines the mean and the blue dashed lines the lower and upper bounds of the highest-probability density interval of 95% after cutting off 10% of the chains as burn-in which is represented by the green line.

Figure 7
Figure 7.Estimated posterior densities for α and σ 2 from one parameter estimation run using the combination of the modified bridge proposal and the Milstein approximation for the transition and the proposal density.Moreover, true values of the parameters, the mean of the MCMC chains after 10% burn-in, the maximum likelihood estimate and the MAP estimate for the sample path based on the solution of the GBM are shown.

Figure 8 .
Figure 8. Estimation results for α obtained by each of the eight described estimation procedures and the ML and the MAP estimates for comparison.Each violin plot represents 100 estimates, one for each of the 100 sample paths of the GBM.Moreover, results for different numbers m of subintervals in between two observations are shown.For m = 1, no data points were imputed and only Step (1), i.e. the parameter update, was repeated in the estimation procedure.

Figure 9 .
Figure 9. Estimation results for σ 2 as described in Figure 8.
order to have only j = l inside the double integral.This is the case if and only if σ (Y k , θ) has at most one non-zero entry per row i (namely in column j) and