Bayesian inference for diffusion processes: using higher-order approximations for transition densities
Modelling random dynamical systems in continuous time, diffusion processes are a powerful tool in many areas of science. Model parameters can be estimated from time-discretely observed processes using Markov chain Monte Carlo (MCMC) methods that introduce auxiliary data. These methods typically approximate the transition densities of the process numerically, both for calculating the posterior densities and proposing auxiliary data. Here, the Euler–Maruyama scheme is the standard approximation technique. However, the MCMC method is computationally expensive. Using higher-order approximations may accelerate it, but the specific implementation and benefit remain unclear. Hence, we investigate the utilization and usefulness of higher-order approximations using the example of the Milstein scheme. Our study demonstrates that the MCMC methods based on the Milstein approximation yield good estimation results. However, they are computationally more expensive and can be applied to multidimensional processes only with impractical restrictions. Moreover, the combination of the Milstein approximation and the well-known modified bridge proposal introduces additional numerical challenges.
Diffusion processes are used in many areas of science as a powerful tool to model continuous-time dynamical systems that are subject to random fluctuations. A diffusion process can be equivalently described by a stochastic differential equation (SDE).
If the SDE has an analytical solution, the transition densities of the corresponding diffusion process are explicitly known and parameter estimation can be easily performed through a maximum likelihood approach, as demonstrated in . However, in the majority of applications, this is not the case, and the transition densities are intractable.
When the transition densities are unknown, another challenge for parameter estimation is the type of available data. In practice, a process can only be observed at discrete points in time. A comprehensive overview of the methods for parameter inference from high-frequency data (i.e. where inter-observation times are small) can be found in [2, ch. 6]. For parameter estimation from low-frequency observations, Markov chain Monte Carlo (MCMC) techniques have been developed that introduce imputed data points to reduce the time steps between data points. This concept of Bayesian data imputation for the inference of diffusions has been used and developed further by many authors such as [3–6]. These methods are applicable to multidimensional processes and were extended for the case of latent process components as well as for the occurrence of measurement error. Thus, they are very promising for use in real-data applications (e.g. [2,7]).
The concept of these MCMC algorithms is to construct a Markov chain whose elements are samples from the joint posterior density of the parameter and the imputed data points conditioned on the observations. This construction is achieved via a Gibbs sampling approach by alternately executing the following two steps:
drawing the parameter conditional on the augmented path that consists of the observed data points and imputed data points and
drawing the imputed data points conditional on the current parameter and the observed data points.
The numerical approximation of the transition densities of the process is necessary not only for calculating the posterior densities, but also for proposing the imputed data points. In both contexts, the Euler–Maruyama scheme is the standard approximation technique in the literature, including all of the aforementioned references. To reduce the amount of imputed data and the number of necessary iterations for the computationally expensive estimation method, one possible solution is to employ higher-order approximation schemes.
Therefore, we investigate the utilization and usefulness of such higher-order approximations using the example of the Milstein scheme. A closed form of the transition density based on the Milstein scheme is derived in . In , this closed form is used to estimate the parameters of a hyperbolic diffusion process from high-frequency financial data, but not in the context of Bayesian data augmentation. For the latter, Elerian et al.  propose the possible use of the Milstein scheme. However, the specific implementation and benefit of this framework, in particular when using sophisticated proposal methods, remain unclear and, therefore, are the focus of this work. For our investigation, we first explain how to integrate the Milstein scheme into the framework of Bayesian data augmentation and then assess the effectiveness of this new combination in a simulation study, which is a common approach in the literature (e.g. [10,11]).
This article is organized as follows. In §2, we define diffusion processes, describe the numerical approximation of their paths, and explain the derivation of the transition densities of the processes based on these approximations. In §3, we elaborate on the parameter estimation methods for diffusion processes using Bayesian data augmentation and the approximated transition densities. In §4, we give some comments about our implementation of these methods and in §5, we explain the set-up of our simulation study. In §§6 and 7, we present the results and discussion. The source code of our implementation and the simulation study is publicly available at https://github.com/fuchslab/Inference_for_SDEs_with_the_Milstein_scheme.
2. Approximation of the transition density of a diffusion process
We consider a d-dimensional time-homogeneous Itô diffusion process, (Xt)t≥0, a stochastic process that fulfils the following SDE:
In this work, we use rather simple, well-known examples of such a diffusion process in order to focus on the investigated estimation methods and make the article easy to follow. Our example for the main text is the geometric Brownian motion (GBM), and in appendix D, we also provide all relevant details for the Cox–Ingersoll–Ross (CIR) process. The GBM is described by the following SDE:
In different contexts, one often considers the logarithm of the GBM, log Xt, which is simply a normally distributed random variable for fixed t, with corresponding SDE
2.1. Approximation of the solution of an SDE
Unlike the GBM, most SDEs do not have an analytical solution; thus, their transition densities are not explicitly known. Instead, numerical approximation schemes are used for the solution of the SDEs. Kloeden & Platen  have provided a detailed description of these methods. Several of the approximation schemes are based on the stochastic Taylor expansion. For a general treatment of this expansion, we refer the interested reader to . The most commonly used approximation is the Euler(–Maruyama) scheme, which approximates the d-dimensional solution (Xt)t≥0 of an SDE by setting Y0 = x0 and, then, successively calculating the following:
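For concreteness, the Euler step can be read as the Gaussian recursion Y_{k+1} = Y_k + μ(Y_k, θ)Δ + σ(Y_k, θ)ΔB_k with independent increments ΔB_k ∼ N(0, Δ). The following minimal sketch (Python; the function and variable names are ours and not part of any published implementation) simulates one such path for the GBM of §2:

```python
import math
import random

def euler_maruyama(mu, sigma, x0, T, n_steps, rng):
    """Simulate one path of dX = mu(X) dt + sigma(X) dB on [0, T]
    with the Euler-Maruyama scheme (step size Delta = T / n_steps)."""
    dt = T / n_steps
    y = x0
    path = [y]
    for _ in range(n_steps):
        dB = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, Delta)
        y = y + mu(y) * dt + sigma(y) * dB
        path.append(y)
    return path

# GBM example from the text with theta = (alpha, sigma^2) = (1, 2), x0 = 100
alpha, sigma_par, x0 = 1.0, math.sqrt(2.0), 100.0
rng = random.Random(42)
path = euler_maruyama(lambda x: alpha * x, lambda x: sigma_par * x,
                      x0, T=1.0, n_steps=100, rng=rng)
```

The same routine applies to any one-dimensional SDE by exchanging the drift and diffusion functions.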
A discrete-time approximation YΔ with maximum step size Δ > 0 converges with strong order γ > 0 at time T to the solution XT of a given SDE if there exists a positive constant C independent of Δ and a Δ0 > 0 such that
By adding another term of the stochastic Taylor expansion to equation (2.5), one obtains the Milstein scheme that approximates the d-dimensional process (Xt)t≥0 by setting Y0 = x0 and, then, successively calculating for the ith component
When σ(Yk, θ) is constant in Yk, the last term vanishes and the Milstein scheme reduces to the Euler scheme. If μ(Yk, θ) is once continuously differentiable and σ(Yk, θ) is twice continuously differentiable with respect to Yk, then the Milstein scheme is strongly convergent of order 1.0, which is higher than that of the Euler scheme. An illustration of this difference in the simulation of SDE trajectories is presented e.g. in . However, there is a severe restriction on the practical applicability of the Milstein scheme because the stochastic double integral in the last term of (2.6) only yields an analytical solution for j = l. Although approximation techniques for the double integral exist (e.g. ), they are unsuitable for our purposes. On the one hand, we wish to avoid adding yet another layer of approximation and, thus, additional computational time. On the other hand, we must find the distribution of Yk+1 based on approximation schemes (2.5) and (2.6), which is also not explicitly possible when adding another approximation. For this reason, we focus on models where the double integral appears exclusively for the same components of the Brownian motion. For example, this is the case when the process is driven by a one-dimensional Brownian motion (i.e. the diffusion function σ(Yk, θ) is of dimension d × 1). Hence, the diffusion model includes only one source of noise that may affect each of the components of the process. More generally, we require that
if an entry σrj(Yk, θ) is non-zero, then the entries of all other columns and all rows must not depend on , and
if an entry σil(Yk, θ) depends on , then the entries of all other columns in row r must be zero.
Assume that the ith component of the diffusion process appears in the ith row of the diffusion function and that the respective entry of the diffusion function does not depend on the remaining components , r ≠ i (the contrary would impose restrictions on other rows, as described above). Then, the ith component of the approximated process is
Moreover, note that if we consider the approximation in equation (2.8) as a function of the increment of the Brownian motion, g is quadratic in . Therefore, the function g has a global extremum with value
Since our example, the GBM, is a one-dimensional process, the double integral in equation (2.6) vanishes and the Milstein scheme for the GBM yields the following:
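In one dimension, the Milstein step augments the Euler recursion by the term ½σ(Y_k, θ)σ′(Y_k, θ)((ΔB_k)² − Δ); for the GBM this equals ½σ²Y_k((ΔB_k)² − Δ). The sketch below (Python; an illustration under our own naming, not the article's implementation) compares the strong endpoint errors of both schemes against the closed-form GBM solution, using the same Brownian increments for all three:

```python
import math
import random

def strong_errors(alpha, sig, x0, T, n_steps, n_paths, seed):
    """Average absolute endpoint error of Euler and Milstein vs the exact
    GBM solution, driven by the same Brownian increments."""
    rng = random.Random(seed)
    dt = T / n_steps
    err_e = err_m = 0.0
    for _ in range(n_paths):
        ye, ym, b = x0, x0, 0.0
        for _ in range(n_steps):
            dB = rng.gauss(0.0, math.sqrt(dt))
            b += dB
            ye += alpha * ye * dt + sig * ye * dB
            ym += (alpha * ym * dt + sig * ym * dB
                   + 0.5 * sig**2 * ym * (dB**2 - dt))  # extra Milstein term
        # exact GBM solution at time T given the simulated B_T
        x_exact = x0 * math.exp((alpha - 0.5 * sig**2) * T + sig * b)
        err_e += abs(ye - x_exact)
        err_m += abs(ym - x_exact)
    return err_e / n_paths, err_m / n_paths

e_euler, e_milstein = strong_errors(1.0, 0.5, 1.0, T=1.0, n_steps=100,
                                    n_paths=400, seed=1)
```

With these settings, the Milstein endpoint error should be markedly smaller than the Euler one, reflecting its higher strong order.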
2.2. Transition densities based on approximation schemes
While sampling diffusion paths is fairly straightforward for both approximation schemes as described above, determining the corresponding transition density is less apparent for the Milstein scheme. Since the Euler scheme is a linear transformation of , where Im denotes the m-dimensional identity matrix, the transition density derived from the Euler scheme is also a multivariate Gaussian density:
For the Milstein scheme, deriving the transition density is more complicated, even in the case of a one-dimensional diffusion process, which we consider here. Elerian  derived the transition density by first rearranging the Milstein scheme to obtain a transformation of a non-central chi-squared distributed variable for which the density is known, and then applying the random variable transformation theorem. In appendix A, we present an alternative derivation that directly applies the random variable transformation theorem to ΔBk. Both approaches produce the same result. For simplicity of notation, we set μk := μ(Yk, θ), σk := σ(Yk, θ) and . Then, the transition density based on the Milstein approximation for a one-dimensional diffusion process is as follows:
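The derivation via the transformation theorem can also be carried out numerically: the one-dimensional Milstein step is quadratic in ΔB_k, so each feasible value y has two preimages, and the density is the sum over both roots of the N(0, Δ) density divided by the absolute derivative of the step map. A sketch of this construction (Python; hypothetical helper names, intended only to illustrate the structure of the formula):

```python
import math

def milstein_density(y, y_k, mu_k, sig_k, dsig_k, dt):
    """Transition density p(y | y_k) implied by the one-dimensional Milstein
    step  y = y_k + mu_k*dt + sig_k*dB + 0.5*sig_k*dsig_k*(dB^2 - dt),
    obtained by applying the change-of-variables theorem to dB ~ N(0, dt).
    The step is quadratic in dB (a*dB^2 + b*dB + c), so each feasible y
    has two preimages and  p(y) = sum over roots of phi(dB) / |g'(dB)|."""
    a = 0.5 * sig_k * dsig_k
    b = sig_k
    c = y_k + mu_k * dt - a * dt
    disc = b * b - 4.0 * a * (c - y)   # discriminant of a*z^2 + b*z + (c - y)
    if disc <= 0.0:                    # y below the lower bound: zero density
        return 0.0
    sq = math.sqrt(disc)
    dens = 0.0
    for z in ((-b + sq) / (2.0 * a), (-b - sq) / (2.0 * a)):
        phi = math.exp(-z * z / (2.0 * dt)) / math.sqrt(2.0 * math.pi * dt)
        dens += phi / sq               # |g'(z)| = |2a z + b| = sqrt(disc) at a root
    return dens

# GBM at y_k = 1: mu_k = alpha*y_k, sig_k = sigma*y_k, sigma'(y) = sigma
alpha, sigma_par, y_k, dt = 1.0, 0.5, 1.0, 0.05
p = lambda y: milstein_density(y, y_k, alpha * y_k, sigma_par * y_k, sigma_par, dt)
```

Integrating this density over the feasible set should recover total mass one, which provides a simple numerical check of the formula.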
For the GBM, we have σ(Xt, θ) = σXt with parameter σ > 0, the process taking values in . Therefore, we obtain a lower bound for the possible values of Yk+1
In figure 2, we illustrate the transition densities based on the GBM solution, Euler scheme and Milstein scheme for two different parameter settings. We observe that the Milstein transition density better approximates the mode of the transition density of the solution than the Euler transition density does. However, while the support of the Euler transition density is the set of all real numbers, the Milstein transition density puts zero weight on the values of Yk+1 that are below the lower bound, even though some of these values are feasible according to the transition density of the solution process.
Other approximation methods for the transition densities were developed, for example, in [16–18]. Here, we focus on the numerical approximation methods described above because, for the estimation methods introduced in the next section, it is crucial not only to be able to approximate the transition density, but also to be able to sample from the resulting density quickly.
3. Bayesian data augmentation for the parameter estimation of diffusions
With low-frequency observations of the process (Xt)t≥0 described by the SDE (2.1), we wish to estimate parameter θ. In this work, we assume that all observations are complete (i.e. there are no latent or unobserved components for all observations) and that there are no measurement errors. The approximation schemes for the solution of the SDE as introduced in §2 are only appropriate for small time steps. Therefore, we introduce additional data points Ximp at intermediate time points (as visualized in figure 3 and explained in detail in §3.2) and estimate the parameter θ from the augmented path . To this end, a two-step MCMC approach is used to construct the Markov chain , the elements of which are samples from the joint posterior distribution π(θ, Ximp | Xobs):
Step (1) Parameter update: Draw ,
Step (2) Path update: Draw .
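The alternation of Steps (1) and (2) is a Metropolis-within-Gibbs sampler. The following runnable toy (Python) replaces the approximated posterior by a simple Gaussian stand-in, log_joint, purely to illustrate the control flow; in the actual method, log_joint would be the approximated log-posterior of the parameter and the imputed data:

```python
import math
import random

def log_joint(theta, x):
    """Toy stand-in for the joint log-posterior log pi(theta, Ximp | Xobs);
    a simple Gaussian joint so the sketch runs end to end."""
    return -0.5 * (theta - x) ** 2 - 0.5 * x ** 2

def gibbs_sampler(n_iter, step, seed):
    """Metropolis-within-Gibbs: alternate (1) a random-walk parameter update
    and (2) an update of the imputed data, each accepted with a Metropolis
    ratio of the joint density."""
    rng = random.Random(seed)
    theta, x = 0.0, 0.0
    chain = []
    for _ in range(n_iter):
        # Step (1): parameter update theta* = theta + u (random walk)
        theta_star = theta + rng.gauss(0.0, step)
        d = log_joint(theta_star, x) - log_joint(theta, x)
        if rng.random() < math.exp(min(0.0, d)):
            theta = theta_star
        # Step (2): path update (here a single latent value x)
        x_star = x + rng.gauss(0.0, step)
        d = log_joint(theta, x_star) - log_joint(theta, x)
        if rng.random() < math.exp(min(0.0, d)):
            x = x_star
        chain.append((theta, x))
    return chain

chain = gibbs_sampler(n_iter=20000, step=1.0, seed=7)
```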
A general introduction to MCMC methods is presented in . The resulting MCMC chain , after discarding the first l elements as burn-in, can be considered a sample drawn from the joint posterior distribution π(θ, Ximp | Xobs) and can be used for a fully Bayesian analysis. The two steps of the algorithm are described in detail in the following two subsections. We use π to denote the exact densities of the process, that is, the (full conditional) posterior densities as well as the transition densities; the meaning becomes clear from the arguments. Approximated densities are indicated by a corresponding superscript.
3.1. Parameter update
In Step (1), a parameter proposal θ* is drawn from a proposal density q(θ* | θ, Xobs, Ximp) which may or may not depend on the imputed and observed data. If a proposal θ* = θ + u with an update u that is independent of the current parameter value θ is used, the proposal strategy is called a random walk proposal. Proposal θ* is accepted with the following probability:
Due to Bayes’ theorem and the fact that a diffusion process has the Markov property, the (full conditional) posterior density can be represented as
The transition density is generally not explicitly known, but it can be approximated by the Euler or Milstein scheme as described in §2.
3.2. Path update
Since a diffusion process has the Markov property, the likelihood function of parameter θ factorizes as
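This factorization is a product of transition densities over successive time points. Under the Euler approximation, each factor is Gaussian, so the approximate log-likelihood of an augmented path accumulates as a sum of Gaussian log-densities. A minimal sketch (Python; mu and sigma are placeholders for the drift and diffusion functions):

```python
import math

def euler_loglik(path, dt, mu, sigma):
    """Approximate log-likelihood of a discretized path: by the Markov
    property it is the sum of log transition densities, each approximated
    by the Euler scheme's Gaussian  N(y_k + mu(y_k)*dt, sigma(y_k)^2 * dt)."""
    ll = 0.0
    for y_k, y_next in zip(path, path[1:]):
        m = y_k + mu(y_k) * dt
        v = sigma(y_k) ** 2 * dt
        ll += -0.5 * math.log(2.0 * math.pi * v) - (y_next - m) ** 2 / (2.0 * v)
    return ll

# GBM example: drift alpha*x, diffusion sigma*x
alpha, sig = 1.0, 0.5
ll = euler_loglik([1.0, 1.06, 1.11, 1.19], dt=0.05,
                  mu=lambda x: alpha * x, sigma=lambda x: sig * x)
```

Exchanging the Gaussian factor for the Milstein transition density of §2.2 yields the Milstein-based likelihood in the same way.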
After initializing the imputed data by linear interpolation, the path is updated using the Metropolis–Hastings algorithm. A proposal is drawn from a distribution with density q, which may depend on the observed data, current imputed data and parameter θ. It is accepted with the following probability:
The challenging aspect of the path update step involves determining how to propose new points. The simplest approach uses the (approximated) transition density to propose a new point by conditioning only on the point to the left of the new point. We call this proposal method the left-conditioned proposal and illustrate it in figure 4a. The proposal density of an entire path segment is simply the product
This proposal strategy considers the information from the observation on the left, while the proposed path segment is independent of the observation on the right. This may lead to a large jump in the last step from to , as can be seen in figure 4c,d, and hence, to an improbable transition. Therefore, the acceptance probability for the left-conditioned proposal , and consequently, the acceptance rate of the MCMC sampler is usually low.
A number of more sophisticated proposal strategies have been suggested. Ch. 7.1 in  reviews some of these. Here, we first consider the modified bridge (MB) proposal, which conditions on both the previous data point and the following observation on the right, as visualized in figure 4b. This strategy was originally proposed by Durham & Gallant  and first applied in the Bayesian framework in . More recently, Whitaker et al.  suggested improved bridge constructs, and van der Meulen & Schauer  proposed the so-called guided proposals.
For the MB proposal, the proposal density of an entire path segment factorizes again as follows:
In , it is suggested to approximate the two transition densities on the right-hand side by the Euler scheme and to further approximate and by and , respectively. This way, they obtain that (3.4) is approximately proportional to a Gaussian density, which we will use for the MB proposal based on the Euler scheme
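Assuming the standard form of the Durham–Gallant modified bridge, a Gaussian whose mean is pulled linearly towards the right-hand observation and whose variance σ(y_k)²ΔΔ+/(Δ + Δ+) shrinks as the observation is approached, a sequential sampler for one path segment can be sketched as follows (Python; the names are ours):

```python
import math
import random

def modified_bridge_proposal(x_left, x_right, m, dt, sigma, rng):
    """Sequentially propose the m-1 interior points between two observations
    using the Euler-based modified bridge: each new point is Gaussian with
    mean pulled linearly towards the right observation and variance shrunk
    by the remaining time, sigma(y_k)^2 * dt * Delta_plus / (dt + Delta_plus)."""
    path = [x_left]
    y = x_left
    for k in range(m - 1):
        remaining = (m - k) * dt          # time from t_k to the observation t_m
        mean = y + (x_right - y) * dt / remaining
        var = sigma(y) ** 2 * dt * (remaining - dt) / remaining
        y = mean + math.sqrt(var) * rng.gauss(0.0, 1.0)
        path.append(y)
    path.append(x_right)
    return path

rng = random.Random(3)
seg = modified_bridge_proposal(1.0, 1.2, m=5, dt=0.01,
                               sigma=lambda x: 0.5 * x, rng=rng)
```

By construction, the expected proposed path follows the straight line between the two endpoints, and the proposal pins down the right endpoint exactly.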
We now consider the Milstein approximation for the two factors on the right-hand side of (3.4). The first factor resembles the Milstein transition density stated in §2.2. With the same notation, Δ+ = tm − tk+1, and tm = τi+1, the second factor is as follows:
For the GBM, we have Xt > 0 and and thus, obtain the following bounds for , the second factor in (3.4):
Since the MB proposal takes into account information not only from the left data point but also from the observation on the right, it does not have a large jump in the last step as the left-conditioned proposal does. This is also apparent in the simulations in figure 4e,f. Therefore, the acceptance probability and acceptance rate are usually higher for the MB proposal than for the left-conditioned proposal. As appendix B demonstrates, the acceptance probability is even equal to 1 for the MB proposal if only one data point is imputed between two observations (i.e. the number of inter-observation intervals is m = 2). This holds when using the Milstein scheme to approximate the transition density for the likelihood function and proposal density, but also when using the Euler scheme without the approximation of μk+1 and σk+1 by μk and σk, respectively.
The density of the MB proposal based on the Euler scheme in equation (3.5) can also be interpreted as the density that results from applying the Euler scheme to the following diffusion process:
Thus far, our path update has only been applied to imputed points between two observations. It can easily be extended to a case with several observations along the path by simply decomposing the path into independent path proposals, multiplying the respective acceptance probabilities and collectively accepting or rejecting the proposals. Moreover, the entire path does not have to be updated all at once, but can be divided into several path segments that are successively updated. Different algorithms for choosing the update interval are summarized in  and appendix C describes one of them.
Another challenge in the context of Bayesian data augmentation and the MCMC scheme discussed above is the dependence between the parameter components included in the diffusion function and the missing path segments between two observations. Roberts & Stramer  were the first to highlight that, in the discretized setting (as we consider it here), the dependence leads to a slower convergence of the MCMC algorithm as the number of imputed points m increases. All estimation methods compared here are affected by this issue in the same way; we hence do not further consider it here.
We have introduced a number of possible options for the choices to be made when constructing an estimation method in the framework as described so far:
approximate the transition densities in the likelihood function based on the Euler or Milstein scheme,
use the left-conditioned, the MB or the DBM proposal, and
use the Euler or Milstein scheme for the proposal densities (for the left-conditioned or MB proposal).

In our simulation study, we compare the following four combinations:
(MBE-E) MB proposal and transition density both based on the Euler scheme,
(MBE-M) MB proposal based on the Euler scheme and transition density based on the Milstein scheme,
(MBM-M) MB proposal and transition density both based on the Milstein scheme, and
(DBM-M) DBM proposal (which is based on the Milstein scheme) and transition density based on the Milstein scheme.
To our knowledge, we are the first to use the Milstein scheme in the MCMC context described above.
4. Implementation
The implementation is relatively straightforward for the majority of the estimation procedures, and only the combination of the MB proposal and the Milstein approximation requires additional explanation. As mentioned, when approximating the two factors on the right-hand side of (3.4) by the transition density based on the Milstein scheme, there is no closed formula for the normalization constant to obtain a proper density. The normalization is necessary because the proposal density for a path segment is the product of several of the terms from (3.4), where the condition on the left point, , differs between a newly proposed segment and the last accepted segment if several consecutive points are imputed. Therefore, the normalization constants differ and do not cancel out in the acceptance probability. Normalization is not necessary only in the case where just one point is imputed between two observations (i.e. m = 2 subintervals) because the left point, , is always a (fixed) observed point that is not updated. Thus, the normalization constants cancel out in the acceptance probability. For m > 2, we numerically integrate the product (3.4) over to obtain the normalization constant. The product in (3.4) may be very small (but not zero everywhere in a non-empty feasible set ) and may thus numerically integrate to zero, especially when the upper interval bound of the feasible set is infinite. To overcome this problem, we take two measures. First, we do not integrate over the entire set of feasible points but determine the maximum of the product numerically and then integrate over the interval that includes all points with a function value of at least 10−20 times this maximum. Second, we rescale the product in (3.4) by dividing by the maximum before integrating.
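The two measures just described (restricting the integration range by a 10^-20 threshold relative to the numerically determined maximum, and rescaling by that maximum before integrating) can be sketched as follows (Python; a simplified grid-based stand-in for our R implementation):

```python
import math

def normalize_on_support(f, lo, hi, n_grid=20001, thresh=1e-20):
    """Numerically normalize an unnormalized density f on [lo, hi]:
    locate the maximum on a grid, keep only the region where f is at
    least thresh * max (to avoid integrating numerical zeros), rescale
    by the maximum, and integrate with the trapezoidal rule."""
    xs = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    vals = [f(x) for x in xs]
    fmax = max(vals)
    keep = [i for i, v in enumerate(vals) if v >= thresh * fmax]
    i0, i1 = keep[0], keep[-1]
    h = xs[1] - xs[0]
    scaled = [v / fmax for v in vals[i0:i1 + 1]]   # rescale before integrating
    integral = h * (sum(scaled) - 0.5 * (scaled[0] + scaled[-1]))
    const = integral * fmax                        # normalization constant of f
    return (lambda x: f(x) / const), const

# check on an unnormalized Gaussian: the constant should be sqrt(2*pi)*s
s = 0.3
g = lambda x: math.exp(-x * x / (2.0 * s * s))
norm_g, c = normalize_on_support(g, -5.0, 5.0)
```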
To sample from the Milstein MB proposal density, we employ rejection sampling. For this, normalization of the product in (3.4) is not necessary. Again, we numerically determine the maximum dmax of the product, and the interval that includes all points with a function value of at least 10−20 times this maximum. Then, we uniformly sample (u1, u2) from rectangle and accept u1 as a proposal if the unnormalized density value of (3.4) at u1 is at least u2.
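The rejection step itself can be sketched as follows (Python; again a simplified stand-in for the R implementation), where f is the unnormalized proposal density, [lo, hi] the interval determined above and fmax the numerically determined maximum:

```python
import math
import random

def rejection_sample(f, lo, hi, fmax, rng, max_tries=100000):
    """Draw one sample from the density proportional to f on [lo, hi] by
    uniform rejection: sample (u1, u2) on [lo, hi] x [0, fmax] and accept
    u1 whenever u2 <= f(u1). No normalization of f is needed."""
    for _ in range(max_tries):
        u1 = rng.uniform(lo, hi)
        u2 = rng.uniform(0.0, fmax)
        if u2 <= f(u1):
            return u1
    raise RuntimeError("envelope rejected too often")

# unnormalized, truncated Gaussian bump centred at 1 as a test density
f = lambda x: math.exp(-(x - 1.0) ** 2 / 0.02)
rng = random.Random(11)
draws = [rejection_sample(f, 0.0, 2.0, fmax=1.0, rng=rng) for _ in range(4000)]
```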
For the combination of the MB proposal and the Milstein approximation, the set of feasible proposal points may be empty. In this case, our implementation shifts to the Euler approximation for this point, i.e. the point is proposed with the MB proposal based on the Euler scheme and also the corresponding factor of the proposal density in the acceptance probability is based on the Euler scheme. In addition, for all methods, a negative point may be proposed, which is not feasible for a GBM. Therefore, in this case, we propose a new point. For both cases, we count the number of times that they occur during the estimation procedure. In the following simulation study, no cases of switching to the Euler scheme occurred and negative proposals occurred only very rarely (less than 1‰ of the number of iterations in the very worst case).
We implemented the described estimation procedures in R v. 3.6.2 . The source code of our implementation and the following simulation study is publicly available at https://github.com/fuchslab/Inference_for_SDEs_with_the_Milstein_scheme.
5. Simulation study
In this section, we study the computational performance of the competing inference methods on the (relatively simple) benchmark model GBM. As a second benchmark model, which is more frequently studied on the application side, the CIR process is investigated in appendix D. In this work, we focus on Bayesian inference by data augmentation and compare the four approaches listed at the end of §3.2. Conceptually different inference procedures, as summarized e.g. in , are not considered as competitors here, as they would be employed in different data contexts. There are two aspects that are important to consider when evaluating the different methods:
the accuracy with which the true posterior distribution is approximated based on one of the approximation schemes and a given number m and
the accuracy with which we are able to draw from this approximated posterior distribution.
For the simulation study, we generated 100 paths of the GBM in the time interval [0, 1] using the solution (2.3) with the parameter combination θ = (α, σ2)T = (1, 2)T and initial value x0 = 100. Figure 5 illustrates some of these paths. From each path, we took M = 20 equidistant points (i.e. the inter-observation time Δt was 0.05) and applied each of the four described estimation methods once. We imputed data such that we got m = 2 and m = 5 inter-observation intervals. We also included the case m = 1, i.e. no data were imputed and only Step (1) from §3, the parameter update, was repeated in the estimation procedure where the likelihood of the path in the acceptance probability is approximated by the Euler or the Milstein scheme. For the prior distribution of the parameters, we assumed that they were independently distributed with and σ2 ∼ IG(κ0 = 2, ν0 = 2), where IG denotes the inverse gamma distribution with shape parameter κ0 and scale parameter ν0. The a priori expectations of the parameters are thus and .
Each of the estimation procedures performs the following steps:
Draw initial values for the parameters α and σ2 from the prior distributions.
Initialize Yimp by linear interpolation.
Repeat the following steps:
Figures 6 and 7 present the output from one estimation procedure using the example of the combination MBM-M of the MB proposal and the Milstein approximation for the proposal density and the likelihood function. From each estimation procedure, we obtained an MCMC chain of dimension n(m − 1) + 2. For each chain, we used the two components for parameters α and σ2 and calculated the mean, the median and the variance after cutting off a burn-in phase of 5000 iterations. To justify our use of independent proposals for the parameter update, we show in appendix E that the parameters are not strongly correlated.
As a benchmark, we also sampled from the true parameter posterior distribution based on the solution of the GBM. We used the Stan software [24,25] which provides an efficient C++ implementation of Hamiltonian Monte Carlo (HMC) sampling with the No-U-turn sampler to sample from the true parameter posterior distribution. For each posterior distribution corresponding to one of the 100 sample paths, we generated four HMC chains with 500 000 iterations each. The first half of the chains was discarded as warm-up and the remaining draws were combined to give a sample of size 106. We calculated the multivariate effective sample size (ESS) as defined in  which provides the size of an independent and identically distributed sample equivalent to our samples in terms of variance and found that the ESS of the obtained samples from the true posterior distribution is well over 500 000. For each of these samples, we also calculated the mean, the median and the variance.
The estimation procedures and time measurements were performed on a cluster of machines with the following specifications: AMD Opteron™ Processor 6376 (1.40 GHz), 512GB DDR3-RAM.
6. Results
Figures 8 and 9 and tables 1 and 2 summarize the results of running each of the methods once for 1 h for each of the 100 GBM trajectories. Figures 8 and 9 show the density plots of the difference between the respective statistic (mean, median or variance) calculated for a sample from the approximated posterior distribution obtained by the respective method and the statistic for a sample from the true posterior distribution of the same sample path. Each density plot aggregates 100 such difference values, one for each of the 100 GBM trajectories. Table 1 tabulates the root mean square error (RMSE) based on these differences for each of the considered methods, discretization levels m and statistics. We use the RMSE as the measure of the overall accuracy. The lower the RMSE is, the higher the accuracy of the respective method. Table 2 empirically evaluates the computational efficiency of the considered methods, including the number of iterations completed after 1 h, the multivariate ESS based on the obtained sample after discarding a burn-in phase of 5000 iterations, and the acceptance rates of the parameter and the path proposals. Each of these quantities is averaged over the 100 GBM trajectories and the coefficient of variation is also stated.
m      method   RMSEs for α                   RMSEs for σ2
                mean    median   variance     mean    median   variance
m = 1  Euler    0.282   0.244    0.456        0.638   0.600    0.471
m = 2  MBE-E    0.266   0.238    0.526        0.211   0.198    0.141
m = 5  MBE-E    0.277   0.254    0.524        0.113   0.098    0.127
m      method    iterations after 1 h (CV)   multivariate ESS (CV)   acc. rate parameters (CV)   acc. rate path (CV)
m = 1  Euler     25 134 301 (0.03)           1 273 744 (0.16)        0.518 (0.02)                —
       Milstein  454 863 (0.03)              146 362 (0.41)          0.425 (0.14)                —
m = 2  MBE-E     8 583 614 (0.03)            170 827 (0.19)          0.442 (0.01)                0.842 (0.04)
       MBE-M     1 816 144 (0.03)            24 090 (0.38)           0.417 (0.03)                0.799 (0.05)
       DBM-M     754 024 (0.10)              28 089 (0.31)           0.417 (0.03)                0.839 (0.04)
m = 5  MBE-E     6 765 054 (0.10)            49 885 (0.18)           0.310 (0.01)                0.892 (0.02)
For the drift parameter α of the GBM, the four considered schemes perform comparably for m = 2 and m = 5. In particular, the use of the Milstein approximation does not improve the accuracy of the posterior mean and median for the same discretization level m. The accuracy of the posterior variance is slightly improved by the use of the Milstein approximation when data are imputed. Moreover, for MBE-E, the accuracy does not consistently improve as m is increased. In contrast, the accuracy of the methods that include the Milstein scheme improves considerably when imputed data are introduced (i.e. m > 1) and improves slightly further when m is increased from 2 to 5.
For the diffusion parameter σ2 of the GBM, we clearly see an improvement in overall accuracy for the methods involving the Milstein scheme. Combination DBM-M turns out to be the most accurate, closely followed by MBE-M in case of the mean and median.
According to table 2, the number of iterations completed within 1 h varies substantially among the different estimation procedures. It is always higher for the procedures that use the Euler approximation, while especially combination MBM-M is very time-consuming and thus completes fewer iterations. Similarly, the multivariate ESS varies substantially among the different estimation procedures. It is higher for m = 2 than for m = 5 for each of the considered estimation procedures. The acceptance rate of the parameters is slightly lower when the Milstein scheme is used for the approximation of the likelihood function. In addition, the acceptance rate of the parameters decreases as the number of imputed points increases. The acceptance rate of the path is highest for combination MBM-M. For MBE-E, it would be just as high if μk+1 and σk+1 were not substituted by μk and σk. For MBE-E, MBE-M and DBM-M, the acceptance rate of the path increases as the number of imputed points increases.
7. Summary and discussion
We have demonstrated how to implement an algorithm for the parameter estimation of SDEs from low-frequency data using the Milstein scheme to approximate the transition density of the underlying process. Our motivation was to improve numerical accuracy and thus reduce the amount of imputed data and computational overhead. However, our findings are rather discouraging: we found that this method can be applied to multidimensional processes only with impractical restrictions. Moreover, we showed that the combination of the MB proposal with the Milstein scheme for the proposal density may lead to an empty set of possible proposal points, which would require switching to the Euler scheme in order to proceed. One of the strengths of the original (Euler-based) MCMC scheme is its generic character and applicability. Through this, it possesses a practical advantage over otherwise more sophisticated methods such as the Exact Algorithm . This strength does not translate to the Milstein-based MCMC scheme due to the limited applicability of the Milstein approximation especially in the multidimensional setting. Thus, methods like the Exact Algorithm may be a reasonable alternative. The limited applicability of the Milstein approximation would also persist for advanced forms of the discussed MCMC scheme like the innovation scheme in  or for even more generic algorithms like particle MCMC as studied in .
In our simulation study, we found that the overall accuracy of the estimates for the drift parameter of the GBM does not necessarily improve when the Milstein scheme is used. Fewer iterations are completed by the methods involving the Milstein scheme, and the ESS is substantially lower. Thus, the poor sampling efficiency may outweigh the (potential) increase in accuracy of the approximation of the posterior distribution. The combination MBM-M in particular results in a very low number of iterations and a low ESS. Owing to the already low ESS achieved by the Milstein-based methods for m = 5 subintervals between two observations, we did not consider higher discretization levels. Moreover, note that tuning the variance hyperparameters for the random walk proposals of the parameters in Step (3) to reach an optimal acceptance rate might lead to a higher ESS. However, since the acceptance rates achieved in the simulation study lie in a range where the sampling efficiency is rather robust to changes in the acceptance rate (in the high-dimensional limit), we do not expect the change in the ESS after tuning to be substantial.
For the estimates for the GBM diffusion parameter, the overall accuracy is increased by the use of the Milstein scheme. DBM-M turns out to be the most effective combination in terms of overall accuracy.
We conducted another simulation study on the example of the CIR process, as presented in appendix D, and the results are very similar to those for the GBM. The use of the Milstein approximation does not consistently improve the overall accuracy for the drift parameter; however, it does improve the accuracy for the diffusion parameter. Again, combination DBM-M achieves the highest accuracy, closely followed by MBE-M.
It was expected that the use of the Milstein scheme would make a difference for the estimates for the diffusion parameters because the additional term added by the Milstein scheme compared to the Euler scheme involves the diffusion function and its derivative. Nevertheless, the general applicability of the Euler scheme remains a great advantage and the search for different proposal schemes such as in [10,22] rather than for different numerical discretization schemes may be a more promising way towards more efficient estimation algorithms for diffusion processes.
The source code of our implementation and the simulation study is publicly available at https://github.com/fuchslab/Inference_for_SDEs_with_the_Milstein_scheme.
C.F. devised the project and provided supervision. S.P. implemented the described algorithms, carried out the simulation study and drafted the manuscript. Both authors contributed to the final version of the manuscript, gave final approval for publication and agree to be held accountable for the work performed therein.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Our research was supported by the German Research Foundation within the SFB 1243, Subproject A17, by the Federal Ministry of Education and Research under grant no. 01DH17024, and by the Helmholtz pilot project ‘Uncertainty Quantification’.
The authors wish to thank three anonymous reviewers for very valuable suggestions that helped to significantly improve this article.
Appendix A. Derivation of the transition density based on the Milstein scheme
The Milstein scheme approximates the process defined by the SDE dXt = μ(Xt, θ) dt + σ(Xt, θ) dBt on a time grid with step size Δt by
Xk+1 = Xk + μ(Xk, θ)Δt + σ(Xk, θ)ΔBk + ½ σ(Xk, θ) σ′(Xk, θ) ((ΔBk)² − Δt),
where ΔBk ∼ N(0, Δt) and σ′ denotes the derivative of the diffusion function with respect to the state variable.
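A single Milstein step for a scalar SDE can be sketched as follows (the function name and the GBM example are illustrative, not taken from the paper's implementation):

```python
import numpy as np

def milstein_step(y, mu, sigma, dsigma, dt, rng):
    """One step of the Milstein scheme for dY = mu(Y) dt + sigma(Y) dB.

    mu, sigma: drift and diffusion functions; dsigma: derivative sigma'(y).
    """
    dB = rng.normal(0.0, np.sqrt(dt))
    return (y + mu(y) * dt + sigma(y) * dB
            + 0.5 * sigma(y) * dsigma(y) * (dB ** 2 - dt))

# GBM example: drift alpha*y, diffusion sigma*y, derivative sigma
alpha, sig = 1.0, 0.5
y_next = milstein_step(1.0, lambda y: alpha * y, lambda y: sig * y,
                       lambda y: sig, 0.01, np.random.default_rng(42))
```

Setting the diffusion function to zero recovers the deterministic Euler step, which makes the correction term's role easy to check.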
Appendix B. Derivation of the acceptance probability for the MB proposal for m = 2 inter-observation intervals
As stated in §3.2, the acceptance probability for the path update between two consecutive observations with the MB proposal is
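For reference, the Euler-based MB proposal of Durham and Gallant draws each imputed point from a Gaussian whose mean interpolates linearly towards the next observation and whose variance shrinks as that observation is approached. A minimal sketch (argument names assumed):

```python
import numpy as np

def mb_proposal(y_k, x_next_obs, time_left, dt, diffusion, rng):
    """Euler-based modified bridge (MB) proposal for one imputed point.

    y_k:        current point of the augmented path
    x_next_obs: the next actual observation
    time_left:  time remaining until that observation
    dt:         time step to the proposed point
    diffusion:  diffusion function sigma(y) of the SDE
    """
    mean = y_k + dt / time_left * (x_next_obs - y_k)
    var = dt * (time_left - dt) / time_left * diffusion(y_k) ** 2
    return rng.normal(mean, np.sqrt(var))
```

Note that at the final step before the observation (dt equal to the remaining time) the variance vanishes and the proposal hits the observation exactly, which is the bridging property the acceptance probability above accounts for.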
Appendix C. Choice of path update interval
For choosing the update interval, we use a random block size algorithm from the literature. Assuming that the augmented path contains a total of n + 1 data points Y0, …, Yn, it is divided into update segments by the following algorithm:
Set c0 = 0 and j = 1.
While cj−1 < n:
Such a random choice of the path update interval is a simple way to vary the set of points that are updated together within one iteration.
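The partitioning step above can be sketched as follows; the distribution of the block sizes is an assumption here (1 + Poisson), as the paper's specific choice is not reproduced in this excerpt:

```python
import numpy as np

def random_update_blocks(n, mean_block_size, rng):
    """Split indices 0, ..., n into consecutive update blocks.

    Returns boundaries c_0 = 0 < c_1 < ... < c_J = n. Each block length
    is drawn as 1 + Poisson(mean_block_size - 1); the actual distribution
    used in the paper may differ.
    """
    c = [0]
    while c[-1] < n:
        size = 1 + rng.poisson(mean_block_size - 1)  # at least one point
        c.append(min(c[-1] + size, n))               # clip final block at n
    return c
```

Each MCMC iteration can draw a fresh partition, so that points near a fixed boundary in one iteration sit in the interior of a block in another.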
Appendix D. Additional example: Cox–Ingersoll–Ross process
The one-dimensional CIR process fulfils the SDE dXt = α(β − Xt) dt + σ√Xt dBt.
For the CIR process, we have drift function μ(Xt, θ) = α(β − Xt) and diffusion function σ(Xt, θ) = σ√Xt with parameter σ > 0, the process taking values in ℝ+ = (0, ∞). We therefore obtain a lower bound for the possible values of Xk+1 when applying the Milstein scheme: the Milstein update is quadratic in the Brownian increment ΔBk with positive leading coefficient σ²/4, so that Xk+1 ≥ α(β − Xk)Δt − σ²Δt/4.
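Because the Milstein update for the CIR process is quadratic in the Brownian increment with positive leading coefficient σ²/4, proposed values cannot fall below α(β − xk)Δt − σ²Δt/4 (the minimum is attained at ΔB = −2√xk/σ). A quick numerical check, assuming the standard parameterization dX = α(β − X) dt + σ√X dB:

```python
import numpy as np

def cir_milstein_step(x, alpha, beta, sigma, dt, dB):
    """One Milstein step for the CIR process; quadratic in dB."""
    return (x + alpha * (beta - x) * dt + sigma * np.sqrt(x) * dB
            + 0.25 * sigma ** 2 * (dB ** 2 - dt))

def milstein_lower_bound(x, alpha, beta, sigma, dt):
    """Minimum of the update over dB, attained at dB = -2*sqrt(x)/sigma."""
    return alpha * (beta - x) * dt - 0.25 * sigma ** 2 * dt
```

This bounded support of the Milstein-based transition density is what distinguishes it from the Gaussian (unbounded) Euler approximation.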
For the simulation study, we generated 100 paths of the CIR process in the time interval [0, 1] with the parameter combination θ = (α, β, σ2)T = (1, 1, 2)T and initial value x0 = 10. From each path, we took 20 equidistant points and ran each of the described estimation methods once for 1 h to perform inference for the parameters β and σ2, assuming α to be known. For the prior distribution of the parameters, we assumed that they were independently distributed with β ∼ IG(κb = 3, νb = 3) and σ2 ∼ IG(κs = 3, νs = 4). The a priori expectations of the parameters are thus E(β) = νb/(κb − 1) = 1.5 and E(σ2) = νs/(κs − 1) = 2. For each estimation procedure, the steps as outlined in §5 were taken. As proposal densities for the parameters in Step (3), we used and .
The sampling results are summarized in figures 10 and 11 and tables 3 and 4. Similar to the results for the GBM, the use of the Milstein approximation does not consistently improve the overall accuracy for the drift parameter β. The accuracy increases with increasing m for most of the methods. Only combination MBM-M has lower accuracy for m = 5 due to the low sampling efficiency and the resulting low ESS. For the diffusion parameter σ2, the use of the Milstein approximation and increasing m both improve the overall accuracy. Again, combination DBM-M achieves the highest accuracy, closely followed by MBE-M.
| | method | RMSEs for β | | | RMSEs for σ2 | | |
| --- | --- | --- | --- | --- | --- | --- | --- |
| m = 1 | Euler | 0.0179 | 0.0115 | 0.0478 | 0.1603 | 0.1530 | 0.0673 |
| m = 2 | MBE-E | 0.0099 | 0.0064 | 0.0265 | 0.0910 | 0.0865 | 0.0417 |
| m = 5 | MBE-E | 0.0052 | 0.0036 | 0.0144 | 0.0400 | 0.0380 | 0.0194 |
| | method | number of iterations after 1 h | | multivariate effective sample size | | acceptance rate of the parameters | | acceptance rate of the path | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| m = 1 | Euler | 23 461 023 | 0.11 | 2 422 521 | 0.14 | 0.443 | 0.03 | — | — |
| | Milstein | 4 685 450 | 0.03 | 480 549 | 0.08 | 0.442 | 0.03 | — | — |
| m = 2 | MBE-E | 8 482 241 | 0.06 | 422 034 | 0.10 | 0.384 | 0.03 | 0.964 | 0.01 |
| | MBE-M | 1 944 229 | 0.05 | 94 071 | 0.10 | 0.383 | 0.03 | 0.957 | 0.01 |
| | DBM-M | 1 905 354 | 0.04 | 95 262 | 0.10 | 0.383 | 0.03 | 0.968 | 0.01 |
| m = 5 | MBE-E | 6 851 197 | 0.05 | 114 344 | 0.10 | 0.272 | 0.03 | 0.976 | 0.01 |
| | MBE-M | 966 579 | 0.04 | 15 599 | 0.13 | 0.272 | 0.03 | 0.965 | 0.01 |
| | DBM-M | 906 791 | 0.08 | 14 881 | 0.14 | 0.272 | 0.03 | 0.975 | 0.01 |
Also for the CIR process, the number of iterations completed after 1 h and the multivariate ESS of the obtained sample vary substantially between the different procedures. Both quantities are highest for combination MBE-E, they are similar for MBE-M and DBM-M, and particularly low for MBM-M.
Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.
Elerian O. 1998 A note on the existence of a closed form conditional transition density for the Milstein scheme. Working paper. Nuffield College, University of Oxford, Oxford, UK.
Gilks W, Richardson S, Spiegelhalter D. 1996 Markov chain Monte Carlo in practice. London, UK: Chapman & Hall.
Chib S, Shephard N. 2002 [Numerical techniques for maximum likelihood estimation of continuous-time diffusion processes]: Comment. J. Bus. Econ. Stat. 20, 325–327.
Schmidt KD. 2009 Maß und Wahrscheinlichkeit [Measure and Probability]. Berlin, Germany: Springer.
Gillespie DT. 1992 Markov processes: an introduction for physical scientists. Boston, MA: Academic Press.