Sensitivity analysis of Wasserstein distributionally robust optimization problems
Abstract
We consider sensitivity of a generic stochastic optimization problem to model uncertainty. We take a non-parametric approach and capture model uncertainty using Wasserstein balls around the postulated model. We provide explicit formulae for the first-order correction to both the value function and the optimizer and further extend our results to optimization under linear constraints. We present applications to statistics, machine learning, mathematical finance and uncertainty quantification. In particular, we provide an explicit first-order approximation for square-root LASSO regression coefficients and deduce coefficient shrinkage compared to the ordinary least-squares regression. We consider robustness of call option pricing and deduce a new Black–Scholes sensitivity, a non-parametric version of the so-called Vega. We also compute sensitivities of optimized certainty equivalents in finance and propose measures to quantify robustness of neural networks to adversarial examples.
1. Introduction
We consider a generic stochastic optimization problem
$$ V(0) = \inf_{a \in \mathcal{A}} E_\mu[f(X,a)], \qquad (1.1) $$
where the random variable $X$ has law $\mu$ and $a$ ranges over a set of admissible actions $\mathcal{A}$.
A more systematic approach to model uncertainty in (1.1) is offered by the distributionally robust optimization problem
$$ V(\delta) = \inf_{a \in \mathcal{A}} \sup_{\nu \in B_\delta(\mu)} E_\nu[f(X,a)], \qquad (1.2) $$
where $B_\delta(\mu)$ denotes the ball of radius $\delta$ around $\mu$ in the $p$-Wasserstein distance.
This paper is organized as follows. We first present the main results and then, in §3, explore their applications. Further discussion of our results and the related literature is found in §4, which is then followed by the proofs. The online appendix [9] contains many supplementary results and remarks, as well as some more technical arguments from the proofs.
2. Main results
Take $d \in \mathbb{N}$, endow $\mathbb{R}^d$ with the Euclidean norm $|\cdot|$ and write $\Gamma^\circ$ for the interior of a set $\Gamma \subseteq \mathbb{R}^d$. Assume that $\mathcal{S}$ is a closed convex subset of $\mathbb{R}^d$. Let $\mathcal{P}(\mathcal{S})$ denote the set of all (Borel) probability measures on $\mathcal{S}$. Further fix a seminorm $\|\cdot\|$ on $\mathbb{R}^d$ and denote by $\|\cdot\|_*$ its (extended) dual norm, i.e. $\|y\|_* = \sup\{\langle x, y\rangle : \|x\| \le 1\}$. In particular, for $\|\cdot\| = |\cdot|$ we also have $\|\cdot\|_* = |\cdot|$. For $p \ge 1$, we define the $p$-Wasserstein distance as
$$ W_p(\nu_1, \nu_2) = \inf\Big\{ \Big(\int \|x - y\|^p\, \pi(\mathrm{d}x, \mathrm{d}y)\Big)^{1/p} : \pi \text{ is a coupling of } \nu_1 \text{ and } \nu_2 \Big\}. $$
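To make the definition concrete, here is a minimal numerical sketch of $W_p$ in the simplest setting: two equal-sized empirical samples on $\mathbb{R}$, where the optimal coupling matches sorted samples. The sample sizes, the Gaussian distributions and the choice $p=2$ are purely illustrative.

```python
import numpy as np

def wasserstein_p(x, y, p=2):
    """p-Wasserstein distance between two equal-sized empirical samples on R.

    In one dimension the optimal coupling pairs sorted samples, so the
    distance reduces to an L^p average of sorted differences.
    """
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
mu_sample = rng.normal(0.0, 1.0, size=10_000)   # reference model mu
nu_sample = rng.normal(0.1, 1.2, size=10_000)   # perturbed model nu
print(wasserstein_p(mu_sample, nu_sample, p=2))
```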
Naturally, other choices for the distance on the space of measures are also possible: such as the Kullback–Leibler divergence, see [22] for general sensitivity results and [23] for applications in portfolio optimization, or the Hellinger distance, see [24] for a statistical robustness analysis. We refer to §4 for a more detailed analysis of the state of the art in these fields. Both of these approaches have good analytic properties and often lead to theoretically appealing closed-form solutions. However, they are also very restrictive since any measure in the neighbourhood of $\mu$ has to be absolutely continuous with respect to $\mu$. In particular, if $\mu$ is the empirical measure of the observations then measures in its neighbourhood have to be supported on those fixed data points. To obtain meaningful results, it is thus necessary to impose additional structural assumptions, which are often hard to justify solely on the basis of the data at hand and, equally importantly, create another layer of model uncertainty themselves. We refer to [17, sec. 1.1] for further discussion of potential issues with $\phi$-divergences. The Wasserstein distance, while harder to handle analytically, is more versatile and does not require any such additional assumptions.
Throughout the paper, we take the convention that continuity and closure are understood w.r.t. $|\cdot|$. We assume that $\mathcal{A}$ is convex and closed and that the seminorm $\|\cdot\|$ is strictly convex, in the sense that for two elements $x, y$ with $\|x\| = \|y\| = 1$ and $\|x - y\| \neq 0$ we have $\|(x+y)/2\| < 1$ (note that this is satisfied for every $l^r$-norm for $r \in (1, \infty)$). We fix $p \in (1, \infty)$, let $q = p/(p-1)$ so that $1/p + 1/q = 1$, and fix $\mu \in \mathcal{P}(\mathcal{S})$ such that the boundary of $\mathcal{S}$ has $\mu$-zero measure and $\int |x|^p\, \mu(\mathrm{d}x) < \infty$. Denote by $\mathcal{A}^{\mathrm{opt}}(\delta)$ the set of optimizers for $V(\delta)$ in (1.2).
Assumption 2.1.
The loss function $f : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ satisfies
— is differentiable on for every . Moreover, is continuous and for every there is such that for all and with .
— For all sufficiently small, we have and for every sequence such that and such that for all there is a subsequence which converges to some .
The above assumption is not restrictive: the first part merely ensures that the expression in theorem 2.2 is well defined, while the second part is satisfied as soon as either the set of actions is compact or the objective is coercive, which is the case in most examples of interest; see [9, lemma 7.15] for further comments.
Theorem 2.2.
If assumption 2.1 holds, then the first-order sensitivity
$$ \Upsilon := V'(0) = \lim_{\delta \downarrow 0} \frac{V(\delta) - V(0)}{\delta} $$
exists and is given by
$$ \Upsilon = \inf_{a \in \mathcal{A}^{\mathrm{opt}}(0)} \Big( E_\mu\big[\, \|\nabla_x f(X, a)\|_*^{\,q} \,\big] \Big)^{1/q}. $$
Remark.
Inspecting the proof, defining
The above result naturally extends to computing sensitivities of robust problems themselves, see [9, corollary 7.5], as well as to the case of stochastic optimization under linear constraints, see [9, theorem 7.7].
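As a sanity check of the expansion $V(\delta) \approx V(0) + \delta\,\Upsilon$, the following Monte Carlo sketch evaluates the formula of theorem 2.2 for the toy quadratic loss $f(x,a) = (x-a)^2$ with $\mathcal{A} = \mathbb{R}$, $\mu$ standard Gaussian and $p = q = 2$. For this particular loss the worst-case measure is a dilation away from $a$, so the robust value $(\sqrt{V(0)}+\delta)^2$ is available in closed form and serves as a benchmark; all numerical choices below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=200_000)   # samples from the baseline model mu

# Toy loss f(x, a) = (x - a)^2 with p = q = 2; the non-robust optimizer is
# a* = E[X] and V(0) = Var(X).
a_star = X.mean()
V0 = np.mean((X - a_star) ** 2)

# First-order sensitivity of theorem 2.2: Upsilon = (E[ |grad_x f(X, a*)|^q ])^(1/q).
grad = 2.0 * (X - a_star)
Upsilon = np.mean(np.abs(grad) ** 2) ** 0.5

for delta in (0.05, 0.1, 0.2):
    first_order = V0 + delta * Upsilon
    exact = (np.sqrt(V0) + delta) ** 2   # closed form for this quadratic loss
    print(f"delta={delta:4.2f}  first-order={first_order:.4f}  robust value={exact:.4f}")
```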
Assumption 2.3.
Suppose the loss function $f$ is twice continuously differentiable, and
— for some , , all and all close to .
— The function is twice continuously differentiable in a neighbourhood of and the matrix is invertible.
Theorem 2.4.
Suppose and such that as and assumptions 2.1 and 2.3 are satisfied. If -a.e. or if -a.e., then
Above and throughout, the convention is that , , and . The assumed existence and convergence of optimizers holds, e.g., under suitable convexity of $f$ in $a$; see [9, lemma 7.14] for a worked-out setting. In line with financial economics practice, we gave our sensitivities letter symbols, loosely motivated by the Greek word for 'model' and the Hebrew word for 'control'.
3. Applications
We now illustrate the universality of theorems 2.2 and 2.4 by considering their applications in a number of different fields. Unless otherwise stated, , and means .
(a) Financial economics
We start with the simple example of risk-neutral pricing of a call option written on an underlying asset $S$. Here, $T$ and $K$ are the maturity and the strike, respectively, and $\mu$ is the distribution of $S_T$. We set interest rates and dividends to zero for simplicity. In [25], the model is lognormal, i.e. under $\mu$ the log-price is Gaussian with given mean and variance. In this case, the price is given by the celebrated Black–Scholes formula. Note that this example is particularly simple since the payoff does not depend on an action $a$. However, to ensure risk-neutral pricing, we have to impose a linear constraint on the measures in the Wasserstein ball, giving a constrained version of (1.2).
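The following sketch prices the call in the Black–Scholes model and evaluates the gradient-norm sensitivity of theorem 2.2 for the payoff $f(x) = (x-K)^+$, whose derivative is the indicator $1_{\{x>K\}}$, so that the unconstrained sensitivity reduces to $P(S_T > K)^{1/q}$. It deliberately ignores the linear risk-neutral constraint discussed above, and all parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def bs_call(S0, K, sigma, T):
    """Black-Scholes call price with zero interest rate and dividends."""
    d1 = (np.log(S0 / K) + 0.5 * sigma**2 * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S0 * norm.cdf(d1) - K * norm.cdf(d2)

S0, K, sigma, T = 100.0, 100.0, 0.2, 1.0
price = bs_call(S0, K, sigma, T)

# Payoff f(x) = (x - K)^+ has |f'(x)| = 1_{x > K}, so the unconstrained
# first-order sensitivity is  Upsilon = P(S_T > K)^(1/q).
d2 = (np.log(S0 / K) - 0.5 * sigma**2 * T) / (sigma * np.sqrt(T))
prob_itm = norm.cdf(d2)          # risk-neutral P(S_T > K) in the lognormal model
q = 2.0
Upsilon = prob_itm ** (1.0 / q)
print(f"BS price {price:.4f}, unconstrained sensitivity {Upsilon:.4f}")
```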
We turn now to the classical notion of the optimized certainty equivalent (OCE) of [27]. It is a decision-theoretic criterion designed to split a liability between today's and tomorrow's payments. It is also a convex risk measure in the sense of [28] and covers many of the popular risk measures such as expected shortfall or entropic risk, see [29]. We fix a convex monotone function $l$ which is bounded from below. Here, $X$ represents the payoff of a financial position and $l$ is the negative of a utility function, or a loss function. We refer to [9, lemma 7.14] for generic sufficient conditions for assumptions 2.1 and 2.3 to hold in this setup. The OCE corresponds to (1.1) with $\mathcal{A} = \mathbb{R}$ and $f$ built from $l$; theorems 2.2 and 2.4 then yield the corresponding first-order sensitivities.
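As a concrete instance, the sketch below computes the OCE for the loss $l(x) = x^+/(1-\alpha)$, which recovers the expected shortfall through the classical variational (Rockafellar–Uryasev type) formula; this is one standard reading of the OCE as problem (1.1) with a scalar action, and the heavy-tailed sample, the level $\alpha$ and the sign conventions are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
L = rng.standard_t(df=5, size=100_000)   # heavy-tailed loss sample, stands in for mu

alpha = 0.95

def ell(x):
    """Loss function l(x) = x^+ / (1 - alpha), which yields expected shortfall."""
    return np.maximum(x, 0.0) / (1.0 - alpha)

def oce_objective(a):
    # f(x, a) = a + l(x - a), averaged over the sample (empirical measure).
    return a + np.mean(ell(L - a))

res = minimize_scalar(oce_objective, bounds=(L.min(), L.max()), method="bounded")
print(f"OCE / expected shortfall at {alpha:.0%}: {res.fun:.4f} (a* = {res.x:.4f})")
```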
A related problem considers hedging strategies which minimize the expected loss of the hedged position, i.e. , where and represent today's and tomorrow's traded prices. We compute the corresponding first-order sensitivity as
Finally, we consider briefly the classical mean-variance optimization of [30]. Here, the measure $\mu$ represents the distribution of losses across the assets and the actions $a$ are the relative investment weights. The original problem is to minimize the sum of the expectation and a multiple of the standard deviation of the portfolio return. Using the ideas in [31, Example 2] and considering measures on an appropriate space, we can recast the problem as (1.1). While [31] focused on the asymptotic regime $\delta \to \infty$, their non-asymptotic statements are related to our theorem 2.2 and either result could be used here to obtain a first-order correction to the optimal value for small $\delta$.
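For completeness, here is a small numerical sketch of the non-robust mean-variance problem in the form (1.1): minimize the expected loss plus a multiple of its standard deviation over portfolio weights. The simulated loss distribution, the risk-aversion parameter gamma and the long-only constraint are illustrative assumptions rather than choices made in the text.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
# Simulated losses across 4 assets (rows = scenarios), standing in for mu.
X = rng.multivariate_normal(mean=[-0.05, -0.04, -0.03, -0.02],
                            cov=0.02 * np.eye(4) + 0.01, size=50_000)
gamma = 1.0  # weight on the standard-deviation term

def objective(w):
    losses = X @ w
    return losses.mean() + gamma * losses.std()

cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
res = minimize(objective, x0=np.full(4, 0.25), constraints=cons,
               bounds=[(0.0, 1.0)] * 4)
print("optimal weights:", np.round(res.x, 3))
```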
(b) Neural networks
We specialize now to quantifying robustness of neural networks (NN) to adversarial examples. This has been an important topic in machine learning since [32] observed that NN consistently misclassify inputs formed by applying small worst-case perturbations to a dataset. This produced a number of works offering either explanations for these effects or algorithms to create such adversarial examples, e.g. [33–39] to name just a few. The main focus of research works in this area, see [40], has been on faster algorithms for finding adversarial examples, typically leading to an overfit to these examples without any significant generalization properties. The viewpoint has been mainly pointwise, e.g. [32], with some generalizations to probabilistic robustness, e.g. [39].
In contrast, we propose a simple metric for measuring robustness of NN which is independent of the architecture employed and of the algorithms used to identify adversarial examples. In fact, theorem 2.2 offers a simple and intuitive way to formalize robustness of NN: for simplicity, consider a neural network trained on a given distribution $\mu$ of pairs (features, labels), i.e. trained by minimizing the expected loss over the network parameters, as in (1.1).
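A minimal sketch of the resulting robustness proxy follows: average the $q$-th power of the input-gradient norm of the training loss over the data and take the $1/q$-th root, in line with the gradient-norm formula of theorem 2.2. The two-layer network, the synthetic data, the cross-entropy loss and $q=2$ are placeholders, and PyTorch is just one possible implementation choice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# A small classifier standing in for a trained network.
net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss(reduction="none")

# Synthetic stand-in for the data distribution of pairs (features, labels).
x = torch.randn(512, 20, requires_grad=True)
y = torch.randint(0, 2, (512,))

losses = loss_fn(net(x), y)
grads, = torch.autograd.grad(losses.sum(), x)   # d loss / d x, sample by sample

q = 2.0
# Robustness proxy: ( E[ ||grad_x loss||^q ] )^(1/q) over the data.
upsilon = grads.norm(dim=1).pow(q).mean().pow(1.0 / q)
print(float(upsilon))
```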
(c) Uncertainty quantification
In the context of UQ, the measure $\mu$ represents input parameters of a (possibly complicated) operation in a physical, engineering or economic system. We consider the so-called reliability or certification problem: for a given set of undesirable outcomes, one wants to control the worst-case probability of this set over a family of probability measures. The distributionally robust adversarial classification problem considered recently by [42] is also of this form, with Wasserstein balls around an empirical measure of samples. Using the dual formulation of [18], they linked the problem to minimization of the conditional value-at-risk and proposed a reformulation, and numerical methods, in the case of linear classification. We propose instead a regularized version of the problem and look for its worst-case value over a Wasserstein ball around $\mu$.
Assume that the set of undesirable outcomes is convex. The function appearing in the regularized problem is then differentiable everywhere except at the boundary of this set, where its gradient takes an explicit form on either side. Furthermore, assume that $\mu$ is absolutely continuous w.r.t. Lebesgue measure. Theorem 2.2, using [9, remark 7.3], then gives a first-order expansion for the above problem.
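For comparison, the unregularized certification problem itself can be evaluated numerically through the duality of [18]: for a Wasserstein-1 ball, the worst-case probability of a set $A$ equals $\inf_{\lambda \ge 0}\{\lambda\delta + E_\mu[(1-\lambda\, d(X,A))^+]\}$. The sketch below applies this to a half-line under a Gaussian baseline; the threshold, the radius and the baseline model are illustrative, and this is not the regularized formulation proposed above.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
X = rng.normal(0.0, 1.0, size=200_000)   # baseline model mu for the system input
t = 2.0                                   # undesirable outcomes A = [t, infinity)
delta = 0.1                               # radius of the Wasserstein-1 ball

dist_to_A = np.maximum(t - X, 0.0)        # distance d(x, A) to the half-line A

def dual_objective(lam):
    # Dual bound: lam * delta + E[ (1 - lam * d(X, A))^+ ]
    return lam * delta + np.mean(np.maximum(1.0 - lam * dist_to_A, 0.0))

res = minimize_scalar(dual_objective, bounds=(0.0, 1e3), method="bounded")
print(f"mu(A) = {np.mean(X >= t):.4f},  worst case over the ball = {res.fun:.4f}")
```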
(d) Statistics
We discuss two applications of our results in the realm of statistics. We start by highlighting the link between our results and the so-called influence curves (IC) in robust statistics. For a statistical functional $T$ defined on (a subset of) probability measures, its IC at $\mu$ is defined as
$$ \mathrm{IC}(x) = \lim_{\varepsilon \downarrow 0} \frac{T\big((1-\varepsilon)\mu + \varepsilon\delta_x\big) - T(\mu)}{\varepsilon}, $$
where $\delta_x$ denotes the Dirac measure at $x$.
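The definition can be checked numerically by a finite-$\varepsilon$ perturbation of an empirical measure. The sketch below does this for the mean functional, whose influence curve is known to be $x - E_\mu[X]$; the functional, the sample and the value of eps are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
sample = rng.normal(1.0, 2.0, size=100_000)   # stands in for mu

def T_mean(weights, points):
    """Statistical functional T(nu) = mean of nu, for a weighted discrete nu."""
    return np.average(points, weights=weights)

def influence_curve(T, sample, x, eps=1e-4):
    """Finite-eps approximation of (T((1-eps) mu + eps delta_x) - T(mu)) / eps."""
    points = np.append(sample, x)
    w_base = np.append(np.full(sample.size, 1.0 / sample.size), 0.0)
    w_mix = (1.0 - eps) * w_base
    w_mix[-1] += eps
    return (T(w_mix, points) - T(w_base, points)) / eps

for x in (-3.0, 0.0, 5.0):
    # Compare with the known influence curve of the mean, IC(x) = x - E[X].
    print(x, influence_curve(T_mean, sample, x), x - sample.mean())
```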
Our second application in statistics exploits the representation of the LASSO/Ridge regressions as robust versions of the standard linear regression. We consider response variables $Y$ and covariates $X$, together with linear predictors parametrized by the coefficient vector $a$. If, instead of the Euclidean metric, we take a suitable norm-based cost in the definition of the Wasserstein distance, then [19] showed that the resulting distributionally robust regression problem coincides with the square-root LASSO (resp. Ridge) regression.
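The shrinkage of the square-root LASSO coefficients relative to ordinary least squares, mentioned in the abstract, can be observed directly by minimizing the penalized criterion numerically. In the sketch below the $\ell^1$ penalty, the radius delta and the use of a generic optimizer are illustrative choices, not the procedure of [19] or of this paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
n, d = 500, 5
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -0.5, 0.0, 0.25, 0.0])
y = X @ beta_true + 0.5 * rng.normal(size=n)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

delta = 0.05  # illustrative robustness radius / penalty level

def sqrt_lasso(beta):
    # Square-root LASSO criterion: RMSE plus an l1 penalty scaled by delta.
    return np.sqrt(np.mean((y - X @ beta) ** 2)) + delta * np.sum(np.abs(beta))

beta_sql = minimize(sqrt_lasso, x0=beta_ols, method="Powell").x

print("OLS:        ", np.round(beta_ols, 3))
print("sqrt-LASSO: ", np.round(beta_sql, 3))   # coefficients shrink towards zero
```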
The case of is naturally of particular importance in statistics and data science and we continue to consider it in the next subsection. In particular, we characterize the asymptotic distribution of , where and is the optimizer of the non-robust problem for the data-generating measure. This recovers the central limit theorem of [47], a link we explain further in §4b.
(e) Out-of-sample error
A benchmark of paramount importance in optimization is the so-called out-of-sample error, also known as the prediction error in statistical learning. Consider the setup above where the reference measure is the empirical measure $\hat\mu_N$ of $N$ i.i.d. observations sampled from the ‘true’ distribution $\mu$. Our aim is to compute the optimal action which solves the original problem (1.1) for $\mu$. However, we only have access to the training set, encoded via $\hat\mu_N$. Suppose we solve the distributionally robust optimization problem (1.2) around $\hat\mu_N$ and denote the robust optimizer $\hat a_N^{\delta}$. Then the out-of-sample error is the expected loss, under the true distribution $\mu$, of the robustly estimated action $\hat a_N^{\delta}$.
While this expression seems hard to compute explicitly for finite samples, theorem 2.4 offers a way to find the asymptotic distribution of a suitably rescaled version of the out-of-sample error. We suppose the assumptions of theorem 2.4 are satisfied and note that the first-order condition for the non-robust problem makes the expected gradient in $a$ vanish at its optimizer. A second-order Taylor expansion then yields the desired asymptotic expansion.
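To illustrate the kind of asymptotic statement involved, the simulation below looks at the simplest case, the non-robust optimizer of the quadratic loss $f(x,a)=(x-a)^2$, for which the rescaled out-of-sample error $N(\bar X_N - m)^2$ has an explicit $\sigma^2\chi^2_1$ limit. This toy computation is only meant to convey the flavour of a rescaled limit; it is not the expansion derived from theorem 2.4.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
m, sigma, N, trials = 0.0, 1.0, 200, 20_000

# Out-of-sample error of the empirical optimizer for f(x, a) = (x - a)^2:
# E_mu[(X - mean_N)^2] - Var(X) = (mean_N - m)^2, rescaled by the sample size N.
means = rng.normal(m, sigma, size=(trials, N)).mean(axis=1)
rescaled = N * (means - m) ** 2

print("simulated 95% quantile:       ", np.quantile(rescaled, 0.95))
print("sigma^2 * chi2_1 95% quantile:", sigma**2 * chi2.ppf(0.95, df=1))
```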
4. Further discussion and literature review
We start with an overview of related literature and then focus specifically on a comparison of our results with the CLT of [47] mentioned above.
(a) Discussion of related literature
Let us first remark that, while theorem 2.2 bears some superficial similarity to a classical maximum theorem, which is usually concerned with continuity properties of the value function, in this work we are instead interested in the exact first derivative of the function $\delta \mapsto V(\delta)$. Indeed, the convergence $V(\delta) \to V(0)$ as $\delta \downarrow 0$ follows, for loss functions satisfying a suitable growth condition, directly from the definition of convergence in the Wasserstein metric (e.g. [49, Def. 6.8]). In conclusion, the main issue is to quantify the rate of this convergence by calculating the first derivative $V'(0)$.
Our work investigates model uncertainty broadly conceived: it includes errors related to the choice of models from a particular (parametric or not) class of models as well as the mis-specification of such a class altogether (or indeed, its absence). In the decision theoretic literature, these aspects are sometimes referred to as model ambiguity and model mis-specification, respectively, see [50]. However, seeing our main problem (1.2) in decision theoretic terms is not necessarily helpful as we think of as given and not coming from some latent expected utility type of problem. In particular, our actions are just constants.
In our work, we decided to capture the uncertainty in the specification of $\mu$ using neighbourhoods in the Wasserstein distance. As already mentioned, other choices are possible and have been used in the past. Possibly the most often used alternative is the relative entropy, or the Kullback–Leibler divergence. In particular, it has been used in this context in economics, see [51]. To the best of our knowledge, the only comparable study of sensitivities with respect to relative-entropy balls is [22], see also [45] allowing for additional marginal constraints. However, this only considered the specific case where the reward function is independent of the action. Its main result is
To understand the relative technical difficulties and merits, it is insightful to go into the details of the statements. In fact, in the case of relative entropy and the one-period set-up we are considering, the exact form of the optimizing density can be determined explicitly (see [22, Proposition 3.1]) up to a one-dimensional Lagrange parameter. This is well known and is the reason behind the usual elegant formulae obtained in this context. But this then reduces the problem in [22] to a one-dimensional problem, which can be well approximated via a Taylor expansion. By contrast, when we consider balls in the Wasserstein distance, the form of the optimizing measure is not known (apart from some degenerate cases). In fact, a key insight of our results is that the optimizing measure can be approximated by a deterministic shift in the direction of the gradient of the loss (this is, in general, not exact but only true as a first-order approximation). The reason for these contrasting starting points of the analyses is the fact that Wasserstein balls contain a more heterogeneous set of measures, while in the case of relative entropy, exponentiating will always do the trick. We remark, however, that this is no longer true for the finite-horizon problems considered in [22, Section 3.2], where the worst-case measure is found using an elaborate fixed-point equation.
A further point emphasizing that the topology induced by the Wasserstein metric is less tractable is the fact that
The other well-studied distance is the Hellinger distance. [24] calculates influence curves for the minimum Hellinger distance estimator on a countable sample space. Their main result is that for the choice (where is a collection of parametric densities)
(b) Link to the central limit theorem of [47]
As observed in §3e above, theorem 2.4 allows us to recover the main results of [47]. We explain this now in detail. Set , , . Let denote the empirical measure of i.i.d. samples from . We impose the assumptions on and from [47], including Lipschitz continuity of gradients of and strict convexity. These, in particular, imply that the optimizers and , as defined in §3e, are well defined and unique, and further as . [47, Thm. 1] implies that, as ,
5. Proofs
We consider the case and here. For the general case and additional details, we refer to [9]. When clear from the context, we do not indicate the space over which we integrate.
Proof of theorem 2.2.
For every let denote those which satisfy
We start by showing the ‘≤’ inequality in the statement. For any one has with equality for . Therefore, differentiating and using both Fubini’s theorem and Hölder’s inequality, we obtain that
We turn now to the opposite ‘≥’ inequality. As for every there is no loss of generality in assuming that the right-hand side is not equal to zero. Now take any, for notational simplicity not relabelled, subsequence of which attains the liminf in and pick . By assumption, for a (again not relabelled) subsequence, one has . Further note that which implies
Proof of theorem 2.4.
We first show that
The proof of the ‘’ inequality in (5.2) follows by the very same arguments. Indeed, [9, lemma 8.5] implies that
By assumption, the matrix is invertible. Therefore, in a small neighbourhood of , the mapping is invertible. In particular, and by the first-order condition . Applying the chain rule and using (5.2) gives
Data accessibility
The codes used to generate figures in the paper are available on GitHub: http://github.com/JanObloj/Robust-uncertainty-sensitivity-analysis.
Authors' contributions
D.B., S.D., J.O. and J.W. formulated the mathematical problem, carried out the analysis, established the main results and drew conclusions. J.O. and J.W. wrote the first draft of the paper. D.B. and J.W. wrote the first draft of the appendix. S.D. and J.W. performed the numerical analysis. All the authors proofread and corrected the manuscript, gave final approval for publication and agree to be held accountable for the work performed therein.
Competing interests
The authors declare no competing interests.
Funding
This work was supported by the European Research Council [7th FP/ERC grant agreement no. 335421], the Vienna Science and Technology Fund (WWTF) [project MA16-021], the Austrian Science Fund (FWF) [project P28661] and the National Science Foundation of China (grant nos 11971310 and 11671257).
Acknowledgements
We thank Jose Blanchet, Mike Giles, Daniel Kuhn and Peyman Mohajerin Esfahani for their helpful comments on an earlier draft of this paper.
References
- 1. Armacost RL, Fiacco AV. 1974 Computational experience in sensitivity analysis for nonlinear programming. Math. Program. 6, 301-326. (doi:10.1007/BF01580247)
- 2. Vogel S. 2007 Stability results for stochastic programming problems. Optimization 19, 269-288. (doi:10.1080/02331938808843343)
- 3. Bonnans JF, Shapiro A. 2013 Perturbation analysis of optimization problems. New York, NY: Springer.
- 4. Ghanem R, Higdon D, Owhadi H (eds). 2017 Handbook of uncertainty quantification. Cham, Switzerland: Springer.
- 5. Dupacova J. 1990 Stability and sensitivity analysis for stochastic programming. Ann. Oper. Res. 27, 115-142. (doi:10.1007/BF02055193)
- 6. Romisch W. 2003 Stability of stochastic programming problems. In Stochastic programming, pp. 483-554. Amsterdam, The Netherlands: Elsevier. (doi:10.1016/S0927-0507(03)10008-4)
- 7. Asi H, Duchi JC. 2019 The importance of better models in stochastic optimization. Proc. Natl Acad. Sci. USA 116, 22 924-22 930. (doi:10.1073/pnas.1908018116)
- 8. Rahimian H, Mehrotra S. 2019 Distributionally robust optimization: a review. (http://arxiv.org/abs/1908.05659)
- 9. Bartl D, Drapeau S, Obłój J, Wiesel J. 2021 Supplementary material from “Sensitivity analysis of Wasserstein distributionally robust optimization problems”. The Royal Society. Collection. (https://doi.org/10.6084/m9.figshare.c.5730987)
- 10. Chiappori PA, McCann RJ, Nesheim L. 2010 Hedonic price equilibria, stable matching, and optimal transport: equivalence, topology, and uniqueness. Econ. Theory 42, 317-354. (doi:10.1007/s00199-009-0455-z)
- 11. Carlier G, Ekeland I. 2010 Matching for teams. Econ. Theory 42, 397-418. (doi:10.1007/s00199-008-0415-z)
- 12. Peyré G, Cuturi M. 2019 Computational optimal transport. Found. Trends Mach. Learn. 11, 355-607. (doi:10.1561/2200000073)
- 13. Pflug G, Wozabal D. 2007 Ambiguity in portfolio selection. Quant. Finance 7, 435-442. (doi:10.1080/14697680701455410)
- 14. Fournier N, Guillin A. 2014 On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Relat. Fields 162, 707-738. (doi:10.1007/s00440-014-0583-7)
- 15. Mohajerin Esfahani P, Kuhn D. 2018 Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Math. Program. 171, 115-166. (doi:10.1007/s10107-017-1172-1)
- 16. Obłój J, Wiesel J. 2021 Robust estimation of superhedging prices. Ann. Stat. 49, 508-530. (doi:10.1214/20-AOS1966)
- 17. Gao R, Kleywegt AJ. 2016 Distributionally robust stochastic optimization with Wasserstein distance. (http://arxiv.org/abs/1604.02199)
- 18. Blanchet J, Murthy K. 2019 Quantifying distributional model risk via optimal transport. Math. Oper. Res. 44, 565-600. (doi:10.1287/moor.2018.0936)
- 19. Blanchet J, Kang Y, Murthy K. 2019 Robust Wasserstein profile inference and applications to machine learning. J. Appl. Probab. 56, 830-857. (doi:10.1017/jpr.2019.49)
- 20. Kuhn D, Esfahani PM, Nguyen VA, Shafieezadeh-Abadeh S. 2019 Wasserstein distributionally robust optimization: theory and applications in machine learning. In Operations research & management science in the age of analytics, pp. 130-166. INFORMS. (doi:10.1287/educ.2019.0198)
- 21. Shafieezadeh-Abadeh S, Kuhn D, Esfahani PM. 2019 Regularization via mass transportation. J. Mach. Learn. Res. 20, 1-68.
- 22. Lam H. 2016 Robust sensitivity analysis for stochastic systems. Math. Oper. Res. 41, 1248-1275. (doi:10.1287/moor.2015.0776)
- 23. Calafiore GC. 2007 Ambiguous risk measures and optimal robust portfolios. SIAM J. Optim. 18, 853-877. (doi:10.1137/060654803)
- 24. Lindsay BG. 1994 Efficiency versus robustness: the case for minimum Hellinger distance and related methods. Ann. Stat. 22, 1081-1114. (doi:10.1214/aos/1176325512)
- 25. Black F, Scholes M. 1973 The pricing of options and corporate liabilities. J. Political Econ. 81, 637-654. (doi:10.1086/260062)
- 26. Bartl D, Drapeau S, Tangpi L. 2020 Computational aspects of robust optimized certainty equivalents and option pricing. Math. Finance 30, 287-309. (doi:10.1111/mafi.12203)
- 27. Ben Tal A, Teboulle M. 1986 Expected utility, penalty functions, and duality in stochastic nonlinear programming. Manage. Sci. 32, 1445-1466. (doi:10.1287/mnsc.32.11.1445)
- 28. Artzner P, Delbaen F, Eber J, Heath D. 1999 Coherent measures of risk. Math. Finance 9, 203-228. (doi:10.1111/1467-9965.00068)
- 29. Ben Tal A, Teboulle M. 2007 An old-new concept of convex risk measures: the optimized certainty equivalent. Math. Finance 17, 449-476. (doi:10.1111/j.1467-9965.2007.00311.x)
- 30. Markowitz H. 1952 Portfolio selection. J. Finance 7, 77-91. (doi:10.2307/2975974)
- 31. Pflug GC, Pichler A, Wozabal D. 2012 The 1/N investment strategy is optimal under high model ambiguity. J. Bank. Finance 36, 410-417. (doi:10.1016/j.jbankfin.2011.07.018)
- 32. Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R. 2013 Intriguing properties of neural networks. (http://arxiv.org/abs/1312.6199)
- 33. Goodfellow IJ, Shlens J, Szegedy C. 2014 Explaining and harnessing adversarial examples. (http://arxiv.org/abs/1412.6572)
- 34. Li L, Zhong Z, Li B, Xie T. 2019 Robustra: training provable robust neural networks over reference adversarial space. In Proc. 28th Int. Joint Conf. on Artificial Intelligence, pp. 4711-4717. AAAI Press. (doi:10.24963/ijcai.2019/654)
- 35. Carlini N, Wagner D. 2017 Towards evaluating the robustness of neural networks. In 2017 IEEE Symp. on Security and Privacy (SP), pp. 39-57. IEEE. (doi:10.1109/SP.2017.49)
- 36. Wong E, Kolter JZ. 2017 Provable defenses against adversarial examples via the convex outer adversarial polytope. (http://arxiv.org/abs/1711.00851)
- 37. Weng TW, Zhang H, Chen PY, Yi J, Su D, Gao Y, Hsieh CJ, Daniel L. 2018 Evaluating the robustness of neural networks: an extreme value theory approach. (http://arxiv.org/abs/1801.10578)
- 38. Araujo A, Pinot R, Negrevergne B, Meunier L, Chevaleyre Y, Yger F, Atif J. 2019 Robust neural networks using randomized adversarial training. (http://arxiv.org/abs/1903.10219)
- 39. Mangal R, Nori AV, Orso A. 2019 Robustness of neural networks: a probabilistic and practical approach. In Proc. 41st Int. Conf. on Software Engineering: New Ideas and Emerging Results, pp. 93-96. IEEE Press. (doi:10.1109/ICSE-NIER.2019.00032)
- 40. Bastani O, Ioannou Y, Lampropoulos L, Vytiniotis D, Nori A, Criminisi A. 2016 Measuring neural net robustness with constraints. (https://arxiv.org/abs/1605.07262)
- 41. Sinha A, Namkoong H, Volpi R, Duchi J. 2020 Certifying some distributional robustness with principled adversarial training. (http://arxiv.org/abs/1710.10571v5)
- 42. Ho-Nguyen N, Wright SJ. 2020 Adversarial classification via distributional robustness with Wasserstein ambiguity. (http://arxiv.org/abs/2005.13815)
- 43. Chen Z, Kuhn D, Wiesemann W. 2018 Data-driven chance constrained programs over Wasserstein balls. (http://arxiv.org/abs/1809.00210)
- 44. Huber P, Ronchetti E. 1981 Robust statistics. Wiley Series in Probability and Mathematical Statistics, vol. 52. New York, NY: Wiley-IEEE.
- 45. Lam H. 2018 Sensitivity to serial dependency of input processes: a robust approach. Manage. Sci. 64, 1311-1327. (doi:10.1287/mnsc.2016.2667)
- 46. Tibshirani R. 1996 Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. B. Stat. Methodol. 58, 267-288.
- 47. Blanchet J, Murthy K, Si N. 2019 Confidence regions in Wasserstein distributionally robust estimation. (http://arxiv.org/abs/1906.01614)
- 48. Anderson EJ, Philpott AB. 2019 Improving sample average approximation using distributional robustness. Optimization Online. See http://www.optimization-online.org/DB_HTML/2019/10/7405.html.
- 49.
- 50. Hansen LP, Marinacci M. 2016 Ambiguity aversion and model misspecification: an economic perspective. Stat. Sci. 31, 511-515. (doi:10.1214/16-STS570)
- 51. Hansen LP, Sargent T. 2007 Robustness. Princeton, NJ: Princeton University Press.
- 52. Atar R, Chowdhary K, Dupuis P. 2015 Robust bounds on risk-sensitive functionals via Rényi divergence. SIAM/ASA J. Uncertain. Quantif. 3, 18-33. (doi:10.1137/130939730)
- 53. Glasserman P, Xu X. 2014 Robust risk measurement and model risk. Quant. Finance 14, 29-58. (doi:10.1080/14697688.2013.822989)
- 54. Carlier G, Duval V, Peyré G, Schmitzer B. 2017 Convergence of entropic schemes for optimal transport and gradient flows. SIAM J. Math. Anal. 49, 1385-1418. (doi:10.1137/15M1050264)
- 55. Peyré G, Cuturi M. 2019 Computational optimal transport: with applications to data science. Found. Trends Mach. Learn. 11, 355-607. (doi:10.1561/2200000073)
- 56. Komorowski M, Costa MJ, Rand DA, Stumpf MP. 2011 Sensitivity, robustness, and identifiability in stochastic chemical kinetics models. Proc. Natl Acad. Sci. USA 108, 8645-8650. (doi:10.1073/pnas.1015814108)