Risk-Averse Markov Decision Processes through a Distributional Lens

Paper Title

Risk-Averse Markov Decision Processes through a Distributional Lens

Authors

Ziteng Cheng, Sebastian Jaimungal

Abstract

By adopting a distributional viewpoint on law-invariant convex risk measures, we construct dynamic risk measures (DRMs) at the distributional level. We then apply these DRMs to investigate Markov decision processes, incorporating latent costs, random actions, and weakly continuous transition kernels. Furthermore, the proposed DRMs allow risk aversion to change dynamically. Under mild assumptions, we derive a dynamic programming principle and show the existence of an optimal policy in both finite and infinite time horizons. Moreover, we provide a sufficient condition for the optimality of deterministic actions. For illustration, we conclude the paper with examples from optimal liquidation with limit order books and autonomous driving.
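
To make the idea of a risk-averse dynamic programming recursion concrete, the following is a minimal sketch and not the construction used in the paper. It assumes a finite state and action space, deterministic one-step costs, deterministic actions, and the entropic risk measure as one familiar example of a law-invariant convex risk measure. The function names, the toy cost and transition tables, and the per-step risk-aversion parameters `thetas` are all hypothetical choices made for illustration. Letting `thetas` vary by time step loosely mirrors the abstract's remark that risk aversion may change dynamically.

```python
import math


def entropic_risk(values, probs, theta):
    """Entropic risk (1/theta) * log E[exp(theta * X)] of a discrete cost X.

    theta = 0 is treated as the risk-neutral limit (plain expectation).
    """
    if theta == 0.0:
        return sum(p * v for p, v in zip(probs, values))
    # Shift by the largest exponent (log-sum-exp) for numerical stability.
    m = max(theta * v for v in values)
    s = sum(p * math.exp(theta * v - m) for p, v in zip(probs, values))
    return (m + math.log(s)) / theta


def risk_averse_backward_induction(cost, trans, thetas):
    """Finite-horizon backward recursion with a one-step entropic risk.

    cost[s][a]   : immediate cost of action a in state s (assumed deterministic)
    trans[s][a]  : list of next-state probabilities
    thetas[t]    : risk-aversion parameter used at time step t
    Returns the time-0 value per state and a policy table policy[t][s].
    """
    n_states = len(cost)
    n_actions = len(cost[0])
    value = [0.0] * n_states  # terminal cost assumed to be zero
    policy = []
    for theta in reversed(thetas):
        new_value, decision = [], []
        for s in range(n_states):
            # Risk-adjusted cost-to-go of each action from state s.
            q = [
                cost[s][a] + entropic_risk(value, trans[s][a], theta)
                for a in range(n_actions)
            ]
            best = min(range(n_actions), key=q.__getitem__)
            new_value.append(q[best])
            decision.append(best)
        value = new_value
        policy.insert(0, decision)
    return value, policy


# Hypothetical two-state, two-action example.
cost = [[1.0, 2.0], [0.0, 5.0]]
trans = [
    [[0.5, 0.5], [1.0, 0.0]],
    [[0.0, 1.0], [0.5, 0.5]],
]
neutral_value, _ = risk_averse_backward_induction(cost, trans, [0.0, 0.0, 0.0])
averse_value, averse_policy = risk_averse_backward_induction(
    cost, trans, [0.5, 1.0, 2.0]
)
```

With all parameters set to zero the recursion reduces to ordinary risk-neutral backward induction. Because the entropic risk of a cost is never below its expectation for positive parameters, the risk-averse values in this sketch are at least as large as the risk-neutral ones.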
