Contractivity of Bellman Operator in Risk Averse Dynamic Programming with Infinite Horizon

Title

Contractivity of Bellman Operator in Risk Averse Dynamic Programming with Infinite Horizon

Authors

Šmíd, Martin; Kopa, Miloš

Abstract

The paper deals with a risk-averse dynamic programming problem with an infinite horizon. First, the assumptions required for the problem to be well defined are formulated. Then the Bellman equation is derived, which may also be seen as a standalone reinforcement learning problem. The Bellman operator is proved to be a contraction, which guarantees convergence of various solution algorithms used for dynamic programming as well as for reinforcement learning problems; we demonstrate this on the value iteration algorithm.
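The abstract does not state which risk measure is used, so the following is only a minimal sketch under an explicit assumption: a finite, cost-minimising MDP whose one-step risk mapping is CVaR at level alpha. CVaR is monotone and translation equivariant, hence 1-Lipschitz in the sup norm, so the risk-averse Bellman operator is a gamma-contraction and value iteration converges to its unique fixed point by Banach's fixed-point theorem. The functions `cvar`, `bellman` and `value_iteration`, as well as the random MDP, are hypothetical illustrations and not the authors' construction.

```python
import numpy as np


def cvar(values, probs, alpha):
    """Upper-tail CVaR_alpha of a discrete cost distribution (assumed risk measure).

    Rockafellar-Uryasev formula:
        CVaR_alpha(X) = min_t { t + E[(X - t)_+] / (1 - alpha) },
    whose minimum over t is attained at one of the atoms of X.
    """
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return min(
        t + probs @ np.maximum(values - t, 0.0) / (1.0 - alpha) for t in values
    )


def bellman(v, cost, P, gamma, alpha):
    """Hypothetical risk-averse Bellman operator of a cost-minimising finite MDP:
        (T v)(s) = min_a [ cost[s, a] + gamma * CVaR_alpha(v(S')) ],  S' ~ P[s, a, :].
    """
    n_states, n_actions = cost.shape
    q = np.empty((n_states, n_actions))
    for s in range(n_states):
        for a in range(n_actions):
            q[s, a] = cost[s, a] + gamma * cvar(v, P[s, a], alpha)
    return q.min(axis=1)


def value_iteration(cost, P, gamma, alpha, tol=1e-10, max_iter=10_000):
    """Iterate v_{k+1} = T v_k from v_0 = 0 until the sup-norm step drops below tol."""
    v = np.zeros(cost.shape[0])
    for k in range(1, max_iter + 1):
        v_new = bellman(v, cost, P, gamma, alpha)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, k
        v = v_new
    raise RuntimeError("value iteration did not converge")


# Hypothetical random MDP: 4 states, 2 actions, one-step costs in [0, 1).
rng = np.random.default_rng(0)
n_s, n_a, gamma, alpha = 4, 2, 0.9, 0.8
cost = rng.uniform(0.0, 1.0, size=(n_s, n_a))
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a] is a distribution over S'

# Empirical check of ||T v - T w||_inf <= gamma * ||v - w||_inf for two arbitrary vectors.
v, w = rng.normal(size=n_s), rng.normal(size=n_s)
gap_after = np.max(np.abs(bellman(v, cost, P, gamma, alpha) - bellman(w, cost, P, gamma, alpha)))
gap_before = np.max(np.abs(v - w))
print(f"{gap_after:.4f} <= {gamma * gap_before:.4f}")

v_star, n_iter = value_iteration(cost, P, gamma, alpha)
print("fixed point:", np.round(v_star, 4), "after", n_iter, "iterations")
```

Swapping CVaR for another monotone, translation-equivariant risk mapping (for example, a mixture of the mean and CVaR) leaves the contraction argument and the `value_iteration` loop unchanged; only `cvar` would need to be replaced.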
