Regret Bounds for Risk-Sensitive Reinforcement Learning

Paper Title

Regret Bounds for Risk-Sensitive Reinforcement Learning

Authors

Bastani, O., Ma, Y. J., Shen, E., Xu, W.

Abstract

In safety-critical applications of reinforcement learning such as healthcare and robotics, it is often desirable to optimize risk-sensitive objectives that account for tail outcomes rather than expected reward. We prove the first regret bounds for reinforcement learning under a general class of risk-sensitive objectives including the popular CVaR objective. Our theory is based on a novel characterization of the CVaR objective as well as a novel optimistic MDP construction.
