Title
Regret Bounds for Risk-Sensitive Reinforcement Learning
Authors
Abstract
In safety-critical applications of reinforcement learning such as healthcare and robotics, it is often desirable to optimize risk-sensitive objectives that account for tail outcomes rather than expected reward. We prove the first regret bounds for reinforcement learning under a general class of risk-sensitive objectives including the popular CVaR objective. Our theory is based on a novel characterization of the CVaR objective as well as a novel optimistic MDP construction.
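To make "tail outcomes rather than expected reward" concrete, here is a minimal sketch of the lower-tail CVaR at level alpha for an empirical sample of episode returns, where higher return is better. It computes the mean of the worst alpha-fraction of returns and cross-checks it against the standard Rockafellar-Uryasev variational form. The function names are hypothetical, and this is only the textbook definition of CVaR. It is not the paper's novel characterization, its objective class, or its optimistic MDP construction.

```python
import math


def empirical_cvar(returns, alpha):
    """Lower-tail CVaR_alpha: mean of the worst alpha-fraction of sampled returns.

    Hypothetical helper; uses the exact empirical definition, giving the
    boundary sample a fractional weight when alpha * n is not an integer.
    """
    if not returns:
        raise ValueError("need at least one return")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    xs = sorted(returns)                 # worst returns first
    k = alpha * len(xs)                  # tail mass, measured in samples
    full = math.floor(k)
    total = sum(xs[:full])               # samples fully inside the tail
    frac = k - full
    if frac > 0:
        total += frac * xs[full]         # partial weight on the boundary sample
    return total / k


def cvar_rockafellar_uryasev(returns, alpha):
    """Same quantity via sup_tau { tau - E[(tau - X)^+] / alpha }.

    The objective is concave and piecewise linear in tau with breakpoints at
    the samples, so searching over the sample values finds the supremum.
    """
    n = len(returns)
    return max(
        tau - sum(max(tau - x, 0.0) for x in returns) / (alpha * n)
        for tau in returns
    )


if __name__ == "__main__":
    # Two hypothetical return distributions with the same mean (5.0).
    safe = [5.0] * 10
    risky = [0.0] * 5 + [10.0] * 5
    for name, r in [("safe", safe), ("risky", risky)]:
        print(name, sum(r) / len(r), empirical_cvar(r, 0.2))
```

Both distributions have the same expected return, so a risk-neutral objective cannot distinguish them. CVaR at alpha = 0.2 scores the safe distribution at 5.0 and the risky one at 0.0, so a CVaR objective prefers the safe one.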