RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk

Paper title

RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk

Authors

Jia Lin Hau, Marek Petrik, Mohammad Ghavamzadeh, Reazul Russel

Abstract

Prior work on safe Reinforcement Learning (RL) has studied risk-aversion to randomness in dynamics (aleatory) and to model uncertainty (epistemic) in isolation. We propose and analyze a new framework to jointly model the risk associated with epistemic and aleatory uncertainties in finite-horizon and discounted infinite-horizon MDPs. We call this framework that combines Risk-Averse and Soft-Robust methods RASR. We show that when the risk-aversion is defined using either EVaR or the entropic risk, the optimal policy in RASR can be computed efficiently using a new dynamic program formulation with a time-dependent risk level. As a result, the optimal risk-averse policies are deterministic but time-dependent, even in the infinite-horizon discounted setting. We also show that particular RASR objectives reduce to risk-averse RL with mean posterior transition probabilities. Our empirical results show that our new algorithms consistently mitigate uncertainty as measured by EVaR and other standard risk measures.
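
The abstract names two risk measures without defining them. The following is a minimal sketch, not the authors' implementation. It assumes the common reward-based conventions ERM_β[X] = -(1/β) log E[exp(-βX)] and EVaR_α[X] = sup_{β>0} (ERM_β[X] + log(1-α)/β) for a discrete random return X. The function names, the grid of β values, and the grid search that stands in for the supremum are all illustrative assumptions.

```python
import numpy as np


def entropic_risk(values, probs, beta):
    """Entropic risk of a discrete return X (assumed reward convention).

    ERM_beta[X] = -(1/beta) * log E[exp(-beta * X)], with beta = 0 read as the mean.
    """
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if beta == 0.0:
        return float(probs @ values)
    # Shift by the maximum exponent before exponentiating to avoid overflow.
    z = -beta * values
    m = z.max()
    log_mgf = m + np.log(probs @ np.exp(z - m))
    return float(-log_mgf / beta)


def evar(values, probs, alpha, betas=None):
    """Approximate EVaR_alpha[X] for alpha in [0, 1) by a grid search over beta.

    The grid is a hypothetical stand-in for the exact supremum over beta > 0,
    so the result is a lower bound on the true value.
    """
    if betas is None:
        betas = np.logspace(-4, 3, 400)
    return max(
        entropic_risk(values, probs, b) + np.log(1.0 - alpha) / b for b in betas
    )


if __name__ == "__main__":
    returns, p = [0.0, 10.0], [0.5, 0.5]
    for a in (0.0, 0.5, 0.9):
        print(f"alpha={a}: EVaR ~ {evar(returns, p, a):.3f}")
```

Under these conventions the entropic risk satisfies ERM_β[γX] = γ · ERM_{βγ}[X] for γ > 0, so discounting a future return rescales the risk parameter. This is one plausible reading of why the abstract's dynamic program carries a time-dependent risk level; the abstract itself does not state the recursion.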
