Paper Title

RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk

Authors

Hau, Jia Lin; Petrik, Marek; Ghavamzadeh, Mohammad; Russel, Reazul

Abstract

Prior work on safe Reinforcement Learning (RL) has studied risk-aversion to randomness in dynamics (aleatory) and to model uncertainty (epistemic) in isolation. We propose and analyze a new framework to jointly model the risk associated with epistemic and aleatory uncertainties in finite-horizon and discounted infinite-horizon MDPs. We call this framework that combines Risk-Averse and Soft-Robust methods RASR. We show that when the risk-aversion is defined using either EVaR or the entropic risk, the optimal policy in RASR can be computed efficiently using a new dynamic program formulation with a time-dependent risk level. As a result, the optimal risk-averse policies are deterministic but time-dependent, even in the infinite-horizon discounted setting. We also show that particular RASR objectives reduce to risk-averse RL with mean posterior transition probabilities. Our empirical results show that our new algorithms consistently mitigate uncertainty as measured by EVaR and other standard risk measures.
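To make the two risk measures named in the abstract concrete, the following is a minimal sketch for a discrete reward distribution. It assumes the common reward-based conventions ERM_beta[X] = -(1/beta) log E[exp(-beta X)] and EVaR_alpha[X] = sup over beta > 0 of ERM_beta[X] + log(1 - alpha)/beta; the abstract does not state these definitions, so they are assumptions here, as are the function names `entropic_risk` and `evar` and the crude grid search over beta. The last lines check the scaling identity ERM_beta[gamma X] = gamma ERM_{beta gamma}[X], which follows directly from the ERM definition and gives one intuition for why a discounted dynamic program would need a risk level that changes with time.

```python
import numpy as np

def entropic_risk(values, probs, beta):
    """ERM_beta[X] = -(1/beta) * log E[exp(-beta * X)] for a discrete reward X."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if beta == 0.0:
        return float(probs @ values)  # risk-neutral limit: plain expectation
    z = -beta * values
    shift = z.max()  # shift the exponent for numerical stability
    return float(-(shift + np.log(probs @ np.exp(z - shift))) / beta)

def evar(values, probs, alpha, betas=None):
    """Grid approximation (a lower bound) of sup_{beta>0} ERM_beta[X] + log(1-alpha)/beta."""
    if betas is None:
        betas = np.logspace(-3, 3, 400)  # hypothetical search grid
    return max(entropic_risk(values, probs, b) + np.log(1.0 - alpha) / b
               for b in betas)

# Toy reward distribution (illustrative numbers, not from the paper).
rewards = [0.0, 1.0, 10.0]
probs = [0.1, 0.6, 0.3]
gamma, beta = 0.9, 0.5

# Scaling identity: discounting the reward by gamma rescales the risk level.
lhs = entropic_risk(gamma * np.asarray(rewards), probs, beta)
rhs = gamma * entropic_risk(rewards, probs, beta * gamma)
print(lhs, rhs)
print(evar(rewards, probs, alpha=0.1), evar(rewards, probs, alpha=0.9))
```

Under these assumed definitions, a larger alpha gives a more pessimistic EVaR value, and both measures never exceed the plain expectation of the reward.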
