Paper Title
RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk
Paper Authors
Paper Abstract
Prior work on safe Reinforcement Learning (RL) has studied risk-aversion to randomness in dynamics (aleatory) and to model uncertainty (epistemic) in isolation. We propose and analyze a new framework to jointly model the risk associated with epistemic and aleatory uncertainties in finite-horizon and discounted infinite-horizon MDPs. We call this framework that combines Risk-Averse and Soft-Robust methods RASR. We show that when the risk-aversion is defined using either EVaR or the entropic risk, the optimal policy in RASR can be computed efficiently using a new dynamic program formulation with a time-dependent risk level. As a result, the optimal risk-averse policies are deterministic but time-dependent, even in the infinite-horizon discounted setting. We also show that particular RASR objectives reduce to risk-averse RL with mean posterior transition probabilities. Our empirical results show that our new algorithms consistently mitigate uncertainty as measured by EVaR and other standard risk measures.
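As a rough illustration of the two risk measures named in the abstract, the sketch below evaluates the entropic risk and EVaR of a discrete reward distribution. It is not the paper's algorithm. It assumes the reward-based conventions ERM_beta[X] = -(1/beta) log E[exp(-beta X)] and EVaR_alpha[X] = sup over beta > 0 of ERM_beta[X] + log(1 - alpha)/beta, with alpha in [0, 1) and alpha = 0 recovering the expectation; the abstract does not spell these out. The function names and the beta grid are hypothetical, and the grid search only gives a lower bound on the supremum.

```python
import math


def entropic_risk(values, probs, beta):
    """ERM_beta[X] = -(1/beta) * log E[exp(-beta * X)] for a discrete reward X.

    Assumed convention: X is a reward, beta > 0, larger beta is more risk-averse.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    # Drop zero-probability outcomes, then use a log-sum-exp shift for stability.
    pairs = [(p, -beta * x) for x, p in zip(values, probs) if p > 0]
    shift = max(e for _, e in pairs)
    total = sum(p * math.exp(e - shift) for p, e in pairs)
    return -(shift + math.log(total)) / beta


def evar(values, probs, alpha, betas=None):
    """Grid approximation of EVaR_alpha[X] = sup_beta ERM_beta[X] + log(1 - alpha) / beta.

    The grid is an illustrative choice; the result is a lower bound on the supremum.
    """
    if not 0 <= alpha < 1:
        raise ValueError("alpha must lie in [0, 1)")
    if betas is None:
        betas = [10 ** (k / 10) for k in range(-40, 31)]  # 1e-4 up to 1e3
    return max(
        entropic_risk(values, probs, b) + math.log(1 - alpha) / b for b in betas
    )


# Hypothetical two-outcome reward: 0 or 10 with equal probability.
rewards = [0.0, 10.0]
weights = [0.5, 0.5]
mean_reward = sum(p * x for x, p in zip(rewards, weights))

erm_value = entropic_risk(rewards, weights, beta=1.0)
evar_neutral = evar(rewards, weights, alpha=0.0)
evar_averse = evar(rewards, weights, alpha=0.5)

# Scaling identity: ERM_beta[gamma * X] equals gamma * ERM_{beta * gamma}[X].
gamma, beta = 0.9, 2.0
lhs = entropic_risk([gamma * x for x in rewards], weights, beta)
rhs = gamma * entropic_risk(rewards, weights, beta * gamma)
```

The scaling identity at the end is a standard property of the entropic risk: discounting a reward by gamma rescales the risk level from beta to beta times gamma. This is consistent with the abstract's mention of a time-dependent risk level in the discounted setting, although the abstract itself does not give the exact recursion.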