Paper Title
Integrating Policy Summaries with Reward Decomposition for Explaining Reinforcement Learning Agents
Paper Authors
Paper Abstract
Explaining the behavior of reinforcement learning agents operating in sequential decision-making settings is challenging, as their behavior is affected by a dynamic environment and delayed rewards. Methods that help users understand the behavior of such agents can roughly be divided into local explanations, which analyze specific decisions of the agents, and global explanations, which convey the general strategy of the agents. In this work, we study a novel combination of local and global explanations for reinforcement learning agents. Specifically, we combine reward decomposition, a local explanation method that exposes which components of the reward function influenced a specific decision, and HIGHLIGHTS, a global explanation method that shows a summary of the agent's behavior in decisive states. We conducted two user studies to evaluate the integration of these explanation methods and their respective benefits. Our results show significant benefits for both methods. In general, we found that local reward decomposition was more useful for identifying the agents' priorities. However, when there was only a minor difference between the agents' preferences, the global information provided by HIGHLIGHTS additionally improved participants' understanding.
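To make the two explanation types concrete, the following is a minimal Python sketch, not the authors' implementation: tabular per-component Q-values stand in for a trained agent, `explain_decision` illustrates reward decomposition by reporting each reward component's contribution to the chosen action, and `build_summary` illustrates HIGHLIGHTS-style summarization by ranking states by the gap between the best and worst action's total Q-value (the importance measure used by HIGHLIGHTS). All state names, component names, and numbers are illustrative assumptions.

```python
import numpy as np

# Toy setup: 3 actions and 2 hypothetical reward components
# ("treasure" collected vs. "damage" taken). Per-component Q-vectors
# are assumed given; in practice they come from a decomposed-reward learner.
ACTIONS = ["left", "stay", "right"]

def total_q(decomposed_q):
    """Sum the per-component Q-vectors into the overall Q-vector the agent acts on."""
    return np.sum(list(decomposed_q.values()), axis=0)

def explain_decision(decomposed_q):
    """Local explanation via reward decomposition: show how each reward
    component scores the chosen action, exposing which component drove it."""
    q = total_q(decomposed_q)
    chosen = int(np.argmax(q))
    print(f"chosen action: {ACTIONS[chosen]} (total Q = {q[chosen]:.2f})")
    for component, q_c in decomposed_q.items():
        print(f"  {component:>8}: Q = {q_c[chosen]:+.2f}")
    return chosen

def state_importance(q):
    """HIGHLIGHTS importance: the gap between the best and worst action.
    Large gaps mark 'decisive' states where acting well matters most."""
    return float(np.max(q) - np.min(q))

def build_summary(states, decomposed_q_by_state, k=2):
    """Global explanation: select the k most important states for the summary."""
    scored = [(state_importance(total_q(decomposed_q_by_state[s])), s)
              for s in states]
    return [s for _, s in sorted(scored, reverse=True)[:k]]

if __name__ == "__main__":
    # Hypothetical per-state, per-component Q-values.
    q_by_state = {
        "s0": {"treasure": np.array([0.1, 0.2, 0.9]),
               "damage":   np.array([-0.8, -0.1, -0.2])},
        "s1": {"treasure": np.array([0.3, 0.3, 0.3]),
               "damage":   np.array([-0.1, -0.1, -0.1])},
        "s2": {"treasure": np.array([1.5, 0.0, 0.2]),
               "damage":   np.array([-0.2, -0.9, -0.1])},
    }
    summary = build_summary(q_by_state.keys(), q_by_state, k=2)
    print("summary states:", summary)
    for s in summary:
        print(f"state {s}:")
        explain_decision(q_by_state[s])
```

In this toy example, s1 is correctly excluded from the summary (all actions are equally good there), while the decomposed printout for the selected states shows whether seeking treasure or avoiding damage dominated each choice, which is the kind of integrated local-plus-global view the paper evaluates.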