Paper Title
ReCCoVER: Detecting Causal Confusion for Explainable Reinforcement Learning
Paper Authors
Paper Abstract
Despite notable results in various fields in recent years, deep reinforcement learning (DRL) algorithms lack transparency, affecting user trust and hindering their deployment to high-risk tasks. Causal confusion refers to a phenomenon where an agent learns spurious correlations between features which might not hold across the entire state space, preventing safe deployment to real-world tasks where such correlations might be broken. In this work, we examine whether an agent relies on spurious correlations in critical states, and propose an alternative subset of features on which it should base its decisions instead, to make it less susceptible to causal confusion. Our goal is to increase the transparency of DRL agents by exposing the influence of learned spurious correlations on their decisions, and to offer advice to developers about feature selection in different parts of the state space, in order to avoid causal confusion. We propose ReCCoVER, an algorithm which detects causal confusion in an agent's reasoning before deployment, by executing its policy in alternative environments where certain correlations between features do not hold. We demonstrate our approach in taxi and grid world environments, where ReCCoVER detects states in which an agent relies on spurious correlations and offers a set of features that should be considered instead.
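The core idea the abstract describes — running a trained policy in alternative environments where a given feature correlation is broken and checking whether performance collapses — can be sketched as follows. This is a minimal illustration of that idea, not the authors' implementation: the Gym-like environment interface, the function names, and the drop threshold are all assumptions.

```python
from typing import Callable, Dict, List, Tuple, FrozenSet

# Illustrative sketch of the detection idea from the abstract: compare a
# policy's return in the original environment against returns in
# "intervened" environments where a correlation between features has been
# removed. A large relative drop suggests the policy leaned on that
# (possibly spurious) correlation. All names here are hypothetical.

def evaluate_policy(policy: Callable, env, episodes: int = 20) -> float:
    """Average episodic return of `policy` in `env` (assumes a minimal
    Gym-like API: reset() -> obs, step(action) -> (obs, reward, done))."""
    total = 0.0
    for _ in range(episodes):
        obs, done, ret = env.reset(), False, 0.0
        while not done:
            obs, reward, done = env.step(policy(obs))
            ret += reward
        total += ret
    return total / episodes

def detect_causal_confusion(
    policy: Callable,
    base_env,
    intervened_envs: Dict[FrozenSet[str], object],
    threshold: float = 0.2,
) -> List[Tuple[FrozenSet[str], float]]:
    """Flag feature subsets whose broken correlation causes a large
    relative performance drop, hinting at reliance on a spurious link."""
    base_return = evaluate_policy(policy, base_env)
    confused = []
    for features, env in intervened_envs.items():
        drop = (base_return - evaluate_policy(policy, env)) / max(abs(base_return), 1e-8)
        if drop > threshold:
            confused.append((features, drop))
    return confused
```

A developer could then inspect the flagged feature subsets and retrain the policy on features whose effect on the reward survives the interventions.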