Paper Title
Adversary Agnostic Robust Deep Reinforcement Learning
Paper Authors
Paper Abstract
Deep reinforcement learning (DRL) policies have been shown to be deceived by perturbations (e.g., random noise or intentional adversarial attacks) on state observations that appear at test time but are unknown during training. To increase the robustness of DRL policies, previous approaches assume that knowledge of the adversaries can be incorporated into the training process to achieve the corresponding generalization ability on these perturbed observations. However, such an assumption not only makes the robustness improvement more expensive but may also leave a model less effective against other kinds of attacks in the wild. In contrast, we propose an adversary agnostic robust DRL paradigm that does not require learning from adversaries. To this end, we first theoretically derive that robustness can indeed be achieved independently of the adversaries in a policy distillation setting. Motivated by this finding, we propose a new policy distillation loss with two terms: 1) a prescription gap maximization loss that simultaneously maximizes the likelihood of the action selected by the teacher policy and the entropy over the remaining actions; 2) a corresponding Jacobian regularization loss that minimizes the magnitude of the gradient with respect to the input state. Our theoretical analysis shows that the proposed distillation loss is guaranteed to increase the prescription gap and the adversarial robustness. Furthermore, experiments on five Atari games firmly verify the superiority of our approach in boosting adversarial robustness compared with other state-of-the-art methods.
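To make the two-term distillation loss concrete, below is a minimal PyTorch-style sketch based only on the abstract's description; it is an illustrative interpretation, not the authors' implementation. The `student` network (assumed to return action logits), the teacher's greedy actions `teacher_actions`, and the weighting coefficients `lambda_ent` and `lambda_jac` are all hypothetical names and choices introduced here for illustration.

```python
# Sketch of a two-term policy distillation loss as described in the abstract.
# Assumptions: `student(states)` returns per-action logits of shape (B, A);
# `teacher_actions` holds the teacher's greedy action indices of shape (B,).
import torch
import torch.nn.functional as F

def distillation_loss(student, states, teacher_actions,
                      lambda_ent=0.1, lambda_jac=1.0):
    # Track gradients w.r.t. the input states for the Jacobian term.
    states = states.clone().requires_grad_(True)
    log_probs = F.log_softmax(student(states), dim=-1)   # (B, A)
    probs = log_probs.exp()

    # 1) Prescription gap maximization: maximize the likelihood of the
    #    teacher-selected action ...
    teacher_logp = log_probs.gather(1, teacher_actions.unsqueeze(1)).squeeze(1)
    #    ... and the entropy over the remaining (non-selected) actions.
    mask = torch.ones_like(probs).scatter_(1, teacher_actions.unsqueeze(1), 0.0)
    rest = probs * mask
    rest = rest / rest.sum(dim=1, keepdim=True).clamp_min(1e-8)
    rest_entropy = -(rest * rest.clamp_min(1e-8).log()).sum(dim=1)
    pgm_loss = -(teacher_logp + lambda_ent * rest_entropy).mean()

    # 2) Jacobian regularization: penalize the gradient magnitude of the
    #    teacher-action log-probability with respect to the input state.
    grads = torch.autograd.grad(teacher_logp.sum(), states, create_graph=True)[0]
    jac_loss = grads.flatten(1).norm(dim=1).mean()

    return pgm_loss + lambda_jac * jac_loss
```

The sketch keeps the two terms separable so that the entropy weight and the Jacobian weight can be tuned independently; the exact form of the prescription gap term and the regularization weights in the paper may differ.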