Paper Title

Attacking and Defending Deep Reinforcement Learning Policies

Authors

Wang, Chao

Abstract

Recent studies have shown that deep reinforcement learning (DRL) policies are vulnerable to adversarial attacks, which raise concerns about applications of DRL to safety-critical systems. In this work, we adopt a principled way and study the robustness of DRL policies to adversarial attacks from the perspective of robust optimization. Within the framework of robust optimization, optimal adversarial attacks are given by minimizing the expected return of the policy, and correspondingly a good defense mechanism should be realized by improving the worst-case performance of the policy. Considering that attackers generally have no access to the training environment, we propose a greedy attack algorithm, which tries to minimize the expected return of the policy without interacting with the environment, and a defense algorithm, which performs adversarial training in a max-min form. Experiments on Atari game environments show that our attack algorithm is more effective and leads to worse return of the policy than existing attack algorithms, and our defense algorithm yields policies more robust than existing defense methods to a range of adversarial attacks (including our proposed attack algorithm).
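The greedy attack described in the abstract perturbs the agent's observation so that the policy's own worst-ranked action becomes the greedy choice, without any environment interaction. Below is a minimal NumPy sketch of that idea on a toy linear Q-function; the function name, the linear model, and the single sign-gradient step are illustrative assumptions, not the paper's actual algorithm, which targets deep Q-networks on image observations.

```python
import numpy as np

def greedy_fgsm_attack(W, s, eps):
    """One-step sign attack on a toy linear Q-function Q(s) = W @ s.

    Hypothetical sketch: pushes the observation s toward the region where
    the policy's worst-ranked action becomes greedy-optimal, using only
    the policy itself (no environment interaction).
    """
    q = W @ s
    a_best, a_worst = int(np.argmax(q)), int(np.argmin(q))
    # For a linear Q-function, the gradient of (Q[a_worst] - Q[a_best])
    # with respect to s is simply W[a_worst] - W[a_best].
    grad = W[a_worst] - W[a_best]
    # L-infinity-bounded perturbation of size eps, as in FGSM-style attacks.
    return s + eps * np.sign(grad)
```

With `W = [[1, 0], [0, 1]]` and `s = [1.0, 0.9]`, the clean greedy action is 0; an `eps = 0.2` perturbation yields `s_adv = [0.8, 1.1]`, flipping the greedy action to the policy's worst-ranked one. A deep-RL version would compute the same gradient by backpropagation through the Q-network.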
