Paper Title

Robust Deep Reinforcement Learning through Adversarial Loss

Paper Authors

Tuomas Oikarinen, Wang Zhang, Alexandre Megretski, Luca Daniel, Tsui-Wei Weng

Paper Abstract

Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs, which raises concerns about deploying such agents in the real world. To address this issue, we propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against $l_p$-norm bounded adversarial attacks. Our framework is compatible with popular deep reinforcement learning algorithms and we demonstrate its performance with deep Q-learning, A3C and PPO. We experiment on three deep RL benchmarks (Atari, MuJoCo and ProcGen) to show the effectiveness of our robust training algorithm. Our RADIAL-RL agents consistently outperform prior methods when tested against attacks of varying strength and are more computationally efficient to train. In addition, we propose a new evaluation method called Greedy Worst-Case Reward (GWC) to measure attack agnostic robustness of deep RL agents. We show that GWC can be evaluated efficiently and is a good estimate of the reward under the worst possible sequence of adversarial attacks. All code used for our experiments is available at https://github.com/tuomaso/radial_rl_v2.
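The abstract describes Greedy Worst-Case Reward (GWC) only at a high level. The sketch below illustrates the greedy worst-case rollout idea under several simplifying assumptions: a gym-style environment, an `nn.Sequential` Q-network of Linear/ReLU layers, and interval bound propagation as a (hypothetical) way to over-approximate which actions an $l_\infty$-bounded perturbation of the observation could induce. It is an illustration of the concept only, not the authors' implementation; see the linked repository for that.

```python
# Minimal, illustrative sketch of a greedy worst-case rollout (GWC-style evaluation).
# Assumptions (not from the paper's code): env exposes a classic gym-style
# reset()/step() interface, and q_net is an nn.Sequential of Linear/ReLU layers.
import torch
import torch.nn as nn


def interval_bounds(layers, x, eps):
    """Propagate the l_inf ball [x - eps, x + eps] through Linear/ReLU layers
    with standard interval bound propagation."""
    lb, ub = x - eps, x + eps
    for layer in layers:
        if isinstance(layer, nn.Linear):
            mid, rad = (lb + ub) / 2, (ub - lb) / 2
            new_mid = mid @ layer.weight.t() + layer.bias
            new_rad = rad @ layer.weight.abs().t()
            lb, ub = new_mid - new_rad, new_mid + new_rad
        elif isinstance(layer, nn.ReLU):
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)
    return lb, ub


def greedy_worst_case_reward(env, q_net, eps, max_steps=1000):
    """Roll out one episode where, at every step, the adversary greedily forces
    the lowest-value action among those an eps-perturbation could induce."""
    obs = env.reset()
    total = 0.0
    layers = list(q_net)
    for _ in range(max_steps):
        x = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
        with torch.no_grad():
            q = q_net(x).squeeze(0)                  # nominal Q-values
            lb, ub = interval_bounds(layers, x, eps)
        lb, ub = lb.squeeze(0), ub.squeeze(0)
        # An action is "possible" if some perturbation could make it the argmax;
        # ub[a] >= max_a' lb[a'] is a sound over-approximation of that condition.
        possible = ub >= lb.max()
        masked_q = torch.where(possible, q, torch.full_like(q, float("inf")))
        worst_action = int(masked_q.argmin())        # greedy worst case
        obs, reward, done, _ = env.step(worst_action)
        total += reward
        if done:
            break
    return total
```

A single greedy rollout like this avoids searching over all possible perturbation sequences (which grows exponentially with episode length), which is consistent with the abstract's claim that GWC can be evaluated efficiently while still estimating the reward under the worst-case attack sequence.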
