Paper Title
The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training
Paper Authors
Paper Abstract
Although current deep learning techniques have achieved superior performance on various computer vision tasks, they remain vulnerable to adversarial examples. Adversarial training and its variants have been shown to be the most effective defenses against adversarial examples. These methods typically regularize the difference between the output probabilities of an adversarial example and its corresponding natural example. However, this regularization can be counterproductive when the model misclassifies the natural example, since the adversarial output is then pulled toward an incorrect prediction. To circumvent this issue, we propose a novel adversarial training scheme that encourages the model to produce similar outputs for an adversarial example and its ``inverse adversarial'' counterpart. These samples are generated to maximize the likelihood in the neighborhood of natural examples. Extensive experiments on various vision datasets and architectures demonstrate that our training method achieves state-of-the-art robustness as well as natural accuracy. Furthermore, using a universal version of inverse adversarial examples, we improve the performance of single-step adversarial training techniques at a low computational cost.
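To make the scheme concrete, the following is a minimal PyTorch sketch of how inverse adversarial training could look. The helper names (perturb, training_loss), the hyperparameter values, and the KL-divergence consistency term are illustrative assumptions rather than the paper's exact formulation; the essential point is that the inverse adversary descends the loss surface where a standard adversary ascends it.

    # Illustrative sketch only: names, hyperparameters, and the KL
    # consistency term are assumptions, not the paper's exact method.
    import torch
    import torch.nn.functional as F

    def perturb(model, x, y, eps=8/255, alpha=2/255, steps=5, descend=False):
        """PGD-style perturbation within an L-inf eps-ball around x.

        descend=False ascends the loss (a standard adversary);
        descend=True descends it, producing an "inverse adversary"
        that maximizes the likelihood near the natural example.
        """
        delta = torch.zeros_like(x)
        for _ in range(steps):
            delta.requires_grad_(True)
            loss = F.cross_entropy(model(x + delta), y)
            grad, = torch.autograd.grad(loss, delta)
            step = -alpha if descend else alpha
            delta = (delta.detach() + step * grad.sign()).clamp(-eps, eps)
            delta = (x + delta).clamp(0.0, 1.0) - x  # keep pixels in [0, 1]
        return (x + delta).detach()

    def training_loss(model, x, y, beta=6.0):
        """Cross-entropy on the adversary plus a consistency term that
        pulls its output toward that of the inverse adversary."""
        x_adv = perturb(model, x, y)                # loss-ascending adversary
        x_inv = perturb(model, x, y, descend=True)  # loss-descending inverse
        logits_adv = model(x_adv)
        with torch.no_grad():
            target = F.softmax(model(x_inv), dim=1)  # high-likelihood reference
        kl = F.kl_div(F.log_softmax(logits_adv, dim=1), target,
                      reduction="batchmean")
        return F.cross_entropy(logits_adv, y) + beta * kl

In a training loop, training_loss(model, x, y).backward() would take the place of a standard adversarial loss; note that the inverse adversary reuses the same PGD machinery with only the gradient sign flipped, so its extra cost is modest.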