Paper Title

Towards Robustness against Unsuspicious Adversarial Examples

Paper Authors

Liang Tong, Minzhe Guo, Atul Prakash, Yevgeniy Vorobeychik

Paper Abstract

Despite the remarkable success of deep neural networks, significant concerns have emerged about their robustness to adversarial perturbations to inputs. While most attacks aim to ensure that these are imperceptible, physical perturbation attacks typically aim for being unsuspicious, even if perceptible. However, there is no universal notion of what it means for adversarial examples to be unsuspicious. We propose an approach for modeling suspiciousness by leveraging cognitive salience. Specifically, we split an image into foreground (salient region) and background (the rest), and allow significantly larger adversarial perturbations in the background, while ensuring that cognitive salience of background remains low. We describe how to compute the resulting non-salience-preserving dual-perturbation attacks on classifiers. We then experimentally demonstrate that our attacks indeed do not significantly change perceptual salience of the background, but are highly effective against classifiers robust to conventional attacks. Furthermore, we show that adversarial training with dual-perturbation attacks yields classifiers that are more robust to these than state-of-the-art robust learning approaches, and comparable in terms of robustness to conventional attacks.
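For intuition, the two-budget construction described above (a small perturbation budget on the salient foreground, a much larger one on the background) can be sketched as a PGD-style attack with region-specific L-infinity projections. The code below is a minimal illustrative sketch under assumed conventions, not the authors' implementation: it assumes a PyTorch classifier and a precomputed binary foreground mask, uses placeholder budgets `eps_fg` and `eps_bg`, and omits the salience term the paper uses to keep background salience low.

```python
import torch
import torch.nn.functional as F

def dual_perturbation_attack(model, x, y, mask, eps_fg=0.03, eps_bg=0.3,
                             step=0.01, iters=40):
    """Illustrative PGD-style dual-perturbation sketch (not the paper's exact method).

    mask:   binary tensor, 1 on the salient foreground, 0 on the background.
    eps_fg: small L-infinity budget applied to the foreground region.
    eps_bg: larger L-infinity budget applied to the background region.
    """
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            # Gradient-sign ascent step, as in standard PGD.
            delta += step * delta.grad.sign()
            # Project onto region-specific L-infinity balls: a tight budget
            # on the foreground, a looser one on the background.
            delta_fg = torch.clamp(delta, -eps_fg, eps_fg) * mask
            delta_bg = torch.clamp(delta, -eps_bg, eps_bg) * (1 - mask)
            delta.copy_(delta_fg + delta_bg)
            # Keep the perturbed image inside the valid pixel range.
            delta.copy_(torch.clamp(x + delta, 0, 1) - x)
        delta.grad.zero_()
    return (x + delta).detach()
```

In this sketch, the only change relative to conventional PGD is the split projection step, which is where the asymmetric foreground/background budgets enter; the paper additionally constrains the perceptual salience of the perturbed background, which is not modeled here.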
