Paper Title

A Bayes-Optimal View on Adversarial Examples

Paper Authors

Eitan Richardson, Yair Weiss

Paper Abstract

Since the discovery of adversarial examples (the ability to fool modern CNN classifiers with tiny perturbations of the input), there has been much discussion about whether they are a "bug" that is specific to current neural architectures and training methods or an inevitable "feature" of high-dimensional geometry. In this paper, we argue for examining adversarial examples from the perspective of Bayes-Optimal classification. We construct realistic image datasets for which the Bayes-Optimal classifier can be efficiently computed and derive analytic conditions on the distributions under which these classifiers are provably robust against any adversarial attack, even in high dimensions. Our results show that even when these "gold standard" optimal classifiers are robust, CNNs trained on the same datasets consistently learn a vulnerable classifier, indicating that adversarial examples are often an avoidable "bug". We further show that RBF SVMs trained on the same data consistently learn a robust classifier. The same trend is observed in experiments with real images from different datasets.
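To make the notions of a Bayes-optimal classifier and provable robustness concrete, here is a minimal, self-contained sketch (not the authors' construction; the dimension, class means, and noise level below are illustrative assumptions). For a two-class Gaussian mixture with equal priors and shared isotropic covariance, the Bayes-optimal rule argmax_y p(y | x) reduces to a linear classifier, and an input's distance to the decision hyperplane certifies robustness against any L2-bounded perturbation:

```python
# Hypothetical sketch: Bayes-optimal classification for a two-class
# Gaussian mixture with shared isotropic covariance, plus the exact
# certified L2 robustness radius of the resulting linear rule.
import numpy as np

rng = np.random.default_rng(0)
d = 100                                 # input dimension (assumed)
mu0, mu1 = -np.ones(d), np.ones(d)      # class means (assumed)
sigma = 4.0                             # shared isotropic std (assumed)

# With equal priors, the log-likelihood ratio is linear in x:
# sign(w @ x + b), where w = (mu1 - mu0) / sigma^2.
w = (mu1 - mu0) / sigma**2
b = -0.5 * (mu1 @ mu1 - mu0 @ mu0) / sigma**2

def bayes_predict(x):
    """Bayes-optimal label under the assumed mixture."""
    return int(x @ w + b > 0)

def certified_radius(x):
    """Distance to the decision hyperplane: no L2 perturbation
    smaller than this radius can change the prediction."""
    return abs(x @ w + b) / np.linalg.norm(w)

x = rng.normal(mu1, sigma)              # sample from class 1
print(bayes_predict(x), certified_radius(x))
```

The certified radius here is exact because the optimal rule is linear; for the realistic image datasets in the paper, the abstract states that analogous analytic conditions on the distributions guarantee robustness of the Bayes-optimal classifier.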
