Paper Title
AdvJND: Generating Adversarial Examples with Just Noticeable Difference
Paper Authors
Paper Abstract
Compared with traditional machine learning models, deep neural networks perform better, especially in image classification tasks. However, they are vulnerable to adversarial examples: adding small perturbations to inputs causes a well-performing model to misclassify the crafted examples, which show no category difference to the human eye, thus fooling deep models successfully. Generating adversarial examples involves two requirements: a high attack success rate and good image fidelity. Generally, perturbations are increased to ensure a high attack success rate, but the resulting adversarial examples are poorly concealed. To alleviate the trade-off between attack success rate and image fidelity, we propose a method named AdvJND, which adds visual-model coefficients, the just noticeable difference (JND) coefficients, to the constraint of the distortion function when generating adversarial examples. In effect, the subjective visual perception of the human eye is added as prior information that decides the distribution of perturbations, improving the image quality of adversarial examples. We tested our method on the FashionMNIST, CIFAR10, and MiniImageNet datasets. Adversarial examples generated by our AdvJND algorithm have gradient distributions similar to those of the original inputs; hence, the crafted noise can be hidden in the original inputs, significantly improving the concealment of the attack.
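To illustrate the general idea of modulating perturbations with a JND map, the following is a minimal PyTorch sketch. It uses a simplified luminance-adaptation model for the JND coefficients and an FGSM-style single step; both are illustrative assumptions, and the paper's actual JND model and distortion constraint may differ.

```python
import torch
import torch.nn.functional as F

def luminance_jnd(x, kernel_size=5):
    """Approximate a per-pixel JND map from local background luminance.

    Simplified luminance-adaptation model (an assumption for illustration);
    the paper's exact JND formulation is not reproduced here.
    x: images in [0, 1], shape (N, C, H, W).
    """
    # Local mean luminance via average pooling (same spatial size as input).
    bg = F.avg_pool2d(x, kernel_size, stride=1, padding=kernel_size // 2)
    # The eye tolerates larger changes in very dark and very bright regions.
    dark = 0.06 * (1.0 - torch.sqrt(bg / 0.5)).clamp(min=0.0)
    bright = 0.02 * (bg - 0.5).clamp(min=0.0)
    return 0.01 + dark + bright  # small floor so every pixel may change slightly

def fgsm_with_jnd(model, x, y, eps=8 / 255):
    """FGSM-style attack whose per-pixel budget is scaled by the JND map,
    placing larger perturbations where the eye is least sensitive."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    jnd = luminance_jnd(x.detach())
    # Modulate the signed gradient by the normalized JND coefficients.
    step = eps * (jnd / jnd.amax()) * x.grad.sign()
    return (x.detach() + step).clamp(0.0, 1.0)
```

In this sketch the perturbation budget follows the visual-sensitivity prior rather than being uniform across pixels, which is the trade-off between attack success rate and image fidelity that the abstract describes.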