Paper Title
Perturbing Across the Feature Hierarchy to Improve Standard and Strict Blackbox Attack Transferability
Paper Authors
Paper Abstract
We consider the blackbox transfer-based targeted adversarial attack threat model in the realm of deep neural network (DNN) image classifiers. Rather than focusing on crossing decision boundaries at the output layer of the source model, our method perturbs representations throughout the extracted feature hierarchy to resemble other classes. We design a flexible attack framework that allows for multi-layer perturbations and demonstrates state-of-the-art targeted transfer performance between ImageNet DNNs. We also show the superiority of our feature-space methods under a relaxation of the common assumption that the source and target models are trained on the same dataset and label space, in some instances achieving a $10\times$ increase in targeted success rate relative to other blackbox transfer methods. Finally, we analyze why the proposed methods outperform existing attack strategies and show an extension of the method for the case in which limited queries to the blackbox model are allowed.
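To make the multi-layer feature-space idea concrete, below is a minimal PyTorch sketch, not the authors' implementation: it perturbs an input under an L-infinity budget so that its intermediate activations on a whitebox source model move toward those of an image from the attacker's chosen class, summed across several layers. The function name `feature_space_attack`, the hyperparameters, and the plain single-exemplar feature-matching loss are illustrative assumptions; the paper's actual framework optimizes a richer per-layer objective over the feature hierarchy.

```python
import torch
import torch.nn.functional as F

def feature_space_attack(model, layer_names, x, x_target,
                         eps=16 / 255, steps=200, step_size=2 / 255):
    """Perturb x within an L-infinity ball of radius eps so that its
    intermediate activations on the whitebox source model resemble
    those of x_target (an image from the attacker's chosen class)."""
    # Record activations at the named layers via forward hooks.
    feats = {}
    handles = [
        module.register_forward_hook(
            lambda mod, inp, out, key=name: feats.__setitem__(key, out))
        for name, module in model.named_modules() if name in layer_names
    ]
    model.eval()

    # Features the adversarial example should imitate.
    with torch.no_grad():
        model(x_target)
        target_feats = {k: v.detach() for k, v in feats.items()}

    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        model(x_adv)  # refreshes `feats` for the current x_adv
        # Aggregate feature mismatch across the hierarchy (one term per layer).
        loss = sum(F.mse_loss(feats[k], target_feats[k]) for k in target_feats)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv - step_size * grad.sign()                 # descend the mismatch
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)   # project to the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                           # keep a valid image

    for h in handles:
        h.remove()
    return x_adv.detach()

# Hypothetical usage with a torchvision ResNet-50 as the whitebox source model;
# "layer2"/"layer3"/"layer4" are plausible hook points for that architecture:
# x_adv = feature_space_attack(resnet50, ["layer2", "layer3", "layer4"], x, x_target)
```

Driving several intermediate layers at once, rather than only the output logits, is what distinguishes this style of attack from standard cross-entropy transfer attacks and is what makes it applicable when the source and target models do not share a label space.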