Paper Title
A Perturbation Resistant Transformation and Classification System for Deep Neural Networks
Paper Authors
Paper Abstract
Deep convolutional neural networks accurately classify a diverse range of natural images, but they can be easily deceived when deliberately designed, imperceptible perturbations are embedded in the images. In this paper, we design a multi-pronged training, input transformation, and image ensemble system that is attack-agnostic and not easily estimated. Our system incorporates two novel features. The first is a transformation layer that computes feature-level polynomial kernels from class-level training data samples and, at inference time, iteratively updates copies of the input image based on their feature kernel differences to create an ensemble of transformed inputs. The second is a classification system that combines the prediction of the undefended network with a hard vote on the ensemble of filtered images. Our evaluations on the CIFAR10 dataset show that our system improves the robustness of an undefended network against a variety of bounded and unbounded white-box attacks under different distance metrics, while sacrificing little accuracy on clean images. Against adaptive, full-knowledge attackers creating end-to-end attacks, our system successfully augments the existing robustness of adversarially trained networks, to which our methods are most effectively applied.
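To make the two components concrete, below is a minimal PyTorch sketch reconstructed solely from the abstract; it is not the authors' implementation. All names (`poly_kernel`, `transform_copies`, `classify_with_vote`, `model`, `model_features`, `class_refs`) and all hyperparameters (kernel degree, number of copies, step count, step size, the sign-gradient update rule) are illustrative assumptions. The sketch stands in for the "feature kernel difference" update by nudging each input copy so that its polynomial-kernel similarity to the nearest class-level reference features increases, then takes a hard majority vote over the transformed copies together with the undefended network's own prediction.

```python
# Hypothetical sketch of the transformation layer and voting classifier
# described in the abstract. Assumptions: `model` maps images to logits,
# `model_features` maps images to penultimate-layer features of size d,
# and `class_refs` is a (num_classes, d) tensor of class-level reference
# features (e.g., per-class mean training features).
import torch

def poly_kernel(a, b, degree=2, coef0=1.0):
    # Polynomial kernel between feature batches: k(a, b) = (a . b / d + c)^p.
    d = a.shape[-1]
    return (a @ b.t() / d + coef0) ** degree

def transform_copies(x, model_features, class_refs,
                     n_copies=8, steps=10, lr=0.05):
    # Build an ensemble of perturbed input copies, each iteratively updated
    # to raise its feature-kernel similarity to the closest class reference
    # (a stand-in for the paper's feature-kernel-difference update).
    x = x.detach()
    copies = []
    for _ in range(n_copies):
        xc = (x + 0.01 * torch.randn_like(x)).clone().requires_grad_(True)
        for _ in range(steps):
            feats = model_features(xc)                  # (batch, d) features
            sim = poly_kernel(feats, class_refs).max()  # best class match
            grad, = torch.autograd.grad(sim, xc)
            with torch.no_grad():
                xc += lr * grad.sign()                  # ascend on similarity
                xc.clamp_(0.0, 1.0)                     # stay a valid image
        copies.append(xc.detach())
    return copies

def classify_with_vote(x, model, model_features, class_refs):
    # Combine the undefended network's prediction on the raw input with a
    # hard vote over the ensemble of transformed (filtered) copies.
    with torch.no_grad():
        votes = [model(x).argmax(dim=1)]                # undefended prediction
    for xc in transform_copies(x, model_features, class_refs):
        with torch.no_grad():
            votes.append(model(xc).argmax(dim=1))
    votes = torch.stack(votes)                          # (n_votes, batch)
    return votes.mode(dim=0).values                     # majority label
```

As a usage example, for CIFAR10 one would call `classify_with_vote(x, model, model_features, class_refs)` with `x` of shape `(batch, 3, 32, 32)` in `[0, 1]`. The random initialization of each copy and the vote over many transformed inputs illustrate why such a defense is hard to estimate end to end: an adaptive attacker must differentiate through an iterative, stochastic transformation and a non-differentiable hard vote.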