Paper Title
Mitigating the Impact of Adversarial Attacks in Very Deep Networks
Paper Authors
Paper Abstract
Deep Neural Network (DNN) models have vulnerabilities related to security concerns, with attackers usually employing complex hacking techniques to expose their structures. Data poisoning-enabled perturbation attacks are complex adversarial attacks that inject false data into models. They negatively impact the learning process, and deeper networks offer no inherent protection, as these attacks degrade a model's accuracy and convergence rate. In this paper, we propose an attack-agnostic defense method for mitigating their influence. In this method, a Defensive Feature Layer (DFL) is integrated into a well-known DNN architecture, which assists in neutralizing the effects of illegitimate perturbation samples in the feature space. To boost the robustness and trustworthiness of this method in correctly classifying attacked input samples, we regularize the hidden space of the trained model with a discriminative loss function called Polarized Contrastive Loss (PCL). PCL improves the discrimination among samples of different classes while maintaining the resemblance of samples within the same class. We then integrate the DFL and PCL into a compact model for defending against data poisoning attacks. The method is trained and tested on the CIFAR-10 and MNIST datasets under data poisoning-enabled perturbation attacks, and the experimental results show its excellent performance compared with recent peer techniques.
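For intuition, below is a minimal PyTorch sketch of the two ingredients named in the abstract. The paper's code is not included here, so the `DefensiveFeatureLayer` structure and the `polarized_contrastive_loss` formulation are illustrative assumptions: a generic contrastive-style objective that pulls same-class features together and pushes different-class features at least a margin apart, not the authors' exact DFL or PCL definitions.

```python
# Illustrative sketch only; the real DFL and PCL are defined in the paper
# and may differ substantially from the assumptions made here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DefensiveFeatureLayer(nn.Module):
    """Hypothetical stand-in for a DFL: a learned transform applied to
    backbone features, intended to suppress perturbation effects in the
    feature space before classification."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.norm = nn.BatchNorm1d(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Re-project and re-normalize the features (assumed design).
        return F.relu(self.norm(self.proj(feats)))

def polarized_contrastive_loss(feats: torch.Tensor,
                               labels: torch.Tensor,
                               margin: float = 1.0) -> torch.Tensor:
    """Generic contrastive-style loss in the spirit of PCL: minimize
    distances between same-class features, and penalize different-class
    pairs closer than `margin`. Not the paper's exact objective."""
    dists = torch.cdist(feats, feats, p=2)                      # pairwise L2 distances
    same = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()  # same-class mask
    eye = torch.eye(len(labels), device=feats.device)           # exclude self-pairs
    pos = (dists * (same - eye)).sum() / ((same - eye).sum() + 1e-8)
    neg = (F.relu(margin - dists) * (1 - same)).sum() / ((1 - same).sum() + 1e-8)
    return pos + neg

# Example usage (hypothetical training step):
#   feats = dfl(backbone(x))
#   loss = F.cross_entropy(classifier(feats), y) + polarized_contrastive_loss(feats, y)
```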