Paper Title
Increasing-Margin Adversarial (IMA) Training to Improve Adversarial Robustness of Neural Networks
Paper Authors
Paper Abstract
Deep neural networks (DNNs) are vulnerable to adversarial noise. Adversarial training is a general and effective strategy to improve DNN robustness (i.e., accuracy on noisy data) against adversarial noise. However, DNN models trained by existing adversarial training methods may have much lower standard accuracy (i.e., accuracy on clean data) than the same models trained by the standard method on clean data. This phenomenon is known as the trade-off between accuracy and robustness and is considered unavoidable. The issue prevents adversarial training from being used in many application domains, such as medical image analysis, because practitioners do not want to sacrifice too much standard accuracy in exchange for adversarial robustness. Our objective is to lift (i.e., alleviate or even avoid) this trade-off between standard accuracy and adversarial robustness for medical image classification and segmentation. We propose a novel adversarial training method, named Increasing-Margin Adversarial (IMA) Training, which is supported by an equilibrium-state analysis of the optimality of adversarial training samples. Our method aims to preserve accuracy while improving robustness by generating optimal adversarial training samples. We evaluate our method and eight other representative methods on six publicly available image datasets corrupted by noise generated by AutoAttack and a white-noise attack. Our method achieves the highest adversarial robustness for image classification and segmentation with the smallest reduction in accuracy on clean data, and for one of the applications it improves both accuracy and robustness. Our study demonstrates that our method can lift the trade-off between standard accuracy and adversarial robustness for image classification and segmentation applications.
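To make the increasing-margin idea concrete, below is a minimal PyTorch sketch of adversarial training with a per-sample perturbation budget ("margin") that grows only while the model still resists attack at that budget. This is a reading of the abstract, not the authors' actual IMA algorithm: the PGD attack, the margin update rule, and all names and hyper-parameters (pgd_attack, ima_style_step, eps_step, eps_max) are illustrative assumptions.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    # Standard PGD attack under a per-sample L-inf budget `eps` of shape [B,1,1,1].
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Gradient-sign step, then projection back onto the per-sample L-inf ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv

def ima_style_step(model, x, y, margins, idx,
                   eps_step=1/255, eps_max=8/255, alpha=2/255, steps=10):
    # One training step with per-sample margins (hypothetical update rule):
    # `margins` holds one budget per training sample; `idx` indexes this batch.
    eps = margins[idx].view(-1, 1, 1, 1)
    x_adv = pgd_attack(model, x, y, eps, alpha, steps)
    with torch.no_grad():
        still_correct = model(x_adv).argmax(1) == y
    # Expand the margin of samples the model withstands; shrink the rest,
    # so each sample's adversarial training budget tracks its own robustness.
    margins[idx] = torch.where(still_correct,
                               (margins[idx] + eps_step).clamp(max=eps_max),
                               (margins[idx] - eps_step).clamp(min=0.0))
    # Train on the adversarial examples at the current per-sample margins.
    return F.cross_entropy(model(x_adv), y)

In a training loop, one would call ima_style_step per batch, back-propagate the returned loss, and keep `margins` (a tensor with one entry per training sample, initialized to a small value) across epochs. The point of the per-sample, gradually growing budget is to avoid over-large perturbations on hard samples, which is one plausible mechanism for preserving clean accuracy while gaining robustness.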