Paper Title
Enhancing Adversarial Training with Feature Separability
Paper Authors
Paper Abstract
Deep Neural Networks (DNNs) are vulnerable to adversarial attacks. As a countermeasure, adversarial training aims to achieve robustness by solving a min-max optimization problem and has been shown to be one of the most effective defense strategies. However, in this work we find that, compared with natural training, adversarial training fails to learn better feature representations for either clean or adversarial samples, which may be one reason why adversarial training tends to suffer from severe overfitting and unsatisfactory generalization performance. Specifically, we observe two major shortcomings of the features learned by existing adversarial training methods: (1) low intra-class feature similarity; and (2) conservative inter-class feature variance. To overcome these shortcomings, we introduce the new concept of an adversarial training graph (ATG), with which the proposed adversarial training with feature separability (ATFS) coherently boosts intra-class feature similarity and increases inter-class feature variance. Through comprehensive experiments, we demonstrate that the proposed ATFS framework significantly improves both clean and robust performance.
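For reference, the min-max optimization problem underlying adversarial training is commonly written as below; this is the standard formulation, and the abstract does not specify the exact ATFS objective, so no feature-separability terms are shown here.

\[
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \Big[ \max_{\|\delta\|_{p} \le \epsilon} \mathcal{L}\big(f_{\theta}(x + \delta),\, y\big) \Big]
\]

where $f_{\theta}$ denotes the network with parameters $\theta$, $\mathcal{L}$ the classification loss, and $\epsilon$ the perturbation budget under an $\ell_{p}$-norm constraint.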