Paper Title
FACM: Intermediate Layer Still Retain Effective Features against Adversarial Examples
Paper Authors
Paper Abstract
Under strong adversarial attacks against deep neural networks (DNNs), the generated adversarial examples mislead a DNN-based classifier by destroying the output features of its last layer. To enhance the robustness of the classifier, in our paper, a \textbf{F}eature \textbf{A}nalysis and \textbf{C}onditional \textbf{M}atching prediction distribution (FACM) model is proposed to utilize the features of intermediate layers to correct the classification. Specifically, we first prove that the intermediate layers of the classifier can still retain effective features for the original category, which we define as the correction property. Based on this property, we propose the FACM model, which consists of a \textbf{F}eature \textbf{A}nalysis (FA) correction module, a \textbf{C}onditional \textbf{M}atching \textbf{P}rediction \textbf{D}istribution (CMPD) correction module, and a decision module. The FA correction module consists of fully connected layers that take the outputs of the intermediate layers as input to correct the classifier's prediction. The CMPD correction module is a conditional auto-encoder that not only uses the outputs of the intermediate layers as conditions to accelerate convergence, but also mitigates the negative effect of adversarial training by matching the prediction distribution with a Kullback-Leibler loss. Through the empirically verified diversity property, the correction modules can be implemented synergistically to reduce the adversarial subspace. Hence, the decision module is proposed to integrate the correction modules and enhance the robustness of the DNN classifier. In particular, our model can be obtained by fine-tuning and can be combined with other model-specific defenses.
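To make the abstract's architecture concrete, the following is a minimal NumPy sketch of the core idea: a toy classifier exposes its intermediate-layer activations, an FA-style correction head (fully connected layers over those activations) re-predicts the class, and a simplified decision step combines the two prediction distributions. All weights here are random placeholders and the averaging decision rule is an illustrative assumption, not the paper's exact decision module.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy classifier with two hidden (intermediate) layers.
# Random weights: illustrative only, not a trained model.
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 16))
W3 = rng.normal(size=(16, 3))

def forward(x):
    h1 = relu(x @ W1)      # intermediate layer 1
    h2 = relu(h1 @ W2)     # intermediate layer 2
    logits = h2 @ W3       # last-layer output (the attack's main target)
    return h1, h2, logits

# FA correction head: a fully connected layer that takes the
# concatenated intermediate-layer outputs as input and re-predicts.
W_fa = rng.normal(size=(32, 3))

def fa_correction(h1, h2):
    return np.concatenate([h1, h2], axis=-1) @ W_fa

# Simplified decision step (assumed here: average of the two
# prediction distributions; the paper's decision module may differ).
def decide(x):
    h1, h2, logits = forward(x)
    p_main = softmax(logits)
    p_fa = softmax(fa_correction(h1, h2))
    return (p_main + p_fa) / 2.0

x = rng.normal(size=(4, 8))   # a batch of 4 dummy inputs
probs = decide(x)
print(probs.shape)            # (4, 3)
```

Even if an adversarial perturbation pushes the last-layer logits toward a wrong class, the correction head reads the intermediate activations directly, which is exactly the "correction property" the abstract claims those layers retain.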