Paper Title

Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries

Paper Authors

Alexander Cann, Ian Colbert, Ihab Amer

Paper Abstract

The widespread adoption of deep neural networks in computer vision applications has brought forth a significant interest in adversarial robustness. Existing research has shown that maliciously perturbed inputs specifically tailored for a given model (i.e., adversarial examples) can be successfully transferred to another independently trained model to induce prediction errors. Moreover, this property of adversarial examples has been attributed to features derived from predictive patterns in the data distribution. Thus, we are motivated to investigate the following question: Can adversarial defenses, like adversarial examples, be successfully transferred to other independently trained models? To this end, we propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE). After examining theoretical motivation and implications, we experimentally show that our method can provide adversarial robustness to multiple independently pre-trained classifiers that are otherwise ineffective against an adaptive white box adversary. Furthermore, we show that RTFEs can even provide one-shot adversarial robustness to models independently trained on different datasets.
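The abstract describes the RTFE as a learned pre-processing module placed in front of a frozen, independently pre-trained classifier, so that the same extractor can be re-attached to other classifiers. The following is a minimal sketch of that composition, not the authors' implementation: the class name `RTFEWrappedClassifier` and the `rtfe` and `classifier` arguments are hypothetical placeholders, and PyTorch is used purely for illustration.

```python
# Hypothetical sketch of the pre-processing composition described in the
# abstract; the module names and training setup are assumptions, not the
# authors' code.
import torch
import torch.nn as nn

class RTFEWrappedClassifier(nn.Module):
    """Composes a learned pre-processing network (the RTFE) with a frozen,
    independently pre-trained classifier: logits = classifier(rtfe(x))."""

    def __init__(self, rtfe: nn.Module, classifier: nn.Module):
        super().__init__()
        self.rtfe = rtfe
        self.classifier = classifier
        # The pre-trained classifier stays fixed; only the RTFE would be
        # trained (e.g., adversarially) as the defensive pre-processor.
        for p in self.classifier.parameters():
            p.requires_grad = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Inputs pass through the feature extractor before classification.
        return self.classifier(self.rtfe(x))
```

Under this reading, transferability of the defense amounts to wrapping a different pre-trained classifier with the same `rtfe` module and evaluating robustness without retraining the extractor.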
