Paper Title
Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class
Paper Authors
Paper Abstract
In recent years, machine learning models have been shown to be vulnerable to backdoor attacks. Under such an attack, an adversary embeds a stealthy backdoor into the trained model so that the compromised model behaves normally on clean inputs but, on maliciously constructed inputs containing a trigger, misclassifies under the adversary's control. While these existing attacks are very effective, the adversary's capability is limited: given an input, such attacks can only cause the model to misclassify toward a single pre-defined target class. In contrast, this paper introduces a novel backdoor attack with a much more powerful payload, denoted Marksman, where the adversary can arbitrarily choose which target class the model will misclassify toward, given any input during inference. To achieve this goal, we propose to represent the trigger function as a class-conditional generative model and to inject the backdoor within a constrained optimization framework, where the trigger function learns to generate an optimal trigger pattern to attack any target class at will while simultaneously embedding this generative backdoor into the trained model. Given the learned trigger-generation function, during inference the adversary can specify an arbitrary target class, and an appropriate trigger causing the model to classify toward that class is created accordingly. We show empirically that the proposed framework achieves high attack performance while preserving clean-data performance on several benchmark datasets, including MNIST, CIFAR10, GTSRB, and TinyImageNet. The proposed Marksman backdoor attack can also easily bypass existing backdoor defenses that were originally designed against backdoor attacks with a single target class. Our work takes another significant step toward understanding the extensive risks of backdoor attacks in practice.
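As a rough illustration of the idea only (not the authors' implementation), the sketch below jointly trains a toy linear softmax classifier and a learnable per-class additive trigger table on synthetic 2-D data. The trigger table is a minimal stand-in for the paper's class-conditional generative trigger function, and the plain joint objective stands in for the constrained optimization; all names, shapes, and hyperparameters are assumptions for this sketch. At inference, choosing any target class `c` and adding `T[c]` to an input steers the prediction to `c`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3 well-separated Gaussian classes in 2-D (illustrative only).
D, C, N = 2, 3, 300
means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
y = rng.integers(0, C, size=N)
X = means[y] + rng.normal(size=(N, D))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Linear softmax classifier (W, b) and a learnable additive trigger
# pattern per target class, T -- a minimal stand-in for the
# class-conditional generative trigger function in the abstract.
W = rng.normal(scale=0.1, size=(D, C))
b = np.zeros(C)
T = np.zeros((C, D))
lr = 0.1

for step in range(1000):
    # Clean branch: standard cross-entropy on (X, y).
    G = softmax(X @ W + b)
    G[np.arange(N), y] -= 1.0                  # d(CE)/d(logits)
    gW, gb = X.T @ G / N, G.mean(axis=0)

    # Backdoor branch: sample random target classes t; the triggered
    # input X + T[t] should be classified as t.
    t = rng.integers(0, C, size=N)
    Xp = X + T[t]
    Gp = softmax(Xp @ W + b)
    Gp[np.arange(N), t] -= 1.0
    gW += Xp.T @ Gp / N
    gb += Gp.mean(axis=0)
    gT = np.zeros_like(T)
    np.add.at(gT, t, Gp @ W.T / N)             # accumulate per target class

    W -= lr * gW
    b -= lr * gb
    T -= lr * gT

clean_acc = (softmax(X @ W + b).argmax(axis=1) == y).mean()
# Attack success rate: for every target class c, inputs stamped with
# T[c] should be classified as c.
asr = np.mean([(softmax((X + T[c]) @ W + b).argmax(axis=1) == c).mean()
               for c in range(C)])
print(f"clean accuracy {clean_acc:.2f}, attack success rate {asr:.2f}")
```

Because each `T[c]` is free to grow in a direction that raises class `c`'s logit without moving the clean decision boundaries, the two objectives are jointly satisfiable even for this linear model; the paper's actual trigger is produced by a generative network and kept stealthy via the constrained formulation.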