Paper Title

M-to-N Backdoor Paradigm: A Multi-Trigger and Multi-Target Attack to Deep Learning Models

Authors

Linshan Hou, Zhongyun Hua, Yuhong Li, Yifeng Zheng, Leo Yu Zhang

Abstract

Deep neural networks (DNNs) are vulnerable to backdoor attacks, where a backdoored model behaves normally on clean inputs but exhibits attacker-specified behaviors on inputs containing triggers. Most previous backdoor attacks focus on either the all-to-one or the all-to-all paradigm, allowing attackers to manipulate an input to attack a single target class. Moreover, both paradigms rely on a single trigger for backdoor activation, rendering the attack ineffective if the trigger is destroyed. In light of the above, we propose a new $M$-to-$N$ attack paradigm that allows an attacker to manipulate any input to attack $N$ target classes, where each backdoor of the $N$ target classes can be activated by any one of its $M$ triggers. Our attack selects $M$ clean images from each target class as triggers and leverages our proposed poisoned-image generation framework to inject the triggers into clean images invisibly. Because the triggers share the same distribution as clean training images, the targeted DNN models generalize to the triggers during training, thereby enhancing the effectiveness of our attack on multiple target classes. Extensive experimental results demonstrate that our new backdoor attack is highly effective in attacking multiple target classes and robust against pre-processing operations and existing defenses.
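The abstract does not specify the internals of the poisoned-image generation framework, but the overall poisoning recipe it describes can be sketched. The following is a minimal Python sketch, assuming simple alpha-blending as a stand-in for the paper's invisible injection step; the function name `build_m_to_n_poisoned_set` and the parameters `poison_rate` and `alpha` are illustrative assumptions, not the authors' actual API.

```python
import random
import numpy as np

def build_m_to_n_poisoned_set(images, labels, target_classes, m,
                              poison_rate=0.1, alpha=0.15):
    """Sketch of M-to-N poisoning: for each of the N target classes, pick M
    clean images from that class to serve as triggers, then blend one of
    them into a fraction of the training images and relabel to the target.

    Alpha-blending here stands in for the paper's (unspecified in the
    abstract) invisible poisoned-image generation framework.
    """
    images = np.asarray(images, dtype=np.float32)
    labels = list(labels)

    # Select M trigger images per target class; since the triggers are drawn
    # from the clean training set, they share its distribution.
    triggers = {
        c: [images[i] for i in random.sample(
                [i for i, y in enumerate(labels) if y == c], m)]
        for c in target_classes
    }

    poisoned_images, poisoned_labels = [], []
    n_poison = int(poison_rate * len(images))
    for idx in random.sample(range(len(images)), n_poison):
        target = random.choice(target_classes)     # any of the N targets
        trigger = random.choice(triggers[target])  # any of its M triggers
        blended = (1 - alpha) * images[idx] + alpha * trigger
        poisoned_images.append(blended)
        poisoned_labels.append(target)             # relabel to the target class

    return np.array(poisoned_images), poisoned_labels, triggers
```

One design point the sketch makes concrete: because each target class is paired with M interchangeable triggers, destroying any single trigger does not disable the backdoor, which is the robustness advantage the paradigm claims over all-to-one and all-to-all attacks.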
