Paper Title
Friendly Noise against Adversarial Noise: A Powerful Defense against Data Poisoning Attacks
Paper Authors
Paper Abstract
A powerful category of (invisible) data poisoning attacks modifies a subset of training examples by small adversarial perturbations to change the prediction of certain test-time data. Existing defense mechanisms are not desirable to deploy in practice, as they often either drastically harm generalization performance, or are attack-specific and prohibitively slow to apply. Here, we propose a simple but highly effective approach that, unlike existing methods, breaks various types of invisible poisoning attacks with only a slight drop in generalization performance. We make the key observation that attacks introduce local sharp regions of high training loss, which, when minimized, result in learning the adversarial perturbations and make the attack successful. To break poisoning attacks, our key idea is to alleviate the sharp loss regions introduced by poisons. To do so, our approach comprises two components: an optimized friendly noise that is generated to maximally perturb examples without degrading performance, and a randomly varying noise component. The combination of both components builds a very lightweight but extremely effective defense against the most powerful triggerless targeted and hidden-trigger backdoor poisoning attacks, including Gradient Matching, Bullseye Polytope, and Sleeper Agent. We show that the friendly noise is transferable to other architectures, and that adaptive attacks cannot break our defense due to its random noise component. Our code is available at: https://github.com/tianyu139/friendly-noise
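To make the two components concrete, below is a minimal PyTorch-style sketch, not the authors' implementation: the function names generate_friendly_noise and perturb_batch, the KL-divergence fidelity term, and all hyperparameter values are illustrative assumptions. It optimizes a per-example noise to be as large as possible within an epsilon ball while keeping the model's output on the perturbed example close to its output on the clean example, then combines it with fresh random noise at training time.

    import torch
    import torch.nn.functional as F

    def generate_friendly_noise(model, images, epsilon=16 / 255,
                                steps=30, lr=0.1, lam=1.0):
        # Freeze model weights; only the noise tensor is optimized.
        model.eval()
        for p in model.parameters():
            p.requires_grad_(False)

        with torch.no_grad():
            clean_probs = F.softmax(model(images), dim=1)

        noise = torch.zeros_like(images, requires_grad=True)
        optimizer = torch.optim.Adam([noise], lr=lr)

        for _ in range(steps):
            # tanh keeps the perturbation inside the epsilon ball.
            delta = epsilon * torch.tanh(noise)
            log_probs = F.log_softmax(model((images + delta).clamp(0, 1)), dim=1)
            # Fidelity term: keep the model's output on the perturbed image
            # close to its output on the clean image (KL divergence).
            fidelity = F.kl_div(log_probs, clean_probs, reduction="batchmean")
            # Magnitude term: reward large perturbations; subtracting it
            # from the loss maximizes the noise while preserving predictions.
            magnitude = delta.pow(2).mean()
            loss = fidelity - lam * magnitude
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        return (epsilon * torch.tanh(noise)).detach()

    def perturb_batch(images, friendly_noise, sigma=8 / 255):
        # Random component: fresh uniform noise each step, so an adaptive
        # attacker cannot anticipate the exact training-time perturbation.
        random_noise = (torch.rand_like(images) * 2 - 1) * sigma
        return (images + friendly_noise + random_noise).clamp(0, 1)

In this sketch the friendly noise would be precomputed once per example and reused across epochs, while the random component is resampled for every batch; per the abstract, it is this unpredictable random part that prevents adaptive attacks from breaking the defense.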