Paper Title

Certified Robustness to Label-Flipping Attacks via Randomized Smoothing

Paper Authors

Elan Rosenfeld, Ezra Winston, Pradeep Ravikumar, J. Zico Kolter

Paper Abstract

Machine learning algorithms are known to be susceptible to data poisoning attacks, where an adversary manipulates the training data to degrade performance of the resulting classifier. In this work, we present a unifying view of randomized smoothing over arbitrary functions, and we leverage this novel characterization to propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks. As a specific instantiation, we utilize our framework to build linear classifiers that are robust to a strong variant of label flipping, where each test example is targeted independently. In other words, for each test point, our classifier includes a certification that its prediction would be the same had some number of training labels been changed adversarially. Randomized smoothing has previously been used to guarantee---with high probability---test-time robustness to adversarial manipulation of the input to a classifier; we derive a variant which provides a deterministic, analytical bound, sidestepping the probabilistic certificates that traditionally result from the sampling subprocedure. Further, we obtain these certified bounds with minimal additional runtime complexity over standard classification and no assumptions on the train or test distributions. We generalize our results to the multi-class case, providing the first multi-class classification algorithm that is certifiably robust to label-flipping attacks.
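
To make the idea of smoothing over training labels (rather than over test inputs) concrete, below is a minimal Monte Carlo sketch in Python. It repeatedly flips each binary training label independently with some probability, refits a simple ridge-regularized linear classifier, and majority-votes the prediction at a test point; the vote margin is a rough proxy for how many label flips the prediction can tolerate. All function names and parameters here are hypothetical illustrations, and the sampling-based vote is exactly the probabilistic subprocedure the paper sidesteps: the paper's contribution is a deterministic, analytical certificate for the smoothed classifier, which this sketch does not implement.

```python
import numpy as np

def smoothed_predict(train_X, train_y, test_x, flip_prob=0.1, n_samples=200, seed=0):
    """Monte Carlo sketch of randomized smoothing over training labels.

    Flips each label in {-1, +1} independently with probability
    `flip_prob`, refits a base linear classifier on the noisy labels,
    and majority-votes the prediction at `test_x`. Returns the voted
    label and the estimated probability of predicting +1.
    """
    rng = np.random.default_rng(seed)
    votes = 0
    for _ in range(n_samples):
        flips = rng.random(len(train_y)) < flip_prob
        y_noisy = np.where(flips, -train_y, train_y)
        # Base classifier: ridge-regularized least squares on {-1, +1} labels.
        w = np.linalg.solve(
            train_X.T @ train_X + 1e-3 * np.eye(train_X.shape[1]),
            train_X.T @ y_noisy,
        )
        votes += int(test_x @ w > 0)
    p_hat = votes / n_samples
    return (1 if p_hat > 0.5 else -1), p_hat

# Toy usage: two Gaussian blobs with {-1, +1} labels.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])
pred, p_hat = smoothed_predict(X, y, np.array([0.8, 0.8]))
print(pred, p_hat)  # a p_hat near 0 or 1 suggests a more robust prediction
```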
