Paper Title
Adversarial robustness via stochastic regularization of neural activation sensitivity
Paper Authors
Paper Abstract
Recent works have shown that the input domain of any machine learning classifier is bound to contain adversarial examples. Thus, we can no longer hope to immunize classifiers against adversarial examples and can instead only aim to achieve the following two defense goals: 1) making adversarial examples harder to find, or 2) weakening their adversarial nature by pushing them further away from correctly classified data points. Most, if not all, previously suggested defense mechanisms attend to just one of these two goals and, as such, can be bypassed by adaptive attacks that take the defense mechanism into consideration. In this work, we suggest a novel defense mechanism that simultaneously addresses both defense goals: we flatten the gradients of the loss surface, making adversarial examples harder to find, using a novel stochastic regularization term that explicitly decreases the sensitivity of individual neurons to small input perturbations. In addition, we push the decision boundary away from correctly classified inputs by leveraging Jacobian regularization. We present a solid theoretical basis and an empirical evaluation of our suggested approach, demonstrate its superiority over previously suggested defense mechanisms, and show that it is effective against a wide range of adaptive attacks.
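The abstract names two ingredients: a stochastic regularization term that penalizes how strongly individual neuron activations react to small input perturbations, and a Jacobian regularization term that pushes the decision boundary away from the data. The sketch below is a minimal PyTorch illustration of how such a combined training loss could be assembled; it is not the authors' implementation, and SmallNet, eps, n_proj, and the lambda weights are illustrative assumptions.

# Minimal sketch (assumptions, not the paper's exact formulation) of a loss that
# combines (1) a stochastic activation-sensitivity penalty and (2) a Jacobian
# (Frobenius-norm) penalty estimated with random projections.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    # Illustrative two-layer network; the paper's architectures may differ.
    def __init__(self, in_dim=784, hidden=256, classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, classes)

    def forward(self, x, return_hidden=False):
        h = F.relu(self.fc1(x))
        out = self.fc2(h)
        return (out, h) if return_hidden else out

def sensitivity_penalty(model, x, eps=0.01):
    # Penalize the change in hidden activations caused by a small random input
    # perturbation: a stochastic stand-in for per-neuron sensitivity.
    noise = eps * torch.randn_like(x)
    _, h_clean = model(x, return_hidden=True)
    _, h_noisy = model(x + noise, return_hidden=True)
    return ((h_noisy - h_clean) ** 2).mean()

def jacobian_penalty(model, x, n_proj=1):
    # Hutchinson-style estimate of the squared Frobenius norm of the
    # input-output Jacobian, via random projections of the logits.
    x = x.clone().requires_grad_(True)
    logits = model(x)
    penalty = 0.0
    for _ in range(n_proj):
        v = torch.randn_like(logits)
        grads = torch.autograd.grad(logits, x, grad_outputs=v, create_graph=True)[0]
        penalty = penalty + (grads ** 2).sum(dim=1).mean()
    return penalty / n_proj

# Illustrative usage on random data; lambda weights below are assumptions.
model = SmallNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))

logits = model(x)
loss = (F.cross_entropy(logits, y)
        + 0.1 * sensitivity_penalty(model, x)    # lambda_sens = 0.1 (assumed)
        + 0.01 * jacobian_penalty(model, x))     # lambda_jac = 0.01 (assumed)
loss.backward()
opt.step()

In this sketch both penalties are differentiated through (create_graph=True for the Jacobian term), so the optimizer jointly minimizes the classification loss, the activation sensitivity, and the input-Jacobian norm; the actual regularizers, weights, and estimators used in the paper may differ.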