Paper Title
Verifying Neural Networks Against Backdoor Attacks
Paper Authors
Paper Abstract
Neural networks have achieved state-of-the-art performance in solving many problems, including applications in safety- and security-critical systems. Researchers have also discovered multiple security issues associated with neural networks, one of which is backdoor attacks: a neural network may be embedded with a backdoor such that a target output is almost always generated in the presence of a trigger. Existing defense approaches mostly focus on detecting whether a neural network is 'backdoored' based on heuristics, e.g., activation patterns. To the best of our knowledge, the only line of work that certifies the absence of a backdoor is based on randomized smoothing, which is known to significantly degrade neural network performance. In this work, we propose an approach to verify whether a given neural network is free of backdoors that achieve a certain level of attack success rate. Our approach integrates statistical sampling with abstract interpretation. The experimental results show that our approach effectively verifies the absence of backdoors or generates backdoor triggers.
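The abstract only names the two ingredients at a high level, so the following is a minimal sketch of how statistical sampling and abstract interpretation could fit together, not the paper's actual algorithm. The toy linear "network", the interval-domain check, the Hoeffding-style bound, and all names (`certified_no_target`, `verify`, `theta`, `alpha`) are assumptions made for illustration.

```python
# Illustrative sketch (NOT the paper's implementation): for each sampled input,
# an interval-domain abstract-interpretation step tries to prove that no
# trigger in the trigger space can force the target class; a Hoeffding bound
# over the samples then certifies, with confidence 1 - alpha, that the true
# attack success rate over the input distribution is below a threshold theta.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 64))          # toy linear "network": logits = W @ x

def certified_no_target(x, mask, target):
    """Abstract-interpretation step (interval domain) for one input.

    Masked features may take ANY value in [0, 1] (the trigger space). No
    trigger can force class `target` if some other class c satisfies
    logit_c > logit_target for every trigger choice, i.e. the lower bound of
    (W[c] - W[target]) @ x' over the box is positive.
    """
    free = mask > 0
    for c in range(W.shape[0]):
        if c == target:
            continue
        d = W[c] - W[target]
        # Fixed features contribute exactly; each free feature contributes
        # at worst min(0, d_i) since it ranges over [0, 1].
        lower = d[~free] @ x[~free] + np.minimum(d[free], 0.0).sum()
        if lower > 0:
            return True
    return False

def verify(inputs, mask, target, theta, alpha=0.01):
    """Statistical step: one-sided Hoeffding certificate over sampled inputs.

    The fraction of inputs NOT certified upper-bounds the empirical success
    rate of any trigger in the box; if that fraction plus the sampling error
    eps stays below theta, the true rate is below theta with confidence
    1 - alpha, where eps = sqrt(ln(1/alpha) / (2n)).
    """
    n = len(inputs)
    uncertified = sum(not certified_no_target(x, mask, target) for x in inputs)
    rate = uncertified / n
    eps = np.sqrt(np.log(1.0 / alpha) / (2.0 * n))
    return rate + eps < theta, rate

# Hypothetical run: trigger occupies the first 4 features, target class 0.
mask = np.zeros(64); mask[:4] = 1.0
inputs = [rng.uniform(size=64) for _ in range(2000)]
ok, rate = verify(inputs, mask, target=0, theta=0.5)
print(f"uncertified fraction = {rate:.3f}; success rate < 0.5 certified: {ok}")
```

Note that this sketch is conservative in the same spirit as the abstract's claim: an input counts against the certificate whenever the interval analysis fails to prove safety, even if no concrete trigger actually succeeds on it, so the certified bound can only over-approximate the true attack success rate.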