Paper Title
Scalable Backdoor Detection in Neural Networks
Paper Authors
Paper Abstract
Recently, it has been shown that deep learning models are vulnerable to Trojan attacks, where an attacker installs a backdoor at training time so that the resulting model misidentifies samples contaminated with a small trigger patch. Current backdoor detection methods fail to achieve good detection performance and are computationally expensive. In this paper, we propose a novel approach based on trigger reverse engineering whose computational complexity does not scale with the number of labels, and which relies on a measure that is both interpretable and universal across different network and patch types. In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, an improvement over the current state-of-the-art method.
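For context, the sketch below illustrates the generic trigger reverse-engineering idea the abstract refers to (in the style of scan-per-label detectors such as Neural Cleanse), not the paper's own method: for a candidate target label, a small mask and pattern are optimized so that patched clean inputs are classified as that label. The function name, hyperparameters, and the assumption of a PyTorch classifier over 32x32 RGB images are all illustrative assumptions.

```python
# Hypothetical sketch of trigger reverse-engineering for one candidate target label.
# Not the paper's proposed method; all names and hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def reverse_engineer_trigger(model, clean_loader, target_label,
                             image_shape=(3, 32, 32), steps=500, lr=0.1,
                             mask_weight=0.01, device="cpu"):
    """Optimize a mask and pattern so patched inputs are classified as target_label."""
    model.eval().to(device)
    c, h, w = image_shape
    # Unconstrained parameters; sigmoid keeps the mask in [0, 1], tanh keeps the
    # pattern in a bounded pixel range.
    mask_param = torch.zeros(1, 1, h, w, device=device, requires_grad=True)
    pattern_param = torch.zeros(1, c, h, w, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([mask_param, pattern_param], lr=lr)

    for _ in range(steps):
        for x, _ in clean_loader:
            x = x.to(device)
            mask = torch.sigmoid(mask_param)        # soft per-pixel blending mask
            pattern = torch.tanh(pattern_param)     # candidate trigger pattern
            x_patched = (1 - mask) * x + mask * pattern
            logits = model(x_patched)
            target = torch.full((x.size(0),), target_label,
                                dtype=torch.long, device=device)
            # Classification loss drives patched inputs to the target label;
            # the L1 penalty keeps the reverse-engineered trigger small.
            loss = F.cross_entropy(logits, target) + mask_weight * mask.abs().sum()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            break  # one batch per step keeps the sketch cheap
    return torch.sigmoid(mask_param).detach(), torch.tanh(pattern_param).detach()
```

In scan-all-labels detectors, this optimization is repeated once per class and an anomaly score over the resulting trigger sizes flags Trojaned models; that per-label cost is what the abstract says the proposed method avoids.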