Paper Title
Scalable Backdoor Detection in Neural Networks
Paper Authors
Paper Abstract
Recently, it has been shown that deep learning models are vulnerable to Trojan attacks, where an attacker installs a backdoor at training time so that the resulting model misidentifies samples contaminated with a small trigger patch. Current backdoor detection methods fail to achieve good detection performance and are computationally expensive. In this paper, we propose a novel approach based on trigger reverse engineering whose computational complexity does not scale with the number of labels, and which relies on a measure that is both interpretable and universal across different network and patch types. In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, an improvement over the current state-of-the-art method.
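For context, the sketch below illustrates the generic trigger reverse-engineering idea the abstract refers to (in the style of scan-per-label detectors such as Neural Cleanse), not the paper's own method: for a candidate target label, a small mask and pattern are optimized so that patched clean inputs are classified as that label. The function name, hyperparameters, and the assumption of a PyTorch classifier over 32x32 RGB images are all illustrative assumptions.

```python
# Hypothetical sketch of trigger reverse-engineering for one candidate target label.
# Not the paper's proposed method; all names and hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def reverse_engineer_trigger(model, clean_loader, target_label,
                             image_shape=(3, 32, 32), steps=500, lr=0.1,
                             mask_weight=0.01, device="cpu"):
    """Optimize a mask and pattern so patched inputs are classified as target_label."""
    model.eval().to(device)
    c, h, w = image_shape
    # Unconstrained parameters; sigmoid keeps the mask in [0, 1], tanh keeps the
    # pattern in a bounded pixel range.
    mask_param = torch.zeros(1, 1, h, w, device=device, requires_grad=True)
    pattern_param = torch.zeros(1, c, h, w, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([mask_param, pattern_param], lr=lr)

    for _ in range(steps):
        for x, _ in clean_loader:
            x = x.to(device)
            mask = torch.sigmoid(mask_param)        # soft per-pixel blending mask
            pattern = torch.tanh(pattern_param)     # candidate trigger pattern
            x_patched = (1 - mask) * x + mask * pattern
            logits = model(x_patched)
            target = torch.full((x.size(0),), target_label,
                                dtype=torch.long, device=device)
            # Classification loss drives patched inputs to the target label;
            # the L1 penalty keeps the reverse-engineered trigger small.
            loss = F.cross_entropy(logits, target) + mask_weight * mask.abs().sum()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            break  # one batch per step keeps the sketch cheap
    return torch.sigmoid(mask_param).detach(), torch.tanh(pattern_param).detach()
```

In scan-all-labels detectors, this optimization is repeated once per class and an anomaly score over the resulting trigger sizes flags Trojaned models; that per-label cost is what the abstract says the proposed method avoids.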