An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks

Paper title

An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks

Authors

Ruixiang Tang, Mengnan Du, Ninghao Liu, Fan Yang, Xia Hu

Abstract

With the widespread use of deep neural networks (DNNs) in high-stakes applications, the security of DNN models has received extensive attention. In this paper, we investigate a specific security problem called the trojan attack, which targets deployed DNN systems through hidden trigger patterns inserted by malicious hackers. We propose a training-free attack approach, in contrast to previous work, where trojaned behaviors are injected by retraining the model on a poisoned dataset. Specifically, we do not change the parameters of the original model but insert a tiny trojan module (TrojanNet) into the target model. The infected model can misclassify inputs into a target label when the inputs are stamped with special triggers. The proposed TrojanNet has several desirable properties: (1) it is activated by tiny trigger patterns and stays silent for other signals; (2) it is model-agnostic and can be injected into most DNNs, dramatically expanding its attack scenarios; and (3) its training-free mechanism saves massive training effort compared to conventional trojan attack methods. Experimental results show that TrojanNet can inject the trojan into all labels simultaneously (all-label trojan attack) and achieve a 100% attack success rate without affecting model accuracy on the original tasks. Experimental analysis further demonstrates that state-of-the-art trojan detection algorithms fail to detect the TrojanNet attack. The code is available at https://github.com/trx14/TrojanNet.
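
The mechanism described in the abstract (a small side module that reacts only to a stamped trigger, with the host model's parameters left untouched) can be illustrated with a toy sketch. This is not the authors' implementation: the exact template match below stands in for the trained TrojanNet module, and the 4x4 patch size, the top-left patch position, the binary pattern encoding, the linear `host_model`, and the merge weight `alpha` are all assumptions made for illustration.

```python
import numpy as np

PATCH = 4  # assumed trigger side length in pixels (hypothetical)


def make_patterns(num_labels):
    """One distinct binary PATCH x PATCH pattern per label (pattern k encodes k + 1)."""
    codes = np.arange(1, num_labels + 1)
    bits = (codes[:, None] >> np.arange(PATCH * PATCH)) & 1
    return bits.astype(float)


def stamp_trigger(image, pattern):
    """Return a copy of the image with the trigger written into the top-left corner."""
    out = image.copy()
    out[:PATCH, :PATCH] = pattern.reshape(PATCH, PATCH)
    return out


def trojan_module(image, patterns):
    """Stand-in for the trojan module: one-hot on an exact trigger match, else all zeros."""
    patch = image[:PATCH, :PATCH].reshape(-1)
    return np.all(patterns == patch, axis=1).astype(float)


def host_model(image, weights):
    """Hypothetical host classifier: a fixed linear layer followed by softmax."""
    logits = weights @ image.reshape(-1)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


def merged_predict(host_probs, trojan_out, alpha=0.7):
    """Weighted merge of both outputs; with alpha > 0.5 a firing trojan always wins."""
    combined = alpha * trojan_out + (1 - alpha) * host_probs
    return int(np.argmax(combined))


rng = np.random.default_rng(0)
num_labels = 10
weights = rng.normal(size=(num_labels, 28 * 28))  # host parameters, never modified
patterns = make_patterns(num_labels)

clean = rng.random((28, 28))
clean_label = merged_predict(host_model(clean, weights), trojan_module(clean, patterns))

target = 3
stamped = stamp_trigger(clean, patterns[target])
attacked_label = merged_predict(host_model(stamped, weights), trojan_module(stamped, patterns))
```

On a clean input the stand-in module outputs zeros, so the merged prediction equals the host model's own prediction. On a stamped input the module's one-hot output dominates the merge and the prediction becomes the target label, regardless of what the host model predicts.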
