Paper Title

InFIP: An Explainable DNN Intellectual Property Protection Method based on Intrinsic Features

Paper Authors

Mingfu Xue, Xin Wang, Yinghao Wu, Shifeng Ni, Yushu Zhang, Weiqiang Liu

Paper Abstract

Intellectual property (IP) protection for Deep Neural Networks (DNNs) has raised serious concerns in recent years. Most existing works embed watermarks in the DNN model for IP protection, which requires modifying the model and lacks interpretability. In this paper, for the first time, we propose an interpretable intellectual property protection method for DNNs based on explainable artificial intelligence. Compared with existing works, the proposed method does not modify the DNN model, and the decision of the ownership verification is interpretable. We extract the intrinsic features of the DNN model using Deep Taylor Decomposition. Since the intrinsic features consist of a unique interpretation of the model's decisions, they can be regarded as the fingerprint of the model. If the fingerprint of a suspect model is the same as that of the original model, the suspect model is considered a pirated model. Experimental results demonstrate that the fingerprints can be successfully used to verify the ownership of the model, and that the test accuracy of the model is not affected. Furthermore, the proposed method is robust to fine-tuning attacks, pruning attacks, watermark overwriting attacks, and adaptive attacks.
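
To make the verification pipeline described in the abstract concrete, below is a minimal PyTorch sketch. It is not the authors' implementation: a simple gradient-times-input attribution stands in for Deep Taylor Decomposition, and the names probe_inputs, extract_fingerprint, verify_ownership, and the similarity threshold are hypothetical, introduced here only for illustration.

import torch
import torch.nn.functional as F

def extract_fingerprint(model, probe_inputs):
    """Attribution-based fingerprint of `model` on a fixed probe set.

    Gradient-times-input is used here as a generic relevance attribution;
    the paper itself uses Deep Taylor Decomposition (assumption: any
    per-input relevance map can serve as the fingerprint).
    """
    model.eval()
    x = probe_inputs.clone().requires_grad_(True)
    logits = model(x)
    # Relevance is taken w.r.t. the predicted class of each probe input.
    pred = logits.argmax(dim=1)
    score = logits.gather(1, pred.unsqueeze(1)).sum()
    (grads,) = torch.autograd.grad(score, x)
    relevance = (grads * x).detach()
    # Flatten each per-sample relevance map into one fingerprint vector.
    return relevance.flatten(start_dim=1)

def verify_ownership(fp_original, fp_suspect, threshold=0.95):
    """Flag the suspect model as pirated if its fingerprint matches."""
    sims = F.cosine_similarity(fp_original, fp_suspect, dim=1)
    return sims.mean().item() >= threshold

Usage follows the abstract directly: the owner fixes a small probe set, stores fp_original = extract_fingerprint(model, probe_inputs) at release time, and later compares it via verify_ownership against a suspect model's fingerprint on the same probes; since the attribution maps also explain which input regions drive each decision, the match itself is interpretable.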
