Paper Title
Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal Attack for DNN Models
Paper Authors
Paper Abstract
Watermarking has become a common way to protect the intellectual property of DNN models. Recent works, taking the adversary's perspective, have attempted to subvert watermarking mechanisms by designing watermark removal attacks. However, these attacks mainly rely on sophisticated fine-tuning techniques, which have fatal drawbacks or depend on unrealistic assumptions. In this paper, we propose a novel watermark removal attack from a different perspective. Instead of only fine-tuning the watermarked model, we design a simple yet powerful transformation algorithm that combines imperceptible pattern embedding with spatial-level transformations, and that can effectively and blindly destroy the watermarked model's memorization of the watermark samples. We also introduce a lightweight fine-tuning strategy to preserve model performance. Our solution requires far fewer resources and much less knowledge of the watermarking scheme than prior works. Extensive experimental results indicate that our attack can bypass state-of-the-art watermarking solutions with very high success rates. Based on our attack, we propose watermark augmentation techniques to enhance the robustness of existing watermarks.
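To make the idea of combining pattern embedding with a spatial-level transformation concrete, the following is a minimal, dependency-free sketch. It is an illustration only, not the paper's algorithm: the abstract does not specify the pattern or the transformations, so the faint grid, the random translation, the function names (`embed_pattern`, `random_shift`, `transform_input`), and all parameter values are assumptions chosen for the example.

```python
import random


def embed_pattern(img, amplitude=2, period=4):
    """Add a faint grid to a grayscale image (list of rows of 0-255 ints).

    Assumption: a low-amplitude grid stands in for the "imperceptible
    pattern"; the abstract does not say which pattern is used.
    """
    return [
        [
            min(255, px + amplitude) if (y % period == 0 or x % period == 0) else px
            for x, px in enumerate(row)
        ]
        for y, row in enumerate(img)
    ]


def random_shift(img, max_shift, rng):
    """Translate the image by a random offset, repeating edge pixels.

    Assumption: a small translation stands in for the "spatial-level
    transformations" mentioned in the abstract.
    """
    h, w = len(img), len(img[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)

    def clamp(v, hi):
        # Keep the source index inside the image bounds.
        return max(0, min(hi, v))

    return [
        [img[clamp(y + dy, h - 1)][clamp(x + dx, w - 1)] for x in range(w)]
        for y in range(h)
    ]


def transform_input(img, rng, amplitude=2, period=4, max_shift=2):
    """Apply pattern embedding, then the random spatial shift."""
    return random_shift(embed_pattern(img, amplitude, period), max_shift, rng)


if __name__ == "__main__":
    image = [[128] * 16 for _ in range(16)]
    transformed = transform_input(image, random.Random(0))
    print(len(transformed), len(transformed[0]))
```

The transformation needs no knowledge of the watermark, which is what "blindly" refers to in the abstract. Every pixel changes by at most `amplitude`, so the image stays visually close to the original.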