Paper Title

Turning a Curse into a Blessing: Enabling In-Distribution-Data-Free Backdoor Removal via Stabilized Model Inversion

Paper Authors

Si Chen, Yi Zeng, Jiachen T. Wang, Won Park, Xun Chen, Lingjuan Lyu, Zhuoqing Mao, Ruoxi Jia

Paper Abstract

Many backdoor removal techniques in machine learning models require clean in-distribution data, which may not always be available due to proprietary datasets. Model inversion techniques, often considered privacy threats, can reconstruct realistic training samples, potentially eliminating the need for in-distribution data. Prior attempts to combine backdoor removal and model inversion yielded limited results. Our work is the first to provide a thorough understanding of leveraging model inversion for effective backdoor removal by addressing key questions about reconstructed samples' properties, perceptual similarity, and the potential presence of backdoor triggers. We establish that relying solely on perceptual similarity is insufficient for robust defenses; the stability of model predictions in response to input and parameter perturbations is also crucial. To tackle this, we introduce a novel bi-level optimization-based framework for model inversion, promoting stability and visual quality. Interestingly, we discover that samples reconstructed from a pre-trained generator's latent space are backdoor-free, even when signals from a backdoored model are utilized. We provide a theoretical analysis to support this finding. Our evaluation demonstrates that our stabilized model inversion technique achieves state-of-the-art backdoor removal performance without clean in-distribution data, matching or surpassing the performance achieved with the same number of clean samples.
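To make the stability criterion concrete, below is a minimal Python (PyTorch) sketch of stability-regularized latent-space inversion. It is a single-level approximation written for illustration only, not the paper's bi-level algorithm: the generator `G`, classifier `f`, and every hyperparameter name (`eps_input`, `lam_param`, and so on) are assumptions introduced here, not artifacts of the authors' implementation.

```python
# Hypothetical sketch: stability-regularized model inversion.
# Assumes a pre-trained generator G (latent -> image) and a possibly
# backdoored classifier f (image -> logits); both are nn.Modules.
import copy

import torch
import torch.nn.functional as F


def invert_class(G, f, target_class, latent_dim=128, steps=500, lr=0.05,
                 eps_input=0.05, eps_param=0.01, lam_input=1.0, lam_param=1.0,
                 device="cpu"):
    """Search G's latent space for a sample that f assigns to
    `target_class` with predictions that stay stable under small
    input and parameter perturbations."""
    z = torch.randn(1, latent_dim, device=device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    y = torch.tensor([target_class], device=device)

    for _ in range(steps):
        x = G(z)                           # candidate reconstruction
        logits = f(x)
        ref = F.softmax(logits, dim=1).detach()  # unperturbed prediction

        # Class consistency: the reconstruction should be confidently
        # classified as the target class.
        ce = F.cross_entropy(logits, y)

        # Input stability: the prediction should barely move when the
        # reconstruction is perturbed with small Gaussian noise.
        x_pert = x + eps_input * torch.randn_like(x)
        kl_in = F.kl_div(F.log_softmax(f(x_pert), dim=1), ref,
                         reduction="batchmean")

        # Parameter stability: the prediction should also survive small
        # weight perturbations (crudely approximated by scoring the
        # sample with a noise-perturbed copy of f).
        f_noisy = copy.deepcopy(f)
        with torch.no_grad():
            for p in f_noisy.parameters():
                p.add_(eps_param * torch.randn_like(p))
        kl_par = F.kl_div(F.log_softmax(f_noisy(x), dim=1), ref,
                          reduction="batchmean")

        loss = ce + lam_input * kl_in + lam_param * kl_par
        opt.zero_grad()
        loss.backward()
        opt.step()

    return G(z).detach()
```

Note the design choice this illustrates: because the search is confined to a pre-trained generator's latent space, reconstructions are constrained to look realistic, which is also why, per the abstract's finding, they can remain backdoor-free even though the optimization uses signals from a backdoored model.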
