Paper Title

On the Robustness of Dataset Inference

Paper Authors

Sebastian Szyller, Rui Zhang, Jian Liu, N. Asokan

Paper Abstract

Machine learning (ML) models are costly to train as they can require a significant amount of data, computational resources and technical expertise. Thus, they constitute valuable intellectual property that needs protection from adversaries wanting to steal them. Ownership verification techniques allow the victims of model stealing attacks to demonstrate that a suspect model was in fact stolen from theirs. Although a number of ownership verification techniques based on watermarking or fingerprinting have been proposed, most of them fall short either in terms of security guarantees (well-equipped adversaries can evade verification) or computational cost. A fingerprinting technique, Dataset Inference (DI), has been shown to offer better robustness and efficiency than prior methods. The authors of DI provided a correctness proof for linear (suspect) models. However, in a subspace of the same setting, we prove that DI suffers from high false positives (FPs) -- it can incorrectly identify an independent model trained with non-overlapping data from the same distribution as stolen. We further prove that DI also triggers FPs in realistic, non-linear suspect models. We then confirm empirically that DI in the black-box setting leads to FPs, with high confidence. Second, we show that DI also suffers from false negatives (FNs) -- an adversary can fool DI (at the cost of incurring some accuracy loss) by regularising a stolen model's decision boundaries using adversarial training, thereby leading to an FN. To this end, we demonstrate that black-box DI fails to identify a model adversarially trained from a stolen dataset -- the setting where DI is the hardest to evade. Finally, we discuss the implications of our findings, the viability of fingerprinting-based ownership verification in general, and suggest directions for future work.
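The false-negative result above hinges on a standard technique: the adversary retrains the stolen model with adversarial training so that its decision boundaries are smoothed, which weakens the signal black-box DI relies on. The sketch below is a generic, minimal illustration of that evasion strategy (PGD-based adversarial training in PyTorch); the model, data loader, and hyperparameters are placeholders and are not taken from the paper's experimental setup.

```python
# Minimal sketch, assuming a PyTorch classifier `model` and a DataLoader
# `loader` over the (hypothetically stolen) training data. Hyperparameters
# (eps, alpha, steps) are illustrative defaults, not the paper's settings.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft L-inf PGD adversarial examples within an eps-ball around x."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_epoch(model, loader, optimizer, device="cuda"):
    """One epoch of adversarial training: fit the model on PGD examples.

    Regularising the decision boundary this way is the kind of
    post-processing the abstract says can drive black-box DI to a false
    negative, at the cost of some clean-accuracy loss.
    """
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```

The trade-off shown here mirrors the abstract's claim: adversarial training changes the geometry of the decision boundary enough to confuse a fingerprint built from it, but the robust model typically gives up some clean accuracy in exchange.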
