Paper Title

Privacy Leakage in Text Classification: A Data Extraction Approach

Paper Authors

Adel Elmahdy, Huseyin A. Inan, Robert Sim

Paper Abstract

Recent work has demonstrated the successful extraction of training data from generative language models. However, it is not evident whether such extraction is feasible in text classification models since the training objective is to predict the class label as opposed to next-word prediction. This poses an interesting challenge and raises an important question regarding the privacy of training data in text classification settings. Therefore, we study the potential privacy leakage in the text classification domain by investigating the problem of unintended memorization of training data that is not pertinent to the learning task. We propose an algorithm to extract missing tokens of a partial text by exploiting the likelihood of the class label provided by the model. We test the effectiveness of our algorithm by inserting canaries into the training set and attempting to extract tokens in these canaries post-training. In our experiments, we demonstrate that successful extraction is possible to some extent. This can also be used as an auditing strategy to assess any potential unauthorized use of personal data without consent.
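The abstract describes the core attack idea: for a canary sentence with a missing token, fill the blank with each candidate token, query the trained classifier, and rank candidates by the likelihood the model assigns to the canary's known class label. The following is a minimal illustrative sketch of that idea, not the paper's exact procedure; the model checkpoint, the "[BLANK]" placeholder, the candidate vocabulary, and the greedy single-token search are all assumptions made for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical setup: any fine-tuned text classifier would be used here;
# "bert-base-uncased" is only a placeholder checkpoint name.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

def extract_missing_token(partial_text: str, target_label: int,
                          candidates: list[str]) -> str:
    """Rank candidate fillers for the blank in `partial_text` by the
    log-probability the classifier assigns to `target_label`, and
    return the highest-scoring candidate. `partial_text` is assumed
    to contain a single "[BLANK]" placeholder."""
    best_token, best_score = None, float("-inf")
    with torch.no_grad():
        for token in candidates:
            filled = partial_text.replace("[BLANK]", token)
            inputs = tokenizer(filled, return_tensors="pt", truncation=True)
            logits = model(**inputs).logits
            # Likelihood of the canary's class label under the model.
            score = torch.log_softmax(logits, dim=-1)[0, target_label].item()
            if score > best_score:
                best_token, best_score = token, score
    return best_token

# Usage: attempt to recover the blank in a canary known to carry label 1.
guess = extract_missing_token(
    "my secret passcode is [BLANK]",
    target_label=1,
    candidates=["1234", "alpha", "zebra"],
)
print(guess)
```

Longer missing spans could be handled by repeating this ranking one position at a time, at the cost of a search that grows with the candidate vocabulary.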
