Paper Title

A Girl Has A Name, And It's ... Adversarial Authorship Attribution for Deobfuscation

Paper Authors

Wanyue Zhai, Jonathan Rusert, Zubair Shafiq, Padmini Srinivasan

Paper Abstract

Recent advances in natural language processing have enabled powerful privacy-invasive authorship attribution. To counter authorship attribution, researchers have proposed a variety of rule-based and learning-based text obfuscation approaches. However, existing authorship obfuscation approaches do not consider the adversarial threat model. Specifically, they are not evaluated against adversarially trained authorship attributors that are aware of potential obfuscation. To fill this gap, we investigate the problem of adversarial authorship attribution for deobfuscation. We show that adversarially trained authorship attributors are able to degrade the effectiveness of existing obfuscators from 20-30% to 5-10%. We also evaluate the effectiveness of adversarial training when the attributor makes incorrect assumptions about whether and which obfuscator was used. While there is a clear degradation in attribution accuracy, it is noteworthy that this degradation is still at or above the attribution accuracy of the attributor that is not adversarially trained at all. Our results underline the need for stronger obfuscation approaches that are resistant to deobfuscation.
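To make the adversarial training setup concrete, below is a minimal sketch in Python. It is not the paper's implementation: the toy corpus, the placeholder `toy_obfuscator`, and the TF-IDF character n-gram attributor with logistic regression are all illustrative assumptions. It only demonstrates the core idea described in the abstract, namely augmenting the attributor's training data with obfuscated copies of each document so that attribution still works on obfuscated text.

```python
# Minimal sketch of adversarial training for authorship attribution.
# Assumptions (not from the paper): a toy corpus, a placeholder obfuscator
# that just lowercases and strips punctuation, and a TF-IDF character
# n-gram + logistic regression attributor.
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def toy_obfuscator(text: str) -> str:
    """Hypothetical stand-in for a real authorship obfuscation tool."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


# Toy labeled corpus: (document, author) pairs.
train_docs = [
    ("The stars, like dust, encircle me.", "author_a"),
    ("I must not fear; fear is the mind-killer.", "author_b"),
    ("Encircling stars fall like dust tonight.", "author_a"),
    ("The mind-killer is fear, and I must not fear it.", "author_b"),
]
texts, labels = zip(*train_docs)

# Adversarial training: augment the training set with obfuscated copies so
# the attributor sees each author's style both before and after obfuscation.
aug_texts = list(texts) + [toy_obfuscator(t) for t in texts]
aug_labels = list(labels) * 2

attributor = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
attributor.fit(aug_texts, aug_labels)

# At test time, the attributor is applied to a (possibly obfuscated) document.
test_doc = toy_obfuscator("Stars encircle me like dust.")
print(attributor.predict([test_doc])[0])  # likely "author_a" via shared n-grams
```

In this sketch, the non-adversarial baseline would be the same pipeline fit on `texts`/`labels` alone; the abstract's comparison is between that baseline and the augmented attributor when documents are obfuscated at test time.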
