Paper Title
Whodunit? Learning to Contrast for Authorship Attribution
Paper Authors
Paper Abstract
Authorship attribution is the task of identifying the author of a given text. The key is finding representations that can differentiate between authors. Existing approaches typically use manually designed features that capture a dataset's content and style, but these approaches are dataset-dependent and yield inconsistent performance across corpora. In this work, we propose learning author-specific representations by fine-tuning pre-trained generic language representations with a contrastive objective (Contra-X). We show that Contra-X learns representations that form highly separable clusters for different authors. It advances the state of the art on multiple human and machine authorship attribution benchmarks, improving on cross-entropy fine-tuning by up to 6.8%. However, we find that Contra-X improves overall accuracy at the cost of degraded performance for some authors. Resolving this tension will be an important direction for future work. To the best of our knowledge, we are the first to combine contrastive learning with pre-trained language model fine-tuning for authorship attribution.
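The core idea described in the abstract, fine-tuning with a contrastive term alongside cross-entropy so that same-author texts are pulled together and different-author texts are pushed apart, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration of a supervised contrastive loss in PyTorch, not the paper's exact Contra-X formulation; the function name, the `temperature` value, and the `lambda_` weighting are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of text embeddings.

    For each anchor text, same-author texts in the batch act as positives
    and all other texts act as negatives, so training pulls each author's
    texts into one cluster and pushes different authors apart.
    """
    z = F.normalize(embeddings, dim=1)         # unit-normalize embeddings
    sim = z @ z.T / temperature                # pairwise scaled cosine similarity
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
    # Positive pairs: same author label, excluding the anchor itself.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # Log-probability of each pair under a softmax over the anchor's row.
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average over each anchor's positives; anchors with no in-batch
    # positives contribute zero loss.
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1) / pos_counts
    return loss.mean()

# Joint objective during fine-tuning (lambda_ is an assumed weighting):
# loss = F.cross_entropy(logits, author_ids) \
#        + lambda_ * supervised_contrastive_loss(pooled, author_ids)
```

Under this kind of objective, the encoder is rewarded not only for classifying each text correctly but also for placing texts by the same author close together in embedding space, which is consistent with the highly separable author clusters the abstract reports.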