Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation

Paper title

Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation

Authors

Kung-Hsiang Huang, Kathleen McKeown, Preslav Nakov, Yejin Choi, Heng Ji

Abstract

Despite recent advances in detecting fake news generated by neural models, their results are not readily applicable to effective detection of human-written disinformation. What limits the successful transfer between them is the sizable gap between machine-generated fake news and human-authored fake news, including notable differences in style and underlying intent. With this in mind, we propose a novel framework for generating training examples that are informed by the known styles and strategies of human-authored propaganda. Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles, while also incorporating propaganda techniques such as appeal to authority and loaded language. In particular, we create a new training dataset, PropaNews, with 2,256 examples, which we release for future use. Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 to 7.69% in F1 score on two public datasets.
