Detecting Fake News by Enhanced Text Representation

Paper title

Detecting fake news by enhanced text representation with multi-EDU-structure awareness

Authors

Yuhang Wang, Li Wang, Yanjie Yang, Yilin Zhang

Abstract

Because fake news poses a serious threat to society and individuals, numerous studies have addressed its detection using text, propagation patterns, and user profiles. Owing to data collection constraints, methods based on propagation and user profiles are less applicable in the early stages of dissemination. A good alternative is to detect fake news from its text as soon as it is released. Many text-based methods have been proposed, and they usually take words, sentences, or paragraphs as basic units. However, a word is too fine-grained to express coherent information well, while a sentence or paragraph is too coarse to convey specific information. Which granularity is better, and how to exploit it to enhance text representation for fake news detection, are two key problems. In this paper, we introduce the Elementary Discourse Unit (EDU), whose granularity lies between the word and the sentence, and propose a multi-EDU-structure awareness model, EDU4FD, to improve text representation for fake news detection. For multi-EDU-structure awareness, we build sequence-based and graph-based EDU representations. The former is obtained by modeling the coherence between consecutive EDUs with TextCNN, which reflects semantic coherence. For the latter, we first extract rhetorical relations to build an EDU dependency graph, which captures the global narrative logic and helps convey the main idea faithfully. A Relation Graph Attention Network (RGAT) then produces the graph-based EDU representation. Finally, the two EDU representations are fused into an enhanced text representation for fake news detection using a gated recurrent unit combined with a global attention mechanism. Experiments on four cross-source fake news datasets show that our model outperforms state-of-the-art text-based methods.
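The two-view idea in the abstract can be illustrated with a toy sketch. This is not the authors' EDU4FD implementation: the windowed mean stands in for TextCNN, the dot-product attention with a per-relation bias stands in for RGAT, the gated recurrent unit is omitted entirely, and every function name, relation label, and number below is a hypothetical chosen for illustration. It only shows how a sequence-based view and a graph-based view of the same EDU vectors might be built and then pooled with a global attention query.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sequence_view(edus, window=2):
    """Simplified stand-in for the TextCNN branch (assumption):
    mean over each window of consecutive EDU vectors."""
    dim = len(edus[0])
    out = []
    for i in range(len(edus)):
        span = edus[i:i + window]  # the window is shorter at the tail
        out.append([sum(v[d] for v in span) / len(span) for d in range(dim)])
    return out

def graph_view(edus, edges, relation_bias):
    """Simplified stand-in for the RGAT branch (assumption):
    each EDU attends to itself and to its incoming neighbours, with a
    relation-specific bias added to the attention score.

    edges: (source, target, relation) triples; the target attends to the source.
    """
    dim = len(edus[0])
    out = []
    for i, h_i in enumerate(edus):
        # "self" is a hypothetical relation label for the self-loop
        neighbours = [(i, "self")] + [(s, r) for s, t, r in edges if t == i]
        weights = softmax([dot(h_i, edus[j]) + relation_bias[r]
                           for j, r in neighbours])
        out.append([sum(w * edus[j][d] for w, (j, _) in zip(weights, neighbours))
                    for d in range(dim)])
    return out

def attention_pool(vectors, query):
    """Global attention pooling: softmax-weighted sum of the vectors."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

def document_representation(edus, edges, relation_bias, query):
    """Concatenate both views per EDU, then pool into one document vector."""
    seq = sequence_view(edus)
    graph = graph_view(edus, edges, relation_bias)
    fused = [s + g for s, g in zip(seq, graph)]  # list concatenation, 2 * dim
    return attention_pool(fused, query)

# Illustrative data only: three 2-dimensional EDU vectors and two relations.
edus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1, "elaboration"), (1, 2, "contrast")]
relation_bias = {"self": 0.0, "elaboration": 0.5, "contrast": -0.5}
query = [0.1, 0.1, 0.1, 0.1]  # must match the fused dimension (2 * dim)

doc_vec = document_representation(edus, edges, relation_bias, query)
print(len(doc_vec))  # 4
```

In a trained model the window filter, relation biases, and attention query would be learned parameters, and the pooled vector would feed a classifier that predicts whether the article is fake.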
