Annotation-Scheme Reconstruction for "Fake News" and Japanese Fake News Dataset

Paper Title

Annotation-Scheme Reconstruction for "Fake News" and Japanese Fake News Dataset

Authors

Murayama, Taichi; Hisada, Shohei; Uehara, Makoto; Wakamiya, Shoko; Aramaki, Eiji

Abstract

Fake news provokes many societal problems; therefore, there has been extensive research on fake news detection tasks to counter it. Many fake news datasets were constructed as resources to facilitate this task. Contemporary research focuses almost exclusively on the factuality aspect of the news. However, this aspect alone is insufficient to explain "fake news," which is a complex phenomenon that involves a wide range of issues. To fully understand the nature of each instance of fake news, it is important to observe it from various perspectives, such as the intention of the false news disseminator, the harmfulness of the news to our society, and the target of the news. We propose a novel annotation scheme with fine-grained labeling based on detailed investigations of existing fake news datasets to capture these various aspects of fake news. Using the annotation scheme, we construct and publish the first Japanese fake news dataset. The annotation scheme is expected to provide an in-depth understanding of fake news. We plan to build datasets for both Japanese and other languages using our scheme. Our Japanese dataset is published at https://hkefka385.github.io/dataset/fakenews-japanese/.
