Neural Media Bias Detection Using Distant Supervision With BABE -- Bias Annotations By Experts

Paper Title


Neural Media Bias Detection Using Distant Supervision With BABE -- Bias Annotations By Experts

Authors

Timo Spinde, Manuel Plank, Jan-David Krieger, Terry Ruas, Bela Gipp, Akiko Aizawa

Abstract


Media coverage has a substantial effect on the public perception of events. Nevertheless, media outlets are often biased. One way to bias news articles is by altering the word choice. The automatic identification of bias by word choice is challenging, primarily due to the lack of a gold standard data set and high context dependencies. This paper presents BABE, a robust and diverse data set created by trained experts, for media bias research. We also analyze why expert labeling is essential within this domain. Our data set offers better annotation quality and higher inter-annotator agreement than existing work. It consists of 3,700 sentences balanced among topics and outlets, containing media bias labels on the word and sentence level. Based on our data, we also introduce a way to detect bias-inducing sentences in news articles automatically. Our best performing BERT-based model is pre-trained on a larger corpus consisting of distant labels. Fine-tuning and evaluating the model on our proposed supervised data set, we achieve a macro F1-score of 0.804, outperforming existing methods.
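The headline result is reported as a macro F1-score. The function below is a minimal sketch of how that metric is computed. It is not the authors' evaluation code. It averages the per-class F1 scores with equal weight, so a smaller class counts as much as a larger one. The labels "biased" / "non-biased" and the toy predictions are hypothetical illustrations of sentence-level bias classification. The result should be comparable to scikit-learn's `f1_score(..., average="macro")` for labels that appear in the data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed an instance of class t
    scores = []
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Macro averaging: every class contributes equally, regardless of its size.
    return sum(scores) / len(scores)


# Hypothetical toy example: two sentence labels.
y_true = ["biased", "biased", "non-biased", "non-biased"]
y_pred = ["biased", "non-biased", "non-biased", "non-biased"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

In this toy case, the "biased" class has F1 = 2/3 (precision 1, recall 0.5), and the "non-biased" class has F1 = 0.8. Their unweighted mean is 11/15 ≈ 0.733.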
