Paper Title

ClimaText: A Dataset for Climate Change Topic Detection

Authors

Francesco S. Varini, Jordan Boyd-Graber, Massimiliano Ciaramita, Markus Leippold

Abstract

Climate change communication in the mass media and other textual sources may affect and shape public perception. Extracting climate change information from these sources is an important task, e.g., for filtering content and e-discovery, sentiment analysis, automatic summarization, question answering, and fact-checking. However, automating this process is a challenge, as climate change is a complex, fast-moving, and often ambiguous topic with scarce resources for popular text-based AI tasks. In this paper, we introduce ClimaText, a dataset for sentence-based climate change topic detection, which we make publicly available. We explore different approaches to identify the climate change topic in various text sources. We find that popular keyword-based models are not adequate for such a complex and evolving task. Context-based algorithms like BERT (Devlin et al., 2018) can detect, in addition to many trivial cases, a variety of complex and implicit topic patterns. Nevertheless, our analysis reveals great potential for improvement in several directions, such as capturing the discussion on indirect effects of climate change. Hence, we hope this work can serve as a good starting point for further research on this topic.
