Paper Title

GoEmotions: A Dataset of Fine-Grained Emotions

Authors

Dorottya Demszky, Dana Movshovitz-Attias, Jeongwoo Ko, Alan Cowen, Gaurav Nemade, Sujith Ravi

Abstract

Understanding emotion expressed in language has a wide range of applications, from building empathetic chatbots to detecting harmful online behavior. Advancement in this area can be improved using large-scale datasets with a fine-grained typology, adaptable to multiple downstream tasks. We introduce GoEmotions, the largest manually annotated dataset of 58k English Reddit comments, labeled for 27 emotion categories or Neutral. We demonstrate the high quality of the annotations via Principal Preserved Component Analysis. We conduct transfer learning experiments with existing emotion benchmarks to show that our dataset generalizes well to other domains and different emotion taxonomies. Our BERT-based model achieves an average F1-score of .46 across our proposed taxonomy, leaving much room for improvement.
