Paper Title

Enhancing Crisis-Related Tweet Classification with Entity-Masked Language Modeling and Multi-Task Learning

Authors

Philipp Seeberger, Korbinian Riedhammer

Abstract

Social media has become an important information source for crisis management and provides quick access to ongoing developments and critical information. However, classification models suffer from event-related biases and highly imbalanced label distributions, so the task remains challenging. To address these challenges, we propose a combination of entity-masked language modeling and hierarchical multi-label classification as a multi-task learning problem. We evaluate our method on tweets from the TREC-IS dataset and show an absolute gain in F1-score of up to 10% for actionable information types. Moreover, we find that entity masking reduces overfitting to in-domain events and improves cross-event generalization.
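
The abstract names entity masking and a multi-task objective but does not spell out either, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes entity spans are already available as token index pairs (for instance from some named-entity recognizer), that the mask token is `[MASK]`, and that the two task losses are combined as a weighted sum. The function names, the weighting scheme, and the example tweet are all hypothetical.

```python
from typing import List, Tuple

MASK_TOKEN = "[MASK]"  # assumed mask token; the abstract does not name one


def mask_entities(
    tokens: List[str], entity_spans: List[Tuple[int, int]]
) -> Tuple[List[str], List[int]]:
    """Replace every token inside an entity span with MASK_TOKEN.

    entity_spans holds hypothetical (start, end) token indices, end exclusive.
    Returns the masked token list and the masked positions, which a language
    modeling head could use as prediction targets.
    """
    masked = list(tokens)  # copy so the input list is left unchanged
    positions: List[int] = []
    for start, end in entity_spans:
        for i in range(start, end):
            masked[i] = MASK_TOKEN
            positions.append(i)
    return masked, positions


def multitask_loss(cls_loss: float, mlm_loss: float, mlm_weight: float = 1.0) -> float:
    """Combine classification and masked-LM losses as a weighted sum (an assumption)."""
    return cls_loss + mlm_weight * mlm_loss


# Made-up example tweet with a location and an event name marked as entities.
tokens = ["Flooding", "in", "Houston", "after", "Hurricane", "Harvey"]
masked, positions = mask_entities(tokens, [(2, 3), (4, 6)])
print(masked)
print(positions)
```

Hiding event-specific names in this way forces a model to rely on the surrounding wording rather than on which event a tweet mentions, which is one plausible reading of why the abstract reports better cross-event generalization.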
