Paper title
EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets
Paper authors
Paper abstract
Twitter, and social media in general, have become an indispensable communication channel in times of emergency. The ubiquity of smartphones enables people to report emergencies they observe in real time. As a result, more agencies, such as disaster relief organizations and news agencies, are interested in programmatically monitoring Twitter. Recognizing the informativeness of a Tweet can therefore help filter noise from large volumes of Tweets. In this paper, we present our submission for WNUT-2020 Task 2: Identification of informative COVID-19 English Tweets. Our most successful model is an ensemble of transformers, including RoBERTa, XLNet, and BERTweet, trained in a Semi-Supervised Learning (SSL) setting. The proposed system achieves an F1 score of 0.9011 on the test set (ranking 7th on the leaderboard) and shows significant gains in performance compared to a baseline system using FastText embeddings.
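The abstract does not say how the RoBERTa, XLNet, and BERTweet outputs are combined, nor how the F1 score is computed. The sketch below is offered only as an illustration. It assumes soft voting, meaning each model's probability that a Tweet is INFORMATIVE is averaged and then thresholded. It also assumes F1 is measured on the positive (INFORMATIVE) class. The function names, the 0.5 threshold, and the toy probabilities are hypothetical and are not results from the paper. The SSL training step is not shown.

```python
from typing import List, Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical soft voting: average each Tweet's P(INFORMATIVE) across models."""
    n_items = len(prob_lists[0])
    if any(len(p) != n_items for p in prob_lists):
        raise ValueError("all models must score the same Tweets")
    return [sum(p[i] for p in prob_lists) / len(prob_lists) for i in range(n_items)]


def predict(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a Tweet INFORMATIVE (1) when its averaged probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def f1_positive(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 on the positive (INFORMATIVE = 1) class, assumed here as the task metric."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up probabilities for four Tweets (not the paper's model outputs).
roberta = [0.9, 0.2, 0.7, 0.4]
xlnet = [0.8, 0.3, 0.4, 0.7]
bertweet = [0.95, 0.1, 0.6, 0.6]
gold = [1, 0, 1, 0]

averaged = soft_vote([roberta, xlnet, bertweet])
labels = predict(averaged)
score = f1_positive(gold, labels)
print(labels, round(score, 3))  # [1, 0, 1, 1] 0.8
```

Averaging probabilities lets a confident model outvote two uncertain ones. Majority voting over hard labels would be an equally plausible alternative, and the abstract does not say which one the system uses.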