Paper Title
COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter
Paper Authors
Paper Abstract
In this work, we release COVID-Twitter-BERT (CT-BERT), a transformer-based model, pretrained on a large corpus of Twitter messages on the topic of COVID-19. Our model shows a 10-30% marginal improvement compared to its base model, BERT-Large, on five different classification datasets. The largest improvements are on the target domain. Pretrained transformer models, such as CT-BERT, are trained on a specific target domain and can be used for a wide variety of natural language processing tasks, including classification, question-answering and chatbots. CT-BERT is optimised to be used on COVID-19 content, in particular social media posts from Twitter.
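As a usage illustration, here is a minimal sketch of loading CT-BERT for a downstream classification task with the Hugging Face `transformers` library. The Hub identifier `digitalepidemiologylab/covid-twitter-bert-v2` and the binary-label setup are assumptions not stated in the abstract, and the classification head is randomly initialised, so it must be fine-tuned on labelled data before its outputs are meaningful.

```python
# Minimal sketch: load CT-BERT with a sequence-classification head.
# Assumes the Hugging Face Hub ID "digitalepidemiologylab/covid-twitter-bert-v2";
# verify the identifier before use.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "digitalepidemiologylab/covid-twitter-bert-v2"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# num_labels=2 is an illustrative binary setup (e.g. relevant vs. irrelevant);
# the head is untrained until fine-tuned.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Encode a COVID-19 related tweet and run a forward pass.
inputs = tokenizer(
    "Vaccines are now available at local pharmacies.",
    return_tensors="pt",
    truncation=True,
    max_length=96,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities (meaningful only after fine-tuning)
```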