Paper Title
COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter
Paper Authors
Paper Abstract
In this work, we release COVID-Twitter-BERT (CT-BERT), a transformer-based model, pretrained on a large corpus of Twitter messages on the topic of COVID-19. Our model shows a 10-30% marginal improvement compared to its base model, BERT-Large, on five different classification datasets. The largest improvements are on the target domain. Pretrained transformer models, such as CT-BERT, are trained on a specific target domain and can be used for a wide variety of natural language processing tasks, including classification, question-answering and chatbots. CT-BERT is optimised to be used on COVID-19 content, in particular social media posts from Twitter.
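As a usage illustration, here is a minimal sketch of loading CT-BERT for a downstream classification task with the Hugging Face `transformers` library. The Hub identifier `digitalepidemiologylab/covid-twitter-bert-v2` and the binary-label setup are assumptions not stated in the abstract, and the classification head is randomly initialised, so it must be fine-tuned on labelled data before its outputs are meaningful.

```python
# Minimal sketch: load CT-BERT with a sequence-classification head.
# Assumes the Hugging Face Hub ID "digitalepidemiologylab/covid-twitter-bert-v2";
# verify the identifier before use.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "digitalepidemiologylab/covid-twitter-bert-v2"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# num_labels=2 is an illustrative binary setup (e.g. relevant vs. irrelevant);
# the head is untrained until fine-tuned.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Encode a COVID-19 related tweet and run a forward pass.
inputs = tokenizer(
    "Vaccines are now available at local pharmacies.",
    return_tensors="pt",
    truncation=True,
    max_length=96,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities (meaningful only after fine-tuning)
```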