Paper Title

Sentiment analysis in Bengali via transfer learning using multi-lingual BERT

Authors

Islam, Khondoker Ittehadul; Islam, Md. Saiful; Amin, Md Ruhul

Abstract

Sentiment analysis (SA) in Bengali is challenging because this Indo-Aryan language is highly inflected, with more than 160 inflected forms for verbs, 36 forms for nouns, and 24 forms for pronouns. The lack of standard labeled datasets in the Bengali domain makes the task of SA even harder. In this paper, we present manually tagged 2-class and 3-class SA datasets in Bengali. We also demonstrate that the multi-lingual BERT model with relevant extensions can be trained via transfer learning over those novel datasets to improve the state-of-the-art performance in sentiment classification tasks. This deep learning model achieves an accuracy of 71% for 2-class sentiment classification, compared to the current state-of-the-art accuracy of 68%. We also present the very first Bengali SA classifier for the 3-class manually tagged dataset, on which our proposed model achieves an accuracy of 60%. We further use this model to analyze the sentiment of public comments in an online daily newspaper. Our analysis shows that people post negative comments on political or sports news more often, while comments on religious articles represent positive sentiment. The dataset and code are publicly available at https://github.com/KhondokerIslam/Bengali_Sentiment.
