BAN-ABSA: An Aspect-Based Sentiment Analysis dataset for Bengali and its baseline evaluation

Paper Title

BAN-ABSA: An Aspect-Based Sentiment Analysis dataset for Bengali and its baseline evaluation

Authors

Masum, Mahfuz Ahmed; Ahmed, Sheikh Junayed; Tasnim, Ayesha; Islam, Md Saiful

Abstract

Driven by the rapid growth of user comments on social media and news sites, and of online product reviews, sentiment analysis (SA) has attracted substantial interest from researchers. As the domain has expanded, SA work aims not only to predict the sentiment of a sentence or document but also to provide detail on its different aspects (i.e., aspect-based sentiment analysis). A considerable number of datasets for SA and aspect-based sentiment analysis (ABSA) are available for English and other well-known European languages. In this paper, we present BAN-ABSA, a high-quality, manually annotated Bengali dataset in which 3 native Bengali speakers labeled each sample with an aspect and its associated sentiment. The dataset consists of 2,619 positive, 4,721 negative, and 1,669 neutral samples from 9,009 unique comments gathered from several well-known Bengali news portals. In addition, we conducted a baseline evaluation focused on deep learning models, achieving an accuracy of 78.75% for aspect term extraction and 71.08% for sentiment classification. Experiments on the BAN-ABSA dataset show that the CNN model is better in terms of accuracy, although Bi-LSTM significantly outperforms the CNN model in terms of average F1-score.
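The class counts in the abstract are imbalanced, which is one reason accuracy and average F1-score can rank models differently. The following is a minimal sketch of that effect, not part of the paper's evaluation: it uses only the three counts reported above and a hypothetical classifier that always predicts the most frequent class. It assumes "average F1" means the unweighted (macro) average over the three sentiment classes, which the abstract does not state; the function names are illustrative.

```python
# Sentiment class counts as reported in the abstract.
COUNTS = {"positive": 2619, "negative": 4721, "neutral": 1669}


def class_shares(counts):
    """Return each class's fraction of the whole dataset."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_baseline(counts):
    """Accuracy and macro-F1 of a hypothetical always-majority classifier.

    Assumption: macro-F1 is the unweighted mean of per-class F1 scores.
    """
    total = sum(counts.values())
    majority = max(counts, key=counts.get)
    # Every sample is predicted as the majority class, so accuracy equals
    # that class's share, its recall is 1, and its precision is its share.
    accuracy = counts[majority] / total
    precision = counts[majority] / total
    recall = 1.0
    f1_majority = 2 * precision * recall / (precision + recall)
    # The other classes are never predicted, so their F1 is taken as 0.
    macro_f1 = f1_majority / len(counts)
    return majority, accuracy, macro_f1


if __name__ == "__main__":
    for label, share in class_shares(COUNTS).items():
        print(f"{label}: {share:.1%}")
    majority, accuracy, macro_f1 = majority_baseline(COUNTS)
    print(f"always-{majority}: accuracy={accuracy:.3f}, macro-F1={macro_f1:.3f}")
```

Under these assumptions the trivial baseline scores far higher on accuracy than on macro-F1, so a model that favors the negative class can lead on accuracy while trailing on average F1-score.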
