Paper Title
BanglaSarc: A Dataset for Sarcasm Detection
Authors
Abstract
Bangla is one of the most widely spoken languages in the world, and its use on social media has been increasing as well. Sarcasm is a positive statement or remark with an underlying negative intent, and it is used extensively on today's social media platforms. Sarcasm detection in English has improved significantly over the past several years, but Bangla sarcasm detection has not seen comparable progress. As a result, identifying sarcasm in Bangla remains difficult, and a major contributing factor is the lack of high-quality data. This article proposes BanglaSarc, a dataset constructed specifically for sarcasm detection in Bangla text. The dataset contains 5,112 comments, statuses, and other posts collected from various online social platforms, such as Facebook and YouTube, along with a few online blogs. Because little categorized comment data is available for Bangla, this dataset can aid research on identifying sarcasm, recognizing people's emotions, detecting various types of Bangla expressions, and related areas. The dataset is publicly available at https://www.kaggle.com/datasets/sakibapon/banglasarc.