Bangla hate speech detection on social media using attention-based recurrent neural network

Paper title

Bangla hate speech detection on social media using attention-based recurrent neural network

Authors

Das, Amit Kumar; Asif, Abdullah Al; Paul, Anik; Hossain, Md. Nur

Abstract

Hate speech has spread more rapidly with the everyday use of technology, most notably as people share negative opinions and feelings on social media. Although much work has been done on detecting hate speech in English, German, and other languages, very little has addressed Bengali, even though millions of people communicate on social media in Bengali. The few existing studies leave room for improvement in both accuracy and interpretability. This article proposes an encoder-decoder based machine learning model, a popular tool in NLP, to classify users' Bengali comments on Facebook pages. A dataset of 7,425 Bengali comments spanning seven distinct categories of hate speech was used to train and evaluate the model. One-dimensional (1D) convolutional layers extract and encode local features from the comments, and attention-based, LSTM-based, and GRU-based decoders then predict the hate speech category. Among the three encoder-decoder models, the attention-based decoder achieved the best accuracy (77%).
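The abstract describes the pipeline only at a high level. Below is a minimal NumPy sketch of a forward pass in that spirit, assuming token IDs from a hypothetical vocabulary, an embedding layer, a single 1D convolutional layer with ReLU as the encoder, and additive (Bahdanau-style) attention pooling followed by a softmax over the seven categories as a stand-in for the "attention-based decoder". The layer sizes, kernel width, attention form, and helper names (`conv1d_relu`, `attention_pool`, `init_params`, `classify`) are illustrative assumptions, not the authors' implementation. The weights are random and untrained, so the output probabilities carry no meaning until the model is trained.

```python
import numpy as np

NUM_CLASSES = 7  # seven hate speech categories, as stated in the abstract


def conv1d_relu(x, kernels, bias):
    """Valid 1D convolution over the time axis, then ReLU.

    x: (T, d_in), kernels: (k, d_in, d_out), bias: (d_out,) -> (T - k + 1, d_out)
    """
    k = kernels.shape[0]
    steps = x.shape[0] - k + 1
    if steps < 1:
        raise ValueError("sequence must be at least as long as the kernel width")
    # Each output step sums over a window of k positions and all input channels.
    out = np.stack([np.einsum("kd,kdo->o", x[t:t + k], kernels) for t in range(steps)])
    return np.maximum(out + bias, 0.0)


def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()


def attention_pool(h, w, v):
    """Additive attention over time steps (an assumed attention form).

    h: (T, d), w: (d, a), v: (a,) -> context vector (d,), weights (T,)
    """
    scores = np.tanh(h @ w) @ v  # one relevance score per time step
    alpha = softmax(scores)      # normalize scores into attention weights
    return alpha @ h, alpha      # weighted sum of encoder states


def init_params(vocab_size=1000, d_emb=32, k=3, d_conv=64, d_att=16, seed=0):
    """Random, untrained weights; all sizes here are hypothetical."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "embed": rng.normal(0.0, s, (vocab_size, d_emb)),
        "conv_k": rng.normal(0.0, s, (k, d_emb, d_conv)),
        "conv_b": np.zeros(d_conv),
        "att_w": rng.normal(0.0, s, (d_conv, d_att)),
        "att_v": rng.normal(0.0, s, d_att),
        "out_w": rng.normal(0.0, s, (d_conv, NUM_CLASSES)),
        "out_b": np.zeros(NUM_CLASSES),
    }


def classify(token_ids, params):
    """Forward pass: embed -> 1D conv encoder -> attention -> class probabilities."""
    x = params["embed"][token_ids]                          # (T, d_emb)
    h = conv1d_relu(x, params["conv_k"], params["conv_b"])  # (T - k + 1, d_conv)
    context, alpha = attention_pool(h, params["att_w"], params["att_v"])
    probs = softmax(context @ params["out_w"] + params["out_b"])  # (NUM_CLASSES,)
    return probs, alpha


if __name__ == "__main__":
    params = init_params()
    probs, alpha = classify([5, 42, 7, 300, 12, 9], params)
    print(probs.shape, alpha.shape)  # (7,) (4,)
```

With six tokens and a kernel width of 3, the valid convolution yields four time steps, so `alpha` holds four attention weights. Such weights can give a rough indication of which regions of a comment drive the prediction, which is one reason attention is often associated with interpretability. In this sketch, an LSTM or GRU decoder would replace `attention_pool` with a recurrent pass over the convolutional features.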
