Paper Title

Spread Love Not Hate: Undermining the Importance of Hateful Pre-training for Hate Speech Detection

Authors

Omkar Gokhale, Aditya Kane, Shantanu Patankar, Tanmay Chavan, Raviraj Joshi

Abstract

Pre-training large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks. Although this method has proven to be effective for many domains, it might not always provide desirable benefits. In this paper, we study the effects of hateful pre-training on low-resource hate speech classification tasks. While previous studies on the English language have emphasized its importance, we aim to augment their observations with some non-obvious insights. We evaluate different variations of tweet-based BERT models pre-trained on hateful, non-hateful, and mixed subsets of a 40M tweet dataset. This evaluation is carried out for the Indian languages Hindi and Marathi. This paper provides empirical evidence that hateful pre-training is not the best pre-training option for hate speech detection. We show that pre-training on non-hateful text from the target domain provides similar or better results. Further, we introduce HindTweetBERT and MahaTweetBERT, the first publicly available BERT models pre-trained on Hindi and Marathi tweets, respectively. We show that they provide state-of-the-art performance on hate speech classification tasks. We also release the hateful BERT models for the two languages, along with a gold hate speech evaluation benchmark, HateEval-Hi and HateEval-Mr, each consisting of 2,000 manually labeled tweets. The models and data are available at https://github.com/l3cube-pune/MarathiNLP.
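
To illustrate the downstream setup the abstract describes (fine-tuning a tweet-domain BERT for hate speech classification), here is a minimal, hypothetical sketch using Hugging Face Transformers. The model ID and toy data below are placeholders and not the authors' released code; see https://github.com/l3cube-pune/MarathiNLP for the actual MahaTweetBERT / HindTweetBERT checkpoints and the HateEval-Hi / HateEval-Mr benchmarks.

```python
# Sketch only: fine-tune a tweet-domain BERT for binary hate speech
# classification. The model ID is an assumed placeholder; swap in the
# actual MahaTweetBERT / HindTweetBERT checkpoint from the repo above.
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)
from datasets import Dataset

MODEL_ID = "l3cube-pune/marathi-tweets-bert"  # placeholder model name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

# Toy rows standing in for labeled tweets (1 = hateful, 0 = non-hateful).
train_data = Dataset.from_dict({
    "text": ["example tweet one", "example tweet two"],
    "label": [0, 1],
})

def tokenize(batch):
    # Pad/truncate tweets to a fixed length so the default collator can batch them.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="hate-clf",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=train_data,
)
trainer.train()
```

In the paper's comparison, the same fine-tuning recipe would be repeated with checkpoints pre-trained on the hateful, non-hateful, and mixed tweet subsets, so that only the pre-training corpus varies.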
