Paper Title


Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study

Authors

Serra Sinem Tekiroglu, Helena Bonaldi, Margherita Fanton, Marco Guerini

Abstract

In this work, we present an extensive study on the use of pre-trained language models for the task of automatic Counter Narrative (CN) generation to fight online hate speech in English. We first present a comparative study to determine whether there is a particular Language Model (or class of LMs) and a particular decoding mechanism that are the most appropriate to generate CNs. Findings show that autoregressive models combined with stochastic decodings are the most promising. We then investigate how an LM performs in generating a CN with regard to an unseen target of hate. We find out that a key element for successful `out of target' experiments is not an overall similarity with the training data but the presence of a specific subset of training data, i.e. a target that shares some commonalities with the test target that can be defined a-priori. We finally introduce the idea of a pipeline based on the addition of an automatic post-editing step to refine generated CNs.
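The abstract finds that autoregressive models paired with stochastic decoding work best for CN generation. As illustration only (not code from the paper), the sketch below implements nucleus (top-p) sampling, a common stochastic decoding strategy: at each step, sampling is restricted to the smallest set of tokens whose cumulative probability reaches a threshold p, then renormalized. The toy five-token distribution is hypothetical.

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Zero out all but the smallest set of tokens whose cumulative
    probability mass reaches p, then renormalize (nucleus sampling)."""
    order = np.argsort(probs)[::-1]          # token ids, most probable first
    cumulative = np.cumsum(probs[order])     # running probability mass
    cutoff = np.searchsorted(cumulative, p) + 1  # how many tokens to keep
    kept = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[kept] = probs[kept]
    return filtered / filtered.sum()

# Hypothetical next-token distribution over a 5-token vocabulary.
probs = np.array([0.50, 0.25, 0.15, 0.07, 0.03])
filtered = top_p_filter(probs, p=0.9)        # keeps tokens 0, 1, 2

# Sample the next token from the truncated, renormalized distribution.
rng = np.random.default_rng(0)
token = rng.choice(len(probs), p=filtered)
```

In practice this corresponds to passing a sampling flag and a top-p threshold to an LM's generation routine; truncating the tail avoids the degenerate, repetitive text that greedy decoding often produces while keeping outputs fluent.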
