Paper title
Countering hate on social media: Large scale classification of hate and counter speech
Authors
Abstract
Hateful rhetoric is plaguing online discourse, fostering extreme societal movements and possibly giving rise to real-world violence. A potential solution to this growing global problem is citizen-generated counter speech, in which citizens actively engage in hate-filled conversations to try to restore civil, non-polarized discourse. However, its actual effectiveness in curbing the spread of hatred is unknown and hard to quantify. One major obstacle to researching this question is a lack of large labeled data sets for training automated classifiers to identify counter speech. Here we made use of a unique situation in Germany where self-labeling groups engaged in organized online hate and counter speech. We used an ensemble learning algorithm which pairs a variety of paragraph embeddings with regularized logistic regression functions to classify both hate and counter speech in a corpus of millions of relevant tweets from these two groups. Our pipeline achieved macro F1 scores on out-of-sample balanced test sets ranging from 0.76 to 0.97, accuracy in line with and even exceeding the state of the art. On thousands of tweets, we used crowdsourcing to verify that the judgments made by the classifier are in close alignment with human judgment. We then used the classifier to discover hate and counter speech in more than 135,000 fully resolved Twitter conversations occurring from 2013 to 2018 and to study their frequency and interaction. Altogether, our results highlight the potential of automated methods to evaluate the impact of coordinated counter speech in stabilizing conversations on social media.
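The abstract reports macro F1 on balanced test sets. As a rough illustration of what that metric measures, the sketch below computes the unweighted mean of per-class F1 scores in plain Python. The function name `macro_f1`, the label strings, and the toy predictions are assumptions made for this example; they are not taken from the paper's pipeline or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1
            fn[truth] += 1

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy balanced test set (made-up labels, three tweets per class).
y_true = ["hate", "hate", "hate", "counter", "counter", "counter"]
y_pred = ["hate", "hate", "counter", "counter", "counter", "hate"]
score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

Because each class contributes equally to the average, macro F1 does not reward a classifier for doing well only on the more frequent class, which is one reason it is commonly paired with balanced test sets.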