Paper Title
Detecting Unintended Social Bias in Toxic Language Datasets
Paper Authors
Paper Abstract
With the rise of online hate speech, automatic detection of hate speech and offensive text as a natural language processing task is gaining popularity. However, very little research has been done on detecting unintended social bias in these toxic language datasets. This paper introduces ToxicBias, a new dataset curated from the existing Kaggle competition dataset "Jigsaw Unintended Bias in Toxicity Classification". We aim to detect social biases, their categories, and targeted groups. The dataset contains instances annotated for five bias categories, viz., gender, race/ethnicity, religion, political, and LGBTQ. We train transformer-based models on our curated dataset and report baseline performance for bias identification, target generation, and bias implications. Model biases and their mitigation are also discussed in detail. Our study motivates the systematic extraction of social bias data from toxic language datasets. All code and the dataset used for the experiments in this work are publicly available.
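The abstract mentions fine-tuning transformer-based models for bias identification but does not name the architecture, framework, or data layout. The following is a minimal sketch of one plausible setup, assuming a BERT-style encoder fine-tuned with Hugging Face Transformers; the file name, column layout, and toy examples below are hypothetical illustrations, not the authors' actual pipeline.

# Minimal sketch of a bias-category identification baseline.
# Assumption: each ToxicBias instance is a text span labeled with one of the
# five bias categories named in the abstract.
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

CATEGORIES = ["gender", "race/ethnicity", "religion", "political", "LGBTQ"]

class ToxicBiasDataset(Dataset):
    """Wraps tokenized texts and integer category labels for the Trainer."""
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(CATEGORIES))

# Toy examples standing in for the real ToxicBias training split.
texts = ["example comment attacking a religious group",
         "example comment demeaning a gender"]
labels = [CATEGORIES.index("religion"), CATEGORIES.index("gender")]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="toxicbias-baseline",
                           num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=ToxicBiasDataset(texts, labels, tokenizer),
)
trainer.train()

The target-generation and bias-implication tasks mentioned in the abstract would instead call for a sequence-to-sequence head, but the single-label classifier above matches the bias identification baseline most directly.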