Paper Title
Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks
Paper Authors
Paper Abstract
Research on open-domain dialog systems has greatly prospered thanks to neural models trained on large-scale corpora. However, such corpora often introduce various safety problems (e.g., offensive language, biases, and toxic behaviors) that significantly hinder the deployment of dialog systems in practice. Among all these safety issues, addressing social bias is especially complex, as its negative impact on marginalized populations is usually expressed implicitly and thus requires normative reasoning and rigorous analysis. In this paper, we focus our investigation on social bias detection among dialog safety problems. We first propose a novel Dial-Bias Frame for pragmatically analyzing social bias in conversations, which supports more comprehensive bias-related analyses rather than simple dichotomous annotations. Based on the proposed framework, we further introduce the CDail-Bias Dataset, which, to our knowledge, is the first well-annotated Chinese social bias dialog dataset. In addition, we establish several dialog bias detection benchmarks at different label granularities and input types (utterance-level and context-level). We show that the in-depth analyses in our Dial-Bias Frame, together with these benchmarks, are essential to bias detection tasks and can benefit building safe dialog systems in practice.
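As a minimal sketch of the two input types mentioned in the abstract (the function names, separator token, and examples below are illustrative assumptions, not details from the paper), an utterance-level detector classifies a single turn in isolation, while a context-level detector also sees the preceding dialog history:

```python
def utterance_level_input(utterance: str) -> str:
    """Utterance-level detection: the classifier sees only one turn."""
    return utterance

def context_level_input(context: list[str], utterance: str) -> str:
    """Context-level detection: prepend the dialog history, joined with a
    separator token ("[SEP]" here is an assumed convention)."""
    return " [SEP] ".join(context + [utterance])

# Hypothetical two-turn dialog: the reply may only read as biased
# once the preceding question is taken into account.
history = ["What do you think of people from that region?"]
reply = "They are all lazy."
print(utterance_level_input(reply))
print(context_level_input(history, reply))
```

The context-level formulation matters because, as the abstract notes, biased meaning is often expressed implicitly and may depend on what was said earlier in the conversation.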