Paper title
A Robust Bias Mitigation Procedure Based on the Stereotype Content Model
Paper authors
Paper abstract
The Stereotype Content Model (SCM) states that we tend to perceive minority groups as cold, incompetent, or both. In this paper, we adapt existing work to demonstrate that the SCM holds for contextualised word embeddings, then use these results to evaluate a fine-tuning process designed to drive a language model away from stereotyped portrayals of minority groups. We find that SCM terms capture bias better than demographic-agnostic terms related to pleasantness. Further, we reduce the presence of stereotypes in the model through a simple fine-tuning procedure that requires minimal human and computational resources and does not harm downstream performance. We present this work as a prototype of a debiasing procedure that aims to remove the need for a priori knowledge of the specifics of bias in the model.