Paper Title
Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias
Paper Authors
Paper Abstract
The one-sided focus on English in previous studies of gender bias in NLP misses out on opportunities in other languages: English challenge datasets such as GAP and WinoGender highlight model preferences that are "hallucinatory", e.g., disambiguating gender-ambiguous occurrences of 'doctor' as male doctors. We show that for languages with type B reflexivization, e.g., Swedish and Russian, we can construct multi-task challenge datasets for detecting gender bias that lead to unambiguously wrong model predictions: In these languages, the direct translation of 'the doctor removed his mask' is not ambiguous between a coreferential reading and a disjoint reading. Instead, the coreferential reading requires a non-gendered pronoun, and the gendered, possessive pronouns are anti-reflexive. We present a multilingual, multi-task challenge dataset, which spans four languages and four NLP tasks and focuses only on this phenomenon. We find evidence for gender bias across all task-language combinations and correlate model bias with national labor market statistics.