Paper Title
Welcome to the Modern World of Pronouns: Identity-Inclusive Natural Language Processing beyond Gender
Paper Authors
Paper Abstract
The world of pronouns is changing, from a closed class of words with few members to a much more open set of terms that reflect identities. However, Natural Language Processing (NLP) barely reflects this linguistic shift, even though recent work has outlined the harms of gender-exclusive language technology. Particularly problematic is the current modeling of 3rd person pronouns, as it largely ignores various phenomena like neopronouns, i.e., pronoun sets that are novel and not (yet) widely established. This omission contributes to discrimination against marginalized and underrepresented groups, e.g., non-binary individuals. Moreover, current NLP technology also ignores other identity-expression phenomena beyond gender. In this paper, we provide an overview of 3rd person pronoun issues for NLP. Based on our observations and ethical considerations, we define a series of desiderata for modeling pronouns in language technology. We evaluate existing and novel modeling approaches w.r.t. these desiderata qualitatively, and quantify the impact of a more discrimination-free approach on established benchmark data.