Paper Title
Speaking Multiple Languages Affects the Moral Bias of Language Models
Paper Authors
Paper Abstract
Pre-trained multilingual language models (PMLMs) are commonly used when dealing with data from multiple languages and cross-lingual transfer. However, PMLMs are trained on varying amounts of data for each language. In practice this means their performance is often much better on English than many other languages. We explore to what extent this also applies to moral norms. Do the models capture moral norms from English and impose them on other languages? Do the models exhibit random and thus potentially harmful beliefs in certain languages? Both these issues could negatively impact cross-lingual transfer and potentially lead to harmful outcomes. In this paper, we (1) apply the MoralDirection framework to multilingual models, comparing results in German, Czech, Arabic, Chinese, and English, (2) analyse model behaviour on filtered parallel subtitles corpora, and (3) apply the models to a Moral Foundations Questionnaire, comparing with human responses from different countries. Our experiments demonstrate that, indeed, PMLMs encode differing moral biases, but these do not necessarily correspond to cultural differences or commonalities in human opinions. We release our code and models.