Paper Title
Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer
Paper Authors
Paper Abstract
Multilingual representations embed words from many languages into a single semantic space such that words with similar meanings are close to each other regardless of the language. These embeddings have been widely used in various settings, such as cross-lingual transfer, where a natural language processing (NLP) model trained on one language is deployed to another language. While cross-lingual transfer techniques are powerful, they carry gender bias from the source language to the target languages. In this paper, we study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several ways of quantifying bias in multilingual representations from both the intrinsic and extrinsic perspectives. Experimental results show that the magnitude of bias in the multilingual representations changes differently when we align the embeddings to different target spaces, and that the alignment direction can also influence the bias in transfer learning. We further provide recommendations for using multilingual word representations in downstream tasks.
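To make the notion of intrinsic bias in an embedding space concrete, here is a minimal sketch of one common style of probe: projecting a word's vector onto gendered anchor words and comparing the cosine similarities. The toy 3-d vectors, the word list, and the `gender_bias` helper are illustrative assumptions, not the paper's actual metric or data.

```python
# Minimal sketch of an intrinsic gender-bias probe on word embeddings.
# The vectors below are toy 3-d examples standing in for learned
# (multilingual) embeddings; this is NOT the paper's exact metric.
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def gender_bias(word_vec, he_vec, she_vec):
    """Bias score: similarity to 'he' minus similarity to 'she'.
    Positive values lean male, negative values lean female."""
    return cosine(word_vec, he_vec) - cosine(word_vec, she_vec)

# Hypothetical embeddings for a pair of occupation words.
emb = {
    "he":     np.array([1.0, 0.1, 0.0]),
    "she":    np.array([0.1, 1.0, 0.0]),
    "doctor": np.array([0.9, 0.2, 0.3]),
    "nurse":  np.array([0.2, 0.9, 0.3]),
}

for word in ("doctor", "nurse"):
    score = gender_bias(emb[word], emb["he"], emb["she"])
    print(f"{word}: {score:+.3f}")
```

A probe like this can be run before and after aligning embeddings to a target space, which is one simple way to observe the kind of bias shift under alignment that the abstract describes.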