Paper title
Looking for Clues of Language in Multilingual BERT to Improve Cross-lingual Generalization
Paper authors
Abstract
Token embeddings in multilingual BERT (m-BERT) contain both language and semantic information. We find that the representation of a language can be obtained by simply averaging the embeddings of the tokens of the language. Given this language representation, we control the output languages of multilingual BERT by manipulating the token embeddings, thus achieving unsupervised token translation. We further propose a computationally cheap but effective approach to improve the cross-lingual ability of m-BERT based on this observation.
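The idea of a "language representation" as a mean embedding, and of translating tokens by manipulating embeddings, can be illustrated with a minimal sketch. It uses toy numpy data, not real m-BERT weights. The assumptions are as follows:

- "Manipulating the token embeddings" is read here as subtracting the source-language mean and adding the target-language mean.
- The translated token is taken to be the target-language token whose embedding has the highest cosine similarity to the shifted vector.
- The function names (`language_mean`, `translate_tokens`), the "en"/"zh" toy vocabularies, and the synthetic "meaning + language offset" embeddings are all hypothetical.

With a real model, the embedding matrix could be taken from, e.g., Hugging Face `transformers` via `model.get_input_embeddings()`. How tokens are assigned to languages is not specified in this summary.

```python
import numpy as np


def language_mean(token_embeddings: np.ndarray) -> np.ndarray:
    """Language representation: the average of all token embeddings of one language."""
    return token_embeddings.mean(axis=0)


def translate_tokens(src_emb, src_mean, tgt_mean, tgt_vocab_emb):
    """Hypothetical helper: move source-language embeddings into the target
    language (subtract source mean, add target mean), then return, for each
    source token, the index of the target token with highest cosine similarity."""
    shifted = src_emb - src_mean + tgt_mean
    shifted = shifted / np.linalg.norm(shifted, axis=1, keepdims=True)
    tgt = tgt_vocab_emb / np.linalg.norm(tgt_vocab_emb, axis=1, keepdims=True)
    return (shifted @ tgt.T).argmax(axis=1)


# Toy stand-in for m-BERT token embeddings (assumption, not real data):
# each token = shared "meaning" vector + per-language offset + small noise.
rng = np.random.default_rng(0)
vocab_size, dim = 50, 16
meaning = rng.normal(size=(vocab_size, dim))
offset_en = 3.0 * rng.normal(size=dim)
offset_zh = 3.0 * rng.normal(size=dim)

perm = rng.permutation(vocab_size)  # the toy "zh" vocabulary is shuffled
emb_en = meaning + offset_en + 0.01 * rng.normal(size=(vocab_size, dim))
emb_zh = meaning[perm] + offset_zh + 0.01 * rng.normal(size=(vocab_size, dim))

pred = translate_tokens(emb_en, language_mean(emb_en),
                        language_mean(emb_zh), emb_zh)
# zh index j carries meaning perm[j], so the correct zh index for en token i
# is the inverse permutation of perm, i.e. argsort(perm)[i].
gold = np.argsort(perm)
print("toy token-translation accuracy:", (pred == gold).mean())
```

In this toy setup, the two languages share the same set of meaning vectors, so the language means differ only by the offsets (plus noise). The mean shift therefore cancels the language offset almost exactly. Whether real m-BERT embeddings separate this cleanly is the empirical question the paper addresses.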