Paper Title

Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations

Authors

Arturo Oncevay, Barry Haddow, Alexandra Birch

Abstract

Sparse language vectors from linguistic typology databases and learned embeddings from tasks like multilingual machine translation have been investigated in isolation, without analysing how they could benefit from each other's language characterisation. We propose to fuse both views using singular vector canonical correlation analysis and study what kind of information is induced from each source. By inferring typological features and language phylogenies, we observe that our representations embed typology and strengthen correlations with language relationships. We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy in tasks that require information about language similarities, such as language clustering and ranking candidates for multilingual transfer. With our method, which is also released as a tool, we can easily project and assess new languages without expensive retraining of massive multilingual or ranking models, which are major disadvantages of related approaches.
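
As a rough illustration of the fusion step named in the abstract, the following is a minimal NumPy sketch of singular vector canonical correlation analysis: each view is reduced with SVD, and CCA is then run on the reduced views. It is not the authors' released tool. The function names, the `var_kept` threshold, and the synthetic "typology" and "learned" matrices are all assumptions made for this example.

```python
import numpy as np


def svd_reduce(X, var_kept=0.9):
    """Centre X (languages x features) and keep the leading singular
    directions that explain `var_kept` of the variance (assumed threshold)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(energy, var_kept)) + 1, len(s))
    return U[:, :k] * s[:k]


def svcca(X, Y, var_kept=0.9):
    """SVD-reduce both views, then run CCA on the reduced views.

    Returns the canonical correlations (descending) and the two views
    projected into the shared canonical space.
    """
    Xr, Yr = svd_reduce(X, var_kept), svd_reduce(Y, var_kept)
    # Orthonormal bases of each reduced view; the singular values of
    # Qx^T Qy are the canonical correlations.
    Qx, _ = np.linalg.qr(Xr)
    Qy, _ = np.linalg.qr(Yr)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    return np.clip(s, 0.0, 1.0), Qx @ U, Qy @ Vt.T


# Synthetic stand-ins (hypothetical data): 60 "languages" with a sparse
# binary typology view and a dense learned-embedding view that share a
# 3-dimensional latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 3))
typology = (latent @ rng.normal(size=(3, 30))
            + rng.normal(size=(60, 30)) > 0).astype(float)
learned = latent @ rng.normal(size=(3, 16)) + 0.5 * rng.normal(size=(60, 16))

corrs, typ_proj, emb_proj = svcca(typology, learned)
print(corrs[:3])
```

The two projected views live in the same space, so they can be compared or combined per language. How they are combined into a single language vector is a design choice that this sketch leaves open.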
