Paper Title
Language Model Prior for Low-Resource Neural Machine Translation
Paper Authors
Paper Abstract
The scarcity of large parallel corpora is an important obstacle for neural machine translation. A common solution is to exploit the knowledge of language models (LM) trained on abundant monolingual data. In this work, we propose a novel approach to incorporate an LM as a prior in a neural translation model (TM). Specifically, we add a regularization term, which pushes the output distributions of the TM to be probable under the LM prior, while avoiding wrong predictions when the TM "disagrees" with the LM. This objective relates to knowledge distillation, where the LM can be viewed as teaching the TM about the target language. The proposed approach does not compromise decoding speed, because the LM is used only at training time, unlike previous work that requires it during inference. We present an analysis of the effects that different methods have on the distributions of the TM. Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
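The abstract describes the regularizer only at a high level and does not give the exact objective. As a rough, hypothetical sketch (assuming the regularization term is a temperature-softened KL divergence between the frozen LM's output distribution and the TM's output distribution, combined with the usual cross-entropy loss; the weight `lambda_prior`, the temperature `tau`, and all function names below are illustrative, not the paper's code), the training loss might be assembled as follows:

```python
import torch
import torch.nn.functional as F

def translation_loss_with_lm_prior(tm_logits, lm_logits, targets,
                                   lambda_prior=0.5, tau=2.0, pad_index=0):
    """Hypothetical sketch: cross-entropy plus an LM-prior regularizer.

    tm_logits: (batch, seq_len, vocab) unnormalized scores from the TM
    lm_logits: (batch, seq_len, vocab) scores from a frozen target-side LM
    targets:   (batch, seq_len) gold target-token indices
    """
    vocab = tm_logits.size(-1)

    # Standard translation objective: token-level cross-entropy.
    ce = F.cross_entropy(tm_logits.view(-1, vocab), targets.view(-1),
                         ignore_index=pad_index)

    # Regularizer: push the TM's output distribution to stay probable
    # under the LM prior. The LM is detached, so it is never updated.
    log_p_tm = F.log_softmax(tm_logits / tau, dim=-1)
    p_lm = F.softmax(lm_logits.detach() / tau, dim=-1)
    kl = F.kl_div(log_p_tm, p_lm, reduction="none").sum(-1)

    # Average the per-token KL over non-padding positions only.
    mask = (targets != pad_index).float()
    kl = (kl * mask).sum() / mask.sum()

    return ce + lambda_prior * kl
```

Because the LM appears only inside this training loss, it can be dropped entirely at decoding time, which is consistent with the abstract's claim that the approach does not compromise decoding speed.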