Paper Title

Unsupervised Pretraining for Neural Machine Translation Using Elastic Weight Consolidation

Paper Authors

Dušan Variš, Ondřej Bojar

Paper Abstract

This work presents our ongoing research on unsupervised pretraining in neural machine translation (NMT). In our method, we initialize the weights of the encoder and decoder with two language models that are trained with monolingual data, and then fine-tune the model on parallel data using Elastic Weight Consolidation (EWC) to avoid forgetting the original language modeling tasks. We compare the regularization by EWC with previous work that focuses on regularization by language modeling objectives. The positive result is that using EWC with the decoder achieves BLEU scores similar to the previous work. However, the model converges 2-3 times faster and does not require the original unlabeled training data during the fine-tuning stage. In contrast, the regularization using EWC is less effective if the original and new tasks are not closely related. We show that initializing the bidirectional NMT encoder with a left-to-right language model and forcing the model to remember the original left-to-right language modeling task limits the learning capacity of the encoder for the whole bidirectional context.
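
To make the fine-tuning objective concrete, below is a minimal PyTorch-style sketch of the EWC regularizer described in the abstract: the fine-tuned NMT weights are anchored to the pretrained language-model weights, weighted by a diagonal Fisher information estimate of the original language modeling task. This is not the authors' implementation; the helper names (fisher_diagonal, ewc_penalty, loss_fn, lam) and the training-loop wiring are illustrative assumptions.

import torch

def fisher_diagonal(model, data_loader, loss_fn):
    # Estimate the diagonal Fisher information from squared gradients of the
    # original (language modeling) objective on monolingual data.
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    model.eval()
    n_batches = 0
    for batch in data_loader:
        model.zero_grad()
        loss = loss_fn(model, batch)   # hypothetical LM loss on one batch
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        n_batches += 1
    return {n: f / max(n_batches, 1) for n, f in fisher.items()}

def ewc_penalty(model, anchor_params, fisher, lam=1.0):
    # EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2,
    # where theta* are the pretrained language-model weights.
    penalty = 0.0
    for n, p in model.named_parameters():
        if n in fisher:
            penalty = penalty + (fisher[n] * (p - anchor_params[n]) ** 2).sum()
    return 0.5 * lam * penalty

# During fine-tuning on parallel data, the total loss would then be
#   loss = translation_loss + ewc_penalty(nmt_model, anchor_params, fisher, lam)
# where anchor_params are detached copies of the pretrained weights and the
# Fisher estimate is computed once, before fine-tuning starts.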
