Paper Title

Confidence Based Bidirectional Global Context Aware Training Framework for Neural Machine Translation

Paper Authors

Chulun Zhou, Fandong Meng, Jie Zhou, Min Zhang, Hongji Wang, Jinsong Su

Paper Abstract

Most dominant neural machine translation (NMT) models are restricted to making predictions based only on the local context of preceding words, in a left-to-right manner. Although many previous studies have tried to incorporate global information into NMT models, there are still limitations on how to effectively exploit bidirectional global context. In this paper, we propose a Confidence-Based Bidirectional Global Context Aware (CBBGCA) training framework for NMT, in which the NMT model is jointly trained with an auxiliary conditional masked language model (CMLM). Training consists of two stages: (1) multi-task joint training and (2) confidence-based knowledge distillation. In the first stage, through shared encoder parameters, the NMT model is additionally supervised by the signal from the CMLM decoder, which contains bidirectional global context. In the second stage, using the CMLM as a teacher, we further incorporate bidirectional global context into the NMT model via knowledge distillation, specifically on the target words it predicts with low confidence. Experimental results show that our proposed CBBGCA training framework significantly improves the NMT model by +1.02, +1.30, and +0.57 BLEU on three large-scale translation datasets, namely WMT'14 English-to-German, WMT'19 Chinese-to-English, and WMT'14 English-to-French, respectively.
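To make the second stage more concrete, below is a minimal PyTorch-style sketch of what a confidence-based knowledge-distillation objective could look like: per target position, the student (NMT model) receives standard cross-entropy supervision where its confidence is high, and a distillation signal from the teacher (CMLM) where its confidence is low. The function name, the threshold `tau`, and the way the two losses are mixed are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def confidence_based_kd_loss(nmt_logits, cmlm_logits, gold_ids, pad_id, tau=0.9):
    """Mix cross-entropy and distillation losses per target position.

    nmt_logits, cmlm_logits: (batch, tgt_len, vocab) raw scores
    gold_ids: (batch, tgt_len) reference token ids
    tau: confidence threshold (hypothetical value, would be tuned in practice)
    """
    log_p_nmt = F.log_softmax(nmt_logits, dim=-1)

    # Confidence = probability the NMT model assigns to the gold token.
    conf = log_p_nmt.gather(-1, gold_ids.unsqueeze(-1)).squeeze(-1).exp()
    mask = gold_ids.ne(pad_id)                # ignore padding positions
    low_conf = (conf < tau) & mask            # positions needing the teacher

    # Standard cross-entropy on confidently-predicted tokens.
    ce = F.nll_loss(log_p_nmt.transpose(1, 2), gold_ids,
                    ignore_index=pad_id, reduction='none')
    ce_loss = (ce * (~low_conf & mask)).sum()

    # Distillation toward the CMLM distribution on low-confidence tokens;
    # the teacher is detached so no gradient flows into the CMLM here.
    p_teacher = F.softmax(cmlm_logits, dim=-1).detach()
    kd = F.kl_div(log_p_nmt, p_teacher, reduction='none').sum(-1)
    kd_loss = (kd * low_conf).sum()

    n_tok = mask.sum().clamp(min=1)
    return (ce_loss + kd_loss) / n_tok
```

The key intuition this sketch captures is that the CMLM's per-position distributions condition on both left and right target context, so distilling them into the left-to-right NMT model injects bidirectional global context exactly where the student is least certain.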
