Paper Title

Future-Guided Incremental Transformer for Simultaneous Translation

Paper Authors

Shaolei Zhang, Yang Feng, Liangyou Li

Paper Abstract

Simultaneous translation (ST) starts translating synchronously while reading the source sentence and is used in many online scenarios. The previous wait-k policy is concise and achieves good results in ST. However, the wait-k policy faces two weaknesses: low training speed caused by the recalculation of hidden states, and a lack of future source information to guide training. To address the low training speed, we propose an incremental Transformer with an average embedding layer (AEL) to accelerate the calculation of hidden states during training. For future-guided training, we use a conventional Transformer as the teacher of the incremental Transformer and try to invisibly embed some future information in the model through knowledge distillation. We conducted experiments on Chinese-English and German-English simultaneous translation tasks and compared the proposed method with the wait-k policy. Our method effectively increases the training speed by about 28 times on average across different k and implicitly embeds some predictive ability in the model, achieving better translation quality than the wait-k baseline.
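The two ingredients summarized in the abstract can be illustrated with a small sketch. The code below is a minimal, hypothetical illustration and not the authors' implementation: waitk_visible_source gives the standard wait-k read/write schedule (how many source tokens the decoder may attend to at each target step), and distillation_loss is a generic knowledge-distillation term in which a full-sentence Transformer teacher guides the incremental student. The function names, the temperature T, and the use of PyTorch are assumptions made for this sketch.

```python
import torch
import torch.nn.functional as F


def waitk_visible_source(i: int, k: int, src_len: int) -> int:
    """Wait-k schedule: when emitting target token i (0-indexed),
    the decoder may attend to the first min(i + k, src_len) source tokens."""
    return min(i + k, src_len)


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      T: float = 1.0) -> torch.Tensor:
    """KL divergence pushing the incremental (wait-k) student's output
    distribution toward that of a full-sentence teacher, which has seen
    the complete source and therefore carries 'future' information."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)


if __name__ == "__main__":
    # With k = 3 and a 6-token source, the schedule is 3, 4, 5, 6, 6, 6.
    print([waitk_visible_source(i, k=3, src_len=6) for i in range(6)])

    # Toy logits: 2 target positions over a 5-word vocabulary.
    student = torch.randn(2, 5)
    teacher = torch.randn(2, 5)
    print(distillation_loss(student, teacher).item())
```

In this reading, the distillation term is what lets the student benefit from future source context at training time while still decoding under the wait-k constraint at inference time.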
