Paper Title

On the Complementarity between Pre-Training and Random-Initialization for Resource-Rich Machine Translation

Paper Authors

Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, Dacheng Tao

Paper Abstract

Pre-Training (PT) of text representations has been successfully applied to low-resource Neural Machine Translation (NMT). However, it usually fails to achieve notable gains (sometimes even performing worse) on resource-rich NMT compared with its Random-Initialization (RI) counterpart. We take the first step to investigate the complementarity between PT and RI in resource-rich scenarios via two probing analyses, and find that: 1) PT improves NOT the accuracy, but the generalization, by achieving flatter loss landscapes than RI; 2) PT improves NOT the confidence of lexical choice, but the negative diversity, by assigning smoother lexical probability distributions than RI. Based on these insights, we propose to combine their complementary strengths with a model fusion algorithm that utilizes optimal transport to align the neurons of PT and RI. Experiments on two resource-rich translation benchmarks, WMT'17 English-Chinese (20M) and WMT'19 English-German (36M), show that PT and RI can complement each other nicely, achieving substantial improvements in terms of translation accuracy, generalization, and negative diversity. Probing tools and code are released at: https://github.com/zanchangtong/PTvsRI.
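
To make the fusion idea concrete, here is a minimal sketch of aligning the neurons of one layer from the RI model to the corresponding layer of the PT model with an optimal-transport-style matching before interpolating the weights. This is NOT the authors' released implementation (see the GitHub link above for that): the function name `align_and_fuse`, the single-matrix "layer", the uniform marginals (which reduce the transport plan to a permutation found by the Hungarian algorithm), and the interpolation weight `alpha` are all illustrative assumptions.

```python
# Illustrative sketch only: OT-style neuron alignment followed by weight averaging.
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_and_fuse(w_pt: np.ndarray, w_ri: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Align the neurons (rows) of w_ri to those of w_pt, then interpolate.

    w_pt, w_ri : weight matrices of shape (out_features, in_features)
    alpha      : interpolation weight placed on the pre-trained model
    """
    # Pairwise squared Euclidean distances between neurons form the OT cost matrix.
    cost = ((w_pt[:, None, :] - w_ri[None, :, :]) ** 2).sum(axis=-1)
    # With uniform marginals the optimal transport plan degenerates to a permutation;
    # the Hungarian algorithm gives the minimum-cost matching.
    _, col_idx = linear_sum_assignment(cost)
    w_ri_aligned = w_ri[col_idx]  # reorder RI neurons to match PT's neurons
    return alpha * w_pt + (1.0 - alpha) * w_ri_aligned


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_pt = rng.normal(size=(8, 16))                      # stand-in "PT" layer
    perm = rng.permutation(8)
    w_ri = w_pt[perm] + 0.01 * rng.normal(size=(8, 16))  # shuffled, noisy "RI" layer
    fused = align_and_fuse(w_pt, w_ri)
    print("fused layer shape:", fused.shape)             # -> (8, 16)
```

In a full Transformer, permuting a layer's output neurons also requires permuting the matching input dimensions of the following layer, and a soft (e.g., Sinkhorn) transport plan can replace the hard matching; the released code linked above is the authoritative reference for the paper's actual procedure.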
