Paper Title
Improving Zero-Shot Translation by Disentangling Positional Information
Authors
Abstract
Multilingual neural machine translation has shown the capability of directly translating between language pairs unseen in training, i.e. zero-shot translation. Despite being conceptually attractive, it often suffers from low output quality. The difficulty of generalizing to new translation directions suggests the model representations are highly specific to those language pairs seen in training. We demonstrate that a main factor causing the language-specific representations is the positional correspondence to input tokens. We show that this can be easily alleviated by removing residual connections in an encoder layer. With this modification, we gain up to 18.5 BLEU points on zero-shot translation while retaining quality on supervised directions. The improvements are particularly prominent between related languages, where our proposed model outperforms pivot-based translation. Moreover, our approach allows easy integration of new languages, which substantially expands translation coverage. By thorough inspections of the hidden layer outputs, we show that our approach indeed leads to more language-independent representations.
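As a rough illustration of the modification mentioned in the abstract, the toy sketch below shows what dropping a residual connection does to a single self-attention sublayer. It is not the authors' implementation: the function names (`self_attention`, `encoder_sublayer`), the `use_residual` flag, and the choice to drop the residual around the self-attention sublayer are assumptions made for illustration, and learned projections, layer normalization, and the feed-forward sublayer are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Toy single-head scaled dot-product attention.

    Uses identity query/key/value projections (an assumption that keeps
    the example small); x is a list of token vectors.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Each output vector is a weighted mixture of all input vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def encoder_sublayer(x, use_residual=True):
    """Self-attention sublayer with an optional residual connection.

    With use_residual=True, output position i is x[i] plus the attention
    output, so it stays tied to input token i. With use_residual=False
    (hypothetical flag), position i carries only the attention mixture.
    """
    attended = self_attention(x)
    if use_residual:
        return [[a + b for a, b in zip(xi, ai)] for xi, ai in zip(x, attended)]
    return attended


tokens = [[1.0, 0.0], [0.0, 1.0]]
print(encoder_sublayer(tokens, use_residual=True))
print(encoder_sublayer(tokens, use_residual=False))
```

In this sketch the two outputs differ by exactly the input vectors, which is the one-to-one link between output position and input token that the residual path carries.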