Paper Title

Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers

Paper Authors

Hongfei Xu, Josef van Genabith, Qiuhui Liu, Deyi Xiong

Paper Abstract

Due to its effectiveness and performance, the Transformer translation model has attracted wide attention, most recently in terms of probing-based approaches. Previous work focuses on using or probing source linguistic features in the encoder. To date, the way word translation evolves in Transformer layers has not yet been investigated. Naively, one might assume that encoder layers capture source information while decoder layers translate. In this work, we show that this is not quite the case: translation already happens progressively in encoder layers and even in the input embeddings. More surprisingly, we find that some of the lower decoder layers do not actually do that much decoding. We show all of this in terms of a probing approach where we project representations of the layer analyzed to the final trained and frozen classifier level of the Transformer decoder to measure word translation accuracy. Our findings motivate and explain a Transformer configuration change: if translation already happens in the encoder layers, perhaps we can increase the number of encoder layers, while decreasing the number of decoder layers, boosting decoding speed, without loss in translation quality? Our experiments show that this is indeed the case: we can increase speed by up to a factor 2.3 with small gains in translation quality, while an 18-4 deep encoder configuration boosts translation quality by +1.42 BLEU (En-De) at a speed-up of 1.4.
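
To make the probing setup concrete, below is a minimal PyTorch sketch of the idea as described in the abstract: representations from the layer under analysis are projected into the space expected by the trained, frozen decoder classifier, and word translation accuracy is computed from the resulting predictions. The class and function names (`FrozenClassifierProbe`, `word_translation_accuracy`), tensor shapes, and the use of a single linear projection are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class FrozenClassifierProbe(nn.Module):
    """Sketch of the probe: a trainable projection feeding a frozen classifier.

    The projection maps hidden states of the analyzed layer (an encoder layer,
    a decoder layer, or the input embeddings) into the decoder's output space;
    the classifier is taken from a trained Transformer and kept frozen.
    """

    def __init__(self, layer_dim: int, model_dim: int, frozen_classifier: nn.Linear):
        super().__init__()
        # Trainable probe parameters: analyzed-layer space -> decoder output space.
        self.proj = nn.Linear(layer_dim, model_dim)
        # Trained decoder classifier (hidden -> target vocabulary), frozen.
        self.classifier = frozen_classifier
        for p in self.classifier.parameters():
            p.requires_grad = False

    def forward(self, layer_states: torch.Tensor) -> torch.Tensor:
        # layer_states: (batch, seq_len, layer_dim); returns vocabulary logits.
        return self.classifier(self.proj(layer_states))


def word_translation_accuracy(logits: torch.Tensor,
                              targets: torch.Tensor,
                              pad_id: int = 0) -> float:
    """Fraction of non-padding target words predicted correctly."""
    pred = logits.argmax(dim=-1)          # (batch, seq_len)
    mask = targets.ne(pad_id)             # ignore padding positions
    correct = pred.eq(targets) & mask
    return correct.sum().item() / mask.sum().item()
```

In this reading, only the probe projection would be trained while the rest of the model stays fixed, so the measured accuracy presumably reflects how much translation information the analyzed layer already encodes rather than what the probe itself can learn.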
