Paper Title
Normalization of Input-output Shared Embeddings in Text Generation Models
Paper Authors
Paper Abstract
Neural-network-based models have become the state of the art for various Natural Language Processing tasks; however, the input and output dimension problem in these networks has still not been fully resolved, especially in text generation tasks (e.g., Machine Translation, Text Summarization), where both input and output have huge vocabularies. Input-output embedding weight sharing has therefore been introduced and widely adopted, yet it leaves room for improvement. Based on linear algebra and statistical theory, this paper identifies the shortcomings of the existing input-output embedding weight sharing method, then proposes methods for improving input-output shared embeddings, among which normalization of the embedding weight matrices shows the best performance. These methods are nearly free of computational cost, can be combined with other embedding techniques, and are effective when applied to state-of-the-art neural network models. For Transformer-big models, the normalization techniques yield up to 0.6 BLEU improvement over the original model on the WMT'16 En-De dataset, with similar BLEU improvements on the IWSLT'14 datasets. For DynamicConv models, a 0.5 BLEU improvement is attained on the WMT'16 En-De dataset, and a 0.41 BLEU improvement is achieved on the IWSLT'14 De-En translation task.
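To make the setup concrete, the following is a minimal numpy sketch of input-output embedding weight sharing with row-wise normalization of the shared weight matrix. The abstract does not specify the exact normalization scheme, so unit L2 normalization of each token's embedding row is assumed here purely for illustration; all names and shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 10, 4

# One shared weight matrix serves both the input embedding lookup
# and the output (pre-softmax) projection.
shared_weight = rng.normal(size=(vocab_size, d_model))

def normalize_rows(w, eps=1e-8):
    """Scale each embedding row to unit L2 norm (assumed scheme)."""
    return w / (np.linalg.norm(w, axis=1, keepdims=True) + eps)

w_norm = normalize_rows(shared_weight)

# Input side: embed a token by looking up its row.
token_id = 3
x = w_norm[token_id]                  # shape (d_model,)

# Output side: logits come from projecting the decoder hidden state
# with the very same (normalized) matrix.
hidden = rng.normal(size=d_model)
logits = w_norm @ hidden              # shape (vocab_size,)

# After normalization, every token's embedding has unit norm, so no
# token dominates the logits purely through its embedding magnitude.
assert np.allclose(np.linalg.norm(w_norm, axis=1), 1.0)
assert logits.shape == (vocab_size,)
```

Because the normalization is a single rescaling of an existing matrix, it adds essentially no computational cost, consistent with the abstract's claim.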