Paper Title

ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation

Authors

Bei Li, Quan Du, Tao Zhou, Yi Jing, Shuhan Zhou, Xin Zeng, Tong Xiao, JingBo Zhu, Xuebo Liu, Min Zhang

Abstract

Residual networks are an Euler discretization of solutions to Ordinary Differential Equations (ODEs). This paper explores a deeper relationship between Transformer and numerical ODE methods. We first show that a residual block of layers in Transformer can be described as a higher-order solution to an ODE. Inspired by this, we design a new architecture, ODE Transformer, which is analogous to the Runge-Kutta method that is well motivated in ODE. As a natural extension to Transformer, ODE Transformer is easy to implement and efficient to use. Experimental results on large-scale machine translation, abstractive summarization, and grammar error correction tasks demonstrate the high genericity of ODE Transformer. It can gain large improvements in model performance over strong baselines (e.g., 30.77 and 44.11 BLEU scores on the WMT'14 English-German and English-French benchmarks) at a slight cost in inference efficiency.
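The contrast the abstract draws between an Euler-style residual update and a Runge-Kutta-style update can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a scalar state, a step size of 1 folded into the function, and a hypothetical stand-in `toy_layer` in place of a real Transformer sublayer. It only shows why reusing the same function twice in a second-order (Heun-type) update can track the underlying ODE more closely than a single residual step.

```python
import math
from typing import Callable


def euler_residual_step(y: float, f: Callable[[float], float]) -> float:
    """Standard residual update y + F(y): one explicit Euler step (step size 1)."""
    return y + f(y)


def rk2_residual_step(y: float, f: Callable[[float], float]) -> float:
    """Second-order Runge-Kutta (Heun-type) update that evaluates f twice."""
    f1 = f(y)          # slope estimate at the current state
    f2 = f(y + f1)     # slope estimate at the Euler-predicted state
    return y + 0.5 * (f1 + f2)


def toy_layer(y: float) -> float:
    """Hypothetical stand-in for a sublayer: right-hand side of dy/dt = -0.5 * y."""
    return -0.5 * y


y0 = 1.0
exact = y0 * math.exp(-0.5)                  # true solution of the toy ODE at t = 1
euler = euler_residual_step(y0, toy_layer)   # first-order approximation
rk2 = rk2_residual_step(y0, toy_layer)       # second-order approximation

print(f"exact={exact:.4f} euler={euler:.4f} rk2={rk2:.4f}")
```

On this toy problem the second-order update lands closer to the exact solution than the plain residual step, at the cost of one extra evaluation of the same function. Whether and how this carries over to real Transformer layers is what the paper's experiments address.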
