Paper Title

ETC: Encoding Long and Structured Inputs in Transformers

Authors

Joshua Ainslie, Santiago Ontanon, Chris Alberti, Vaclav Cvicek, Zachary Fisher, Philip Pham, Anirudh Ravula, Sumit Sanghai, Qifan Wang, Li Yang

Abstract

Transformer models have advanced the state of the art in many Natural Language Processing (NLP) tasks. In this paper, we present a new Transformer architecture, Extended Transformer Construction (ETC), that addresses two key challenges of standard Transformer architectures, namely scaling input length and encoding structured inputs. To scale attention to longer inputs, we introduce a novel global-local attention mechanism between global tokens and regular input tokens. We also show that combining global-local attention with relative position encodings and a Contrastive Predictive Coding (CPC) pre-training objective allows ETC to encode structured inputs. We achieve state-of-the-art results on four natural language datasets requiring long and/or structured inputs.
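
To make the attention pattern concrete, below is a minimal sketch (in NumPy) of the kind of sparse mask implied by global-local attention: global tokens attend to every token, while long-input tokens attend to all global tokens plus a fixed-radius local neighborhood. The function name, token counts, and `local_radius` parameter are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def global_local_attention_mask(num_global, num_long, local_radius):
    """Illustrative boolean mask for ETC-style global-local attention.

    Global tokens attend to (and are attended by) every token; long-input
    tokens additionally attend only to neighbors within `local_radius`.
    This is a sketch for intuition, not the paper's implementation.
    """
    n = num_global + num_long
    mask = np.zeros((n, n), dtype=bool)

    # Global-to-global and global-to-long: global tokens see everything.
    mask[:num_global, :] = True
    # Long-to-global: every long-input token sees all global tokens.
    mask[num_global:, :num_global] = True

    # Long-to-long: sliding window of width 2 * local_radius + 1.
    for i in range(num_long):
        lo = max(0, i - local_radius)
        hi = min(num_long, i + local_radius + 1)
        mask[num_global + i, num_global + lo:num_global + hi] = True
    return mask

# Example: 4 global tokens summarizing a 16-token input, local radius 2.
print(global_local_attention_mask(4, 16, 2).astype(int))
```

Because each long-input row is nonzero only over the global tokens and a fixed local window, attention cost grows roughly linearly with input length rather than quadratically, which is what lets the mechanism scale to longer inputs.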
