Paper Title

Efficient Long Sequence Encoding via Synchronization

Paper Authors

Xiangyang Mou, Mo Yu, Bingsheng Yao, Lifu Huang

Paper Abstract

Pre-trained Transformer models have achieved success in a wide range of NLP tasks, but are inefficient when dealing with long input sequences. Existing studies try to overcome this challenge by segmenting the long sequence, followed by hierarchical encoding or post-hoc aggregation. We propose a synchronization mechanism for hierarchical encoding. Our approach first identifies anchor tokens across segments and groups them by their roles in the original input sequence. Then, inside each Transformer layer, anchor embeddings are synchronized within their group via a self-attention module. Our approach is a general framework with sufficient flexibility -- when adapted to a new task, it can easily be enhanced with task-specific anchor definitions. Experiments on two representative tasks with different types of long input texts, the NarrativeQA summary setting and wild multi-hop reasoning from HotpotQA, demonstrate that our approach improves global information exchange among segments while maintaining efficiency.
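
To make the mechanism concrete, below is a minimal sketch (not the authors' released code) of the synchronization step described in the abstract: each segment is encoded independently, and the embeddings of anchor tokens that belong to the same group exchange information through an extra self-attention pass before being written back into their segments. The module name AnchorSynchronization and the interface (hidden, anchor_mask, group_ids) are illustrative assumptions; how anchors are defined and grouped is task-specific in the paper.

```python
# Minimal sketch of cross-segment anchor synchronization, assuming PyTorch.
# Names and the grouping interface are illustrative, not the authors' API.
import torch
import torch.nn as nn


class AnchorSynchronization(nn.Module):
    """Synchronize anchor embeddings across segments within each anchor group."""

    def __init__(self, hidden_size: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, hidden, anchor_mask, group_ids):
        # hidden:      (num_segments, seg_len, hidden_size), per-segment encodings
        # anchor_mask: (num_segments, seg_len) bool, True at anchor-token positions
        # group_ids:   (num_segments, seg_len) long, group id of each anchor token
        seg_idx, tok_idx = anchor_mask.nonzero(as_tuple=True)
        anchors = hidden[seg_idx, tok_idx]        # (num_anchors, hidden_size)
        groups = group_ids[seg_idx, tok_idx]      # (num_anchors,)

        synced = anchors.clone()
        for g in groups.unique():
            members = (groups == g).nonzero(as_tuple=True)[0]
            if members.numel() < 2:
                continue  # a singleton group has nothing to exchange
            x = anchors[members].unsqueeze(0)     # (1, group_size, hidden_size)
            out, _ = self.attn(x, x, x)           # anchors in the group attend to each other
            synced[members] = out.squeeze(0)

        # Write the synchronized anchor embeddings back into their segments.
        updated = hidden.clone()
        updated[seg_idx, tok_idx] = synced
        return updated
```

In the framework described by the abstract, such a module would sit inside a Transformer layer, applied after the segment-wise encoding, so that later layers in every segment see the globally exchanged anchor information; the anchor definitions themselves (e.g., question tokens or repeated entity mentions) would be supplied per task.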
