Title

Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences

Authors

Andis Draguns, Emīls Ozoliņš, Agris Šostaks, Matīss Apinis, Kārlis Freivalds

Abstract

Attention is a commonly used mechanism in sequence processing, but its O(n^2) complexity prevents its application to long sequences. The recently introduced neural Shuffle-Exchange network offers a computation-efficient alternative, enabling the modelling of long-range dependencies in O(n log n) time. The model, however, is quite complex, involving a sophisticated gating mechanism derived from the Gated Recurrent Unit. In this paper, we present a simple and lightweight variant of the Shuffle-Exchange network, which is based on a residual network employing GELU and Layer Normalization. The proposed architecture not only scales to longer sequences but also converges faster and provides better accuracy. It surpasses the Shuffle-Exchange network on the LAMBADA language modelling task and achieves state-of-the-art performance on the MusicNet dataset for music transcription while being efficient in the number of parameters. We show how to combine the improved Shuffle-Exchange network with convolutional layers, establishing it as a useful building block in long sequence processing applications.
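The abstract describes the core computation: switch layers built from Layer Normalization, a linear map, GELU, and a residual connection, interleaved with perfect-shuffle permutations so that log n layers of O(n) work suffice to connect any two positions. The sketch below is a minimal PyTorch illustration of that structure under our own naming, not the authors' exact Residual Switch Unit (which additionally uses learned residual scaling) nor their full Beneš arrangement.

```python
import torch
import torch.nn as nn


class ResidualSwitch(nn.Module):
    """Switch layer: joins adjacent positions into pairs and updates each
    pair with LayerNorm -> Linear -> GELU -> Linear plus a residual add.
    A simplified stand-in for the paper's Residual Switch Unit."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(2 * dim)
        self.ff = nn.Sequential(
            nn.Linear(2 * dim, 2 * dim),
            nn.GELU(),
            nn.Linear(2 * dim, 2 * dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, dim) with n a power of two
        b, n, d = x.shape
        pairs = x.reshape(b, n // 2, 2 * d)        # pair up neighbours
        pairs = pairs + self.ff(self.norm(pairs))  # residual pair update
        return pairs.reshape(b, n, d)


def perfect_shuffle(x: torch.Tensor) -> torch.Tensor:
    """Riffle shuffle: interleave the first and second halves of the
    sequence, so distant positions become neighbours for the next switch."""
    b, n, d = x.shape
    return x.reshape(b, 2, n // 2, d).transpose(1, 2).reshape(b, n, d)


class ShuffleExchangeBlock(nn.Module):
    """log2(n) switch layers, each followed by a perfect shuffle.
    Every layer does O(n) work, giving O(n log n) total, and after
    log2(n) shuffles any two positions have interacted."""

    def __init__(self, dim: int, seq_len: int):
        super().__init__()
        assert seq_len & (seq_len - 1) == 0, "seq_len must be a power of 2"
        depth = seq_len.bit_length() - 1  # log2(seq_len)
        self.layers = nn.ModuleList(ResidualSwitch(dim) for _ in range(depth))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = perfect_shuffle(layer(x))
        return x


# Usage: a batch of length-16 sequences of 8-dimensional feature vectors.
block = ShuffleExchangeBlock(dim=8, seq_len=16)
out = block(torch.randn(2, 16, 8))
print(out.shape)  # torch.Size([2, 16, 8])
```

The full network described in the paper arranges such layers in a Beneš pattern, following the forward shuffles with a mirrored stack that uses the inverse shuffle, so information can route in both directions across the sequence.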
