Paper Title
Fast-FNet: Accelerating Transformer Encoder Models via Efficient Fourier Layers
Paper Authors
Paper Abstract
Transformer-based language models utilize the attention mechanism to achieve substantial performance improvements in almost all natural language processing (NLP) tasks. Similar attention structures have also been extensively studied in several other domains. Although the attention mechanism significantly enhances model performance, its quadratic complexity prevents efficient processing of long sequences. Recent works have focused on eliminating this computational inefficiency and have shown that transformer-based models can still reach competitive results without the attention layer. A pioneering study proposed FNet, which replaces the attention layer with the Fourier Transform (FT) in the transformer encoder architecture. FNet achieves competitive performance relative to the original transformer encoder model while accelerating the training process by removing the computational burden of the attention mechanism. However, the FNet model ignores essential properties of the FT from classical signal processing that can be leveraged to further increase model efficiency. We propose different methods to deploy the FT efficiently in transformer encoder models. Our proposed architectures have fewer model parameters, shorter training times, lower memory usage, and additional performance improvements. We demonstrate these improvements through extensive experiments on common benchmarks.
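For context on the approach the abstract builds upon, the sketch below illustrates an FNet-style Fourier mixing sublayer that replaces self-attention: a 2D FFT applied over the hidden and sequence dimensions, keeping only the real part. This is a minimal illustrative sketch in PyTorch, not the authors' released code; the class name `FourierMixingLayer` is an assumption.

```python
import torch
import torch.nn as nn


class FourierMixingLayer(nn.Module):
    """Token-mixing sublayer in the FNet style: replaces self-attention
    with a parameter-free 2D FFT over the hidden and sequence dimensions."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, seq_len, hidden_dim).
        # FFT along the hidden dimension, then along the sequence dimension;
        # keeping only the real part yields a real-valued output tensor.
        return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real
```

Because this sublayer has no learnable parameters and the FFT runs in O(n log n) over the sequence length, it removes the quadratic cost of attention, which is the efficiency property the proposed Fast-FNet architectures aim to exploit further.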