Paper Title
Flowformer: Linearizing Transformers with Conservation Flows
Paper Authors
Paper Abstract
Transformers based on the attention mechanism have achieved impressive success in various areas. However, the attention mechanism has quadratic complexity, significantly impeding Transformers from handling large numbers of tokens and scaling up to bigger models. Previous methods mainly utilize the similarity decomposition and the associativity of matrix multiplication to devise linear-time attention mechanisms, and they avoid degeneration of attention into a trivial distribution by reintroducing inductive biases such as locality, at the expense of model generality and expressiveness. In this paper, we linearize Transformers free from specific inductive biases based on flow network theory. We cast attention as information flow aggregated from the sources (values) to the sinks (results) through learned flow capacities (attentions). Within this framework, we apply the property of flow conservation to attention and propose the Flow-Attention mechanism with linear complexity. By respectively conserving the incoming flow of sinks for source competition and the outgoing flow of sources for sink allocation, Flow-Attention inherently generates informative attention without relying on specific inductive biases. Empowered by Flow-Attention, Flowformer yields strong performance in linear time across a wide range of areas, including long sequences, time series, vision, natural language, and reinforcement learning. The code and settings are available at this repository: https://github.com/thuml/Flowformer.
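
To make the mechanism concrete, below is a minimal single-head, non-causal sketch of Flow-Attention in PyTorch. It follows the recipe in the abstract (non-negative flow capacities, conservation of incoming/outgoing flow, softmax competition among sources, sigmoid allocation to sinks), but the function name `flow_attention`, the sigmoid feature map, the softmax rescaling by source length, and the `eps` constant are illustrative assumptions made here, not the authors' official implementation (which lives in the repository linked above).

```python
# Minimal single-head, non-causal Flow-Attention sketch (illustrative only).
import torch

def flow_attention(q, k, v, eps=1e-6):
    """q: (n, d), k: (m, d), v: (m, d_v) -> (n, d_v); cost is linear in n and m."""
    # Non-negative "flow capacities" (sigmoid kernel assumed here).
    phi_q, phi_k = torch.sigmoid(q), torch.sigmoid(k)
    # Incoming flow of sink i: sum_j phi(q_i)·phi(k_j); outgoing flow of
    # source j: sum_i phi(k_j)·phi(q_i). Summing the kernels first keeps the
    # cost at O((n + m) * d) instead of O(n * m).
    incoming = phi_q @ phi_k.sum(dim=0) + eps               # (n,)
    outgoing = phi_k @ phi_q.sum(dim=0) + eps               # (m,)
    # Conservation: re-measure each side's flow after normalizing the other.
    incoming_c = phi_q @ (phi_k / outgoing[:, None]).sum(dim=0)  # (n,)
    outgoing_c = phi_k @ (phi_q / incoming[:, None]).sum(dim=0)  # (m,)
    # Competition among sources and allocation to sinks. The *m rescaling is
    # one normalization choice that keeps the competed values at V's scale.
    competition = torch.softmax(outgoing_c, dim=0) * k.shape[0]  # (m,)
    allocation = torch.sigmoid(incoming_c)                       # (n,)
    # Linear-time aggregation: phi(K)^T (competition ⊙ V) is only (d, d_v).
    kv = phi_k.T @ (v * competition[:, None])
    return (phi_q @ kv) / incoming[:, None] * allocation[:, None]

out = flow_attention(torch.randn(128, 32), torch.randn(256, 32), torch.randn(256, 64))
print(out.shape)  # torch.Size([128, 64])
```

Note that both matrix products in the aggregation step (`phi_k.T @ ...` and `phi_q @ kv`) go through a d-by-d_v intermediate, so the cost grows linearly with the number of tokens rather than quadratically; the competition and allocation reweightings are what prevent the kernelized attention from collapsing to a trivial distribution.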