Paper Title
Linear Complexity Randomized Self-attention Mechanism
Paper Authors
Paper Abstract
Recently, random feature attentions (RFAs) have been proposed to approximate softmax attention in linear time and space complexity by linearizing the exponential kernel. In this paper, we first propose a novel perspective for understanding the bias in such approximations by recasting RFAs as self-normalized importance samplers. This perspective further sheds light on an unbiased estimator of the whole softmax attention, called randomized attention (RA). RA constructs positive random features via query-specific distributions and enjoys greatly improved approximation fidelity, albeit exhibiting quadratic complexity. By combining the expressiveness of RA and the efficiency of RFA, we develop a novel linear complexity self-attention mechanism called linear randomized attention (LARA). Extensive experiments across various domains demonstrate that RA and LARA improve the performance of RFAs by a substantial margin.
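To make the "linearizing the exponential kernel" idea concrete, the following minimal NumPy sketch contrasts exact softmax attention with an RFA-style linear-time approximation. It uses the well-known Performer-style positive feature map exp(w·x − ‖x‖²/2) as a stand-in; the specific feature constructions used by RFA, RA, and LARA differ, and all function names, shapes, and hyperparameters below are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Exact softmax attention: quadratic time and memory in sequence length n."""
    scores = Q @ K.T                                        # (n, n) pairwise logits
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ V

def positive_random_features(X, W):
    """Performer-style positive features: phi(q) . phi(k) estimates exp(q . k)."""
    proj = X @ W.T                                          # (n, m) random projections
    return np.exp(proj - 0.5 * (X ** 2).sum(-1, keepdims=True)) / np.sqrt(W.shape[0])

def rfa_attention(Q, K, V, num_features=256, seed=0):
    """Linear-complexity approximation: keys/values are reduced to an (m, d)
    summary, so the cost is O(n * m * d) rather than O(n^2 * d)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((num_features, Q.shape[-1]))    # shared random projections
    phi_q = positive_random_features(Q, W)
    phi_k = positive_random_features(K, W)
    kv = phi_k.T @ V                                        # (m, d) key-value summary
    z = phi_k.sum(0)                                        # (m,) normalizer statistics
    return (phi_q @ kv) / (phi_q @ z)[:, None]              # self-normalized estimate

# Toy comparison of the two mechanisms.
n, d = 64, 16
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((n, d)) / d ** 0.25 for _ in range(3))
print("mean abs error:", np.abs(softmax_attention(Q, K, V) - rfa_attention(Q, K, V)).mean())
```

The structural difference is that the exact variant materializes an n × n attention matrix, while the random-feature variant only forms an m × d summary of the keys and values, which is what yields linear complexity in sequence length; per the abstract, RA instead draws its random features from query-specific distributions to remove the bias of this shared-feature estimator, and LARA recovers linear complexity on top of that.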