Paper Title
Linear Complexity Randomized Self-attention Mechanism
Paper Authors
Paper Abstract
Recently, random feature attentions (RFAs) have been proposed to approximate softmax attention in linear time and space complexity by linearizing the exponential kernel. In this paper, we first propose a novel perspective for understanding the bias in such approximations by recasting RFAs as self-normalized importance samplers. This perspective further sheds light on an unbiased estimator of the whole softmax attention, called randomized attention (RA). RA constructs positive random features via query-specific distributions and enjoys greatly improved approximation fidelity, albeit exhibiting quadratic complexity. By combining the expressiveness of RA and the efficiency of RFA, we develop a novel linear complexity self-attention mechanism called linear randomized attention (LARA). Extensive experiments across various domains demonstrate that RA and LARA improve the performance of RFAs by a substantial margin.
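To make the "linearizing the exponential kernel" idea concrete, the following minimal NumPy sketch contrasts exact softmax attention with an RFA-style linear-time approximation. It uses the well-known Performer-style positive feature map exp(w·x − ‖x‖²/2) as a stand-in; the specific feature constructions used by RFA, RA, and LARA differ, and all function names, shapes, and hyperparameters below are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Exact softmax attention: quadratic time and memory in sequence length n."""
    scores = Q @ K.T                                        # (n, n) pairwise logits
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ V

def positive_random_features(X, W):
    """Performer-style positive features: phi(q) . phi(k) estimates exp(q . k)."""
    proj = X @ W.T                                          # (n, m) random projections
    return np.exp(proj - 0.5 * (X ** 2).sum(-1, keepdims=True)) / np.sqrt(W.shape[0])

def rfa_attention(Q, K, V, num_features=256, seed=0):
    """Linear-complexity approximation: keys/values are reduced to an (m, d)
    summary, so the cost is O(n * m * d) rather than O(n^2 * d)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((num_features, Q.shape[-1]))    # shared random projections
    phi_q = positive_random_features(Q, W)
    phi_k = positive_random_features(K, W)
    kv = phi_k.T @ V                                        # (m, d) key-value summary
    z = phi_k.sum(0)                                        # (m,) normalizer statistics
    return (phi_q @ kv) / (phi_q @ z)[:, None]              # self-normalized estimate

# Toy comparison of the two mechanisms.
n, d = 64, 16
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((n, d)) / d ** 0.25 for _ in range(3))
print("mean abs error:", np.abs(softmax_attention(Q, K, V) - rfa_attention(Q, K, V)).mean())
```

The structural difference is that the exact variant materializes an n × n attention matrix, while the random-feature variant only forms an m × d summary of the keys and values, which is what yields linear complexity in sequence length; per the abstract, RA instead draws its random features from query-specific distributions to remove the bias of this shared-feature estimator, and LARA recovers linear complexity on top of that.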