Paper Title
CPSAA: Accelerating Sparse Attention using Crossbar-based Processing-In-Memory Architecture
Paper Authors
Paper Abstract
The attention mechanism expends huge computational effort on unnecessary calculations, significantly limiting system performance. Researchers have proposed sparse attention, which converts some dense-dense matrix multiplication (DDMM) operations into sampled dense-dense matrix multiplication (SDDMM) and sparse-dense matrix multiplication (SpMM) operations. However, current sparse attention solutions introduce massive off-chip random memory accesses. We propose CPSAA, a novel crossbar-based processing-in-memory (PIM) sparse attention accelerator. First, we present a novel attention calculation mode. Second, we design a novel PIM-based sparsity pruning architecture. Finally, we present novel crossbar-based SDDMM and SpMM methods. Experimental results show that CPSAA achieves average performance improvements of 89.6X, 32.2X, 17.8X, 3.39X, and 3.84X, and energy savings of 755.6X, 55.3X, 21.3X, 5.7X, and 4.9X, compared with GPU, FPGA, SANGER, ReBERT, and ReTransformer, respectively.
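To make the DDMM-to-SDDMM/SpMM conversion concrete, below is a minimal NumPy/SciPy sketch of masked sparse attention. The matrix sizes, the random pruning mask, and the scaling factor are illustrative assumptions for this sketch only; they do not reflect CPSAA's actual pruning scheme or crossbar data layout.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy dimensions (illustrative assumptions, not from the paper).
seq_len, d_model = 8, 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((seq_len, d_model))
K = rng.standard_normal((seq_len, d_model))
V = rng.standard_normal((seq_len, d_model))

# Dense attention uses two DDMM operations: S = Q @ K.T, then O = softmax(S) @ V.
# Sparse attention prunes S with a mask, so only the surviving score entries
# are computed (SDDMM), and the sparse score matrix multiplies dense V (SpMM).

# A random ~25%-dense binary mask stands in for whatever predictor
# produces the sparsity pattern in a real design.
mask = rng.random((seq_len, seq_len)) < 0.25
rows, cols = np.nonzero(mask)

# SDDMM: evaluate Q-row / K-row dot products only at the mask's
# nonzero positions, instead of the full dense score matrix.
vals = np.einsum('ij,ij->i', Q[rows], K[cols]) / np.sqrt(d_model)
scores = csr_matrix((vals, (rows, cols)), shape=(seq_len, seq_len))

# Row-wise softmax restricted to the surviving entries.
probs = scores.copy()
for r in range(seq_len):
    lo, hi = probs.indptr[r], probs.indptr[r + 1]
    if lo < hi:
        e = np.exp(probs.data[lo:hi] - probs.data[lo:hi].max())
        probs.data[lo:hi] = e / e.sum()

# SpMM: sparse attention-probability matrix times dense V.
out = probs @ V
print(out.shape)  # (8, 4)
```

The point of the SDDMM step is that only the mask's nonzero positions are ever evaluated, which is exactly how sparse attention removes the unnecessary dense-score computation that the abstract refers to.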