Paper Title

Spikeformer: A Novel Architecture for Training High-Performance Low-Latency Spiking Neural Network

Paper Authors

Yudong Li, Yunlin Lei, Xu Yang

Paper Abstract

Spiking neural networks (SNNs) have made great progress on both performance and efficiency over the last few years, but their unique working pattern makes it hard to train a high-performance, low-latency SNN. Thus the development of SNNs still lags behind traditional artificial neural networks (ANNs). To compensate for this gap, many extraordinary works have been proposed. Nevertheless, these works are mainly based on the same kind of network structure (i.e., CNN), and their performance is worse than that of their ANN counterparts, which limits the applications of SNNs. To this end, we propose a novel Transformer-based SNN, termed "Spikeformer", which outperforms its ANN counterpart on both static and neuromorphic datasets and may be an alternative architecture to CNN for training high-performance SNNs. First, to deal with the "data hungry" problem and the unstable training period exhibited in the vanilla model, we design the Convolutional Tokenizer (CT) module, which improves the accuracy of the original model on DVS-Gesture by more than 16%. Besides, in order to better incorporate the attention mechanism inside the Transformer and the spatio-temporal information inherent to SNNs, we adopt spatio-temporal attention (STA) instead of spatial-wise or temporal-wise attention. With our proposed method, we achieve competitive or state-of-the-art (SOTA) SNN performance on the DVS-CIFAR10, DVS-Gesture, and ImageNet datasets with the fewest simulation time steps (i.e., low latency). Remarkably, our Spikeformer outperforms other SNNs on ImageNet by a large margin (i.e., more than 5%) and even outperforms its ANN counterpart by 3.1% and 2.2% on DVS-Gesture and ImageNet respectively, indicating that Spikeformer is a promising architecture for training large-scale SNNs and may be more suitable for SNNs than CNN. We believe this work shall keep the development of SNNs in step with that of ANNs as much as possible. Code will be available.
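The abstract contrasts spatio-temporal attention (STA) with purely spatial-wise or temporal-wise attention. Since the paper's code is not yet released, the following is only a minimal sketch of what joint attention over both time steps and spatial tokens could look like in PyTorch; the module name, tensor layout, and use of `nn.MultiheadAttention` are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of joint spatio-temporal attention over spike features.
# Shapes and module names are assumptions, not the paper's released code.
import torch
import torch.nn as nn


class SpatioTemporalAttention(nn.Module):
    """Attend jointly over time steps and spatial tokens (T*N positions),
    instead of over space alone or time alone."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [T, B, N, D] = (time steps, batch, tokens, feature dim)
        T, B, N, D = x.shape
        # Merge the time and token axes so attention spans both at once.
        x = x.permute(1, 0, 2, 3).reshape(B, T * N, D)        # [B, T*N, D]
        out, _ = self.attn(x, x, x)                           # joint ST attention
        return out.reshape(B, T, N, D).permute(1, 0, 2, 3)    # back to [T, B, N, D]


if __name__ == "__main__":
    x = torch.rand(4, 2, 64, 128)   # 4 time steps, batch 2, 64 tokens, dim 128
    y = SpatioTemporalAttention(dim=128)(x)
    print(y.shape)                  # torch.Size([4, 2, 64, 128])
```

In a spiking setting the attention output would typically be passed through a spiking neuron layer (e.g., LIF) at each block, but that detail is omitted here for brevity.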
