Paper Title
Peripheral Vision Transformer
Paper Authors
Paper Abstract
Human vision possesses a special type of visual processing system called peripheral vision. Peripheral vision partitions the entire visual field into multiple contour regions based on their distance from the center of our gaze, which lets us perceive different visual features in different regions. In this work, we take a biologically inspired approach and explore modeling peripheral vision in deep neural networks for visual recognition. We propose to incorporate peripheral position encoding into the multi-head self-attention layers so that, given training data, the network learns to partition the visual field into diverse peripheral regions. We evaluate the proposed network, dubbed PerViT, on ImageNet-1K and systematically investigate the inner workings of the model for machine perception, showing that the network learns to perceive visual data similarly to the way human vision does. The image classification gains over the baselines across different model sizes demonstrate the efficacy of the proposed method.
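The abstract does not specify how the peripheral position encoding enters self-attention. The NumPy code below is a minimal sketch of one hypothetical way to realize the idea, not the paper's exact formulation. Each head gets a position term that depends only on the Euclidean distance between query and key patches. That term is parameterized here by hypothetical per-head `slopes` and `offsets`, multiplied into the usual content-based softmax attention, and renormalized. All function and parameter names (`grid_distances`, `peripheral_attention`, `slopes`, `offsets`) are illustrative assumptions.

```python
import numpy as np


def grid_distances(h, w):
    # Euclidean distance between every pair of patch positions on an h x w grid.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)  # (N, 2)
    diff = pos[:, None, :] - pos[None, :, :]                        # (N, N, 2)
    return np.linalg.norm(diff, axis=-1)                            # (N, N)


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def peripheral_attention(x, wq, wk, wv, slopes, offsets, h, w):
    """Self-attention modulated by a distance-based "peripheral" term (hypothetical).

    x: (N, C) patch tokens with N = h * w.
    wq, wk, wv: (C, C) query/key/value projections.
    slopes, offsets: (heads,) assumed learnable parameters. Each head's position
        weight is sigmoid(offset - slope * distance): a large slope keeps the head
        focused near the query, and slope 0 leaves it global.
    """
    n, c = x.shape
    heads = slopes.shape[0]
    d = c // heads
    dist = grid_distances(h, w)  # (N, N)
    # Per-head position weights in (0, 1), shape (heads, N, N).
    periph = 1.0 / (1.0 + np.exp(-(offsets[:, None, None] - slopes[:, None, None] * dist)))
    # Split projections into heads: (heads, N, d).
    q = (x @ wq).reshape(n, heads, d).transpose(1, 0, 2)
    k = (x @ wk).reshape(n, heads, d).transpose(1, 0, 2)
    v = (x @ wv).reshape(n, heads, d).transpose(1, 0, 2)
    content = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, N, N)
    # Combine content and position terms multiplicatively, then renormalize rows.
    attn = content * periph
    attn = attn / attn.sum(axis=-1, keepdims=True)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, c)
    return out, attn


# Usage: a 4 x 4 patch grid, 8 channels, 2 heads (one local, one global).
rng = np.random.default_rng(0)
h = w = 4
c = 8
x = rng.standard_normal((h * w, c))
wq, wk, wv = (rng.standard_normal((c, c)) * 0.1 for _ in range(3))
slopes = np.array([2.0, 0.0])   # head 0 focuses near the query; head 1 is global
offsets = np.array([1.0, 1.0])
out, attn = peripheral_attention(x, wq, wk, wv, slopes, offsets, h, w)
print(out.shape, attn.shape)
```

In this sketch, a slope of zero makes a head's position term constant. After renormalization, that head reduces to ordinary global self-attention. Larger slopes confine a head to a smaller region around the query. Letting training set these values per head is one plausible reading of how a network could "learn to partition the visual field" into regions of different extent.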