Paper Title
BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning
Paper Authors
Paper Abstract
Attention mechanisms have become ubiquitous in deep neural networks, where the Transformer architecture has achieved great success not only in natural language processing but also in visual recognition. Recently, a new Transformer module applied to the batch dimension rather than the spatial/channel dimensions, i.e., BatchFormer [18], was introduced to explore sample relationships for overcoming data-scarcity challenges. However, it only works with image-level representations for classification. In this paper, we devise a more general batch Transformer module, BatchFormerV2, which further enables exploring sample relationships for dense representation learning. Specifically, the proposed module is applied in a two-stream pipeline during training, i.e., one stream with and one stream without the BatchFormerV2 module, so that the batch stream can be removed at test time. The proposed method is therefore a plug-and-play module that can be easily integrated into different vision Transformers without any extra inference cost. Without bells and whistles, we show the effectiveness of the proposed method on a variety of popular visual recognition tasks, including image classification and two important dense prediction tasks: object detection and panoptic segmentation. In particular, BatchFormerV2 consistently improves current DETR-based detection methods (e.g., DETR, Deformable-DETR, Conditional DETR, and SMCA) by over 1.3%. Code will be made publicly available.
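To make the two-stream pipeline concrete, the sketch below shows one plausible way to realize it in PyTorch: attention runs over the batch axis of shared token features, and the stream passing through the batch Transformer is concatenated with the plain stream before a shared prediction head. The class and function names (`BatchFormerV2Block`, `two_stream_forward`) are hypothetical, and the block simply reuses `nn.TransformerEncoderLayer`; this is a sketch of the idea under stated assumptions, not the authors' implementation.

```python
# Minimal sketch of the two-stream batch-Transformer training described
# above. Names (BatchFormerV2Block, two_stream_forward) are illustrative
# assumptions, not the authors' released code.

import torch
import torch.nn as nn


class BatchFormerV2Block(nn.Module):
    """A Transformer encoder layer applied along the batch dimension."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # With batch_first=False the encoder treats dim 0 as the sequence
        # axis, so feeding (B, N, C) features makes attention mix
        # information across the B samples, independently per token.
        self.encoder = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, dim_feedforward=4 * dim,
            batch_first=False,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) token features from a vision Transformer stage.
        return self.encoder(x)


def two_stream_forward(feats: torch.Tensor,
                       bf2: BatchFormerV2Block,
                       head: nn.Module) -> torch.Tensor:
    """Training-time pipeline: one stream bypasses the batch Transformer,
    the other passes through it; both share every downstream module."""
    plain = feats           # stream without BatchFormerV2
    mixed = bf2(feats)      # stream with BatchFormerV2
    both = torch.cat([plain, mixed], dim=0)  # shared head sees both streams
    return head(both)


if __name__ == "__main__":
    B, N, C = 4, 196, 256                  # batch, tokens, channels
    feats = torch.randn(B, N, C)
    bf2 = BatchFormerV2Block(dim=C)
    head = nn.Linear(C, 91)                # toy per-token prediction head
    print(two_stream_forward(feats, bf2, head).shape)  # (8, 196, 91)
    # At inference the batch stream is simply dropped (head(feats)), which
    # is why the module adds no extra inference cost.
```

Because both streams share the backbone, the batch Transformer block, and the head parameters, the plain stream learns to behave consistently with the batch-mixed stream, so discarding the batch stream at test time preserves the benefit without changing the deployed architecture.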