Cross-domain Detection Transformer based on Spatial-aware and Semantic-aware Token Alignment

Paper title

Cross-domain Detection Transformer based on Spatial-aware and Semantic-aware Token Alignment

Authors

Deng, Jinhong; Zhang, Xiaoyue; Li, Wen; Duan, Lixin

Abstract

Detection transformers such as DETR have recently shown promising performance on many object detection tasks, but generalization remains quite challenging for these methods in cross-domain adaptation scenarios. A straightforward way to address the cross-domain issue is to perform token alignment with adversarial training in transformers. However, its performance is often unsatisfactory because the tokens in detection transformers are quite diverse and represent different spatial and semantic information. In this paper, we propose a new method called Spatial-aware and Semantic-aware Token Alignment (SSTA) for cross-domain detection transformers. In particular, we take advantage of the characteristics of cross-attention as used in detection transformers and propose the spatial-aware token alignment (SpaTA) and the semantic-aware token alignment (SemTA) strategies to guide the token alignment across domains. For spatial-aware token alignment, we extract information from the cross-attention map (CAM) to align the distribution of tokens according to their attention to object queries. For semantic-aware token alignment, we inject category information into the cross-attention map and construct domain embedding to guide the learning of a multi-class discriminator, so as to model the category relationship and achieve category-level token alignment during the entire adaptation process. We conduct extensive experiments on several widely-used benchmarks, and the results clearly show the effectiveness of our proposed method over existing state-of-the-art baselines.
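
As a rough illustration of the spatial-aware idea in the abstract, the sketch below weights a per-token domain-classification loss by how much attention each token receives from the object queries. It is not the authors' implementation: the function names `token_weights` and `weighted_domain_loss`, the choice of averaging attention over queries, and the use of binary cross-entropy are all assumptions made for this example. The adversarial part of the training (for instance a gradient reversal step) is omitted.

```python
import math


def token_weights(cross_attention):
    """Turn a cross-attention map into one weight per token.

    cross_attention: list of rows, one row per object query; each row holds
    the attention that query pays to every token.
    Assumption: a token's weight is its mean attention over all queries,
    normalized so that the weights sum to 1.
    """
    num_queries = len(cross_attention)
    num_tokens = len(cross_attention[0])
    mean_attention = [
        sum(row[t] for row in cross_attention) / num_queries
        for t in range(num_tokens)
    ]
    total = sum(mean_attention)
    return [a / total for a in mean_attention]


def weighted_domain_loss(domain_probs, domain_label, weights):
    """Attention-weighted binary cross-entropy for a per-token domain discriminator.

    domain_probs: predicted probability that each token comes from the target domain.
    domain_label: 0 for a source-domain image, 1 for a target-domain image.
    weights: per-token weights, e.g. from token_weights().
    """
    eps = 1e-12  # keeps log() away from zero
    loss = 0.0
    for p, w in zip(domain_probs, weights):
        p = min(max(p, eps), 1.0 - eps)
        bce = -(domain_label * math.log(p) + (1 - domain_label) * math.log(1.0 - p))
        loss += w * bce
    return loss


# Hypothetical toy input: 2 object queries attending over 3 tokens.
cam = [[0.5, 0.5, 0.0],
       [0.1, 0.3, 0.6]]
weights = token_weights(cam)
loss = weighted_domain_loss([0.5, 0.5, 0.5], domain_label=1, weights=weights)
```

With this weighting, tokens that the object queries largely ignore contribute little to the domain loss, so the alignment pressure concentrates on tokens the detector actually attends to.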
