Paper Title

Sequential Transformer for End-to-End Person Search

Paper Authors

Long Chen, Jinhua Xu

Abstract

Person Search aims to simultaneously localize and recognize a target person from realistic and uncropped gallery images. One major challenge of person search comes from the contradictory goals of the two sub-tasks, i.e., person detection focuses on finding the commonness of all persons so as to distinguish persons from the background, while person re-identification (re-ID) focuses on the differences among different persons. In this paper, we propose a novel Sequential Transformer (SeqTR) for end-to-end person search to deal with this challenge. Our SeqTR contains a detection transformer and a novel re-ID transformer that sequentially addresses detection and re-ID tasks. The re-ID transformer comprises the self-attention layer that utilizes contextual information and the cross-attention layer that learns local fine-grained discriminative features of the human body. Moreover, the re-ID transformer is shared and supervised by multi-scale features to improve the robustness of learned person representations. Extensive experiments on two widely-used person search benchmarks, CUHK-SYSU and PRW, show that our proposed SeqTR not only outperforms all existing person search methods with a 59.3% mAP on PRW but also achieves comparable performance to the state-of-the-art results with an mAP of 94.8% on CUHK-SYSU.
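The abstract describes the re-ID transformer as a self-attention layer over contextual information followed by a cross-attention layer that attends to image features for fine-grained body cues. The paper's actual architecture is not reproduced here; the following is only a minimal numpy sketch of that self-attention-then-cross-attention pattern, with all shapes, names, and the single-head formulation being illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # single-head scaled dot-product attention (no learned projections,
    # which a real transformer layer would include)
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d))
    return weights @ v

np.random.seed(0)
d = 32
person_queries = np.random.randn(4, d)   # hypothetical: 4 detected-person embeddings
scene_tokens = np.random.randn(50, d)    # hypothetical: flattened image feature map

# self-attention among person queries: aggregates contextual information
ctx = attention(person_queries, person_queries, person_queries)

# cross-attention against scene features: picks up local fine-grained cues
reid_embed = attention(ctx, scene_tokens, scene_tokens)
print(reid_embed.shape)  # (4, 32)
```

In the paper this head is additionally shared across and supervised by multi-scale features; that would amount to running the same layer on feature maps from several backbone stages and combining the resulting embeddings.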
