Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection

Paper Title

Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection

Authors

Xubin Zhong, Changxing Ding, Zijian Li, Shaoli Huang

Abstract

Human-Object Interaction (HOI) detection is a core task for high-level image understanding. Recently, Detection Transformer (DETR)-based HOI detectors have become popular due to their superior performance and efficient structure. However, these approaches typically adopt fixed HOI queries for all testing images, which is vulnerable to the location change of objects in one specific image. Accordingly, in this paper, we propose to enhance DETR's robustness by mining hard-positive queries, which are forced to make correct predictions using partial visual cues. First, we explicitly compose hard-positive queries according to the ground-truth (GT) position of labeled human-object pairs for each training image. Specifically, we shift the GT bounding boxes of each labeled human-object pair so that the shifted boxes cover only a certain portion of the GT ones. We encode the coordinates of the shifted boxes for each labeled human-object pair into an HOI query. Second, we implicitly construct another set of hard-positive queries by masking the top scores in cross-attention maps of the decoder layers. The masked attention maps then only cover partial important cues for HOI predictions. Finally, an alternate strategy is proposed that efficiently combines both types of hard queries. In each iteration, both DETR's learnable queries and one selected type of hard-positive queries are adopted for loss computation. Experimental results show that our proposed approach can be widely applied to existing DETR-based HOI detectors. Moreover, we consistently achieve state-of-the-art performance on three benchmarks: HICO-DET, V-COCO, and HOI-A. Code is available at https://github.com/MuchHair/HQM.
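
The three ideas in the abstract (shifting GT boxes, masking top attention scores, and using one hard-positive query type per iteration) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation; see the linked repository for that. The function names, the IoU range of 0.4 to 0.6, the rejection-sampling loop, zeroing masked scores without renormalisation, and the random choice between query types are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes with positive area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def shift_box(gt, lo=0.4, hi=0.6, rng=random, max_tries=1000):
    """Translate a GT box until its IoU with the original lies in [lo, hi].

    Assumption: the IoU range and rejection sampling are illustrative choices;
    the abstract only says the shifted box covers a portion of the GT box.
    """
    w, h = gt[2] - gt[0], gt[3] - gt[1]
    for _ in range(max_tries):
        dx = rng.uniform(-w, w)
        dy = rng.uniform(-h, h)
        cand = (gt[0] + dx, gt[1] + dy, gt[2] + dx, gt[3] + dy)
        if lo <= iou(gt, cand) <= hi:
            return cand
    raise RuntimeError("no shifted box found in the requested IoU range")


def mask_top_scores(attn, k):
    """Zero the k largest weights in one attention row.

    Assumption: masked weights are set to 0 and the row is not renormalised.
    """
    top = sorted(range(len(attn)), key=lambda i: attn[i], reverse=True)[:k]
    masked = list(attn)
    for i in top:
        masked[i] = 0.0
    return masked


def pick_query_type(rng=random):
    """Select one hard-positive query type for the current iteration.

    Assumption: a uniform random choice; the abstract only says one type
    is selected per iteration alongside the learnable queries.
    """
    return rng.choice(["shifted_box", "attention_mask"])


rng = random.Random(0)
human_gt = (10.0, 20.0, 110.0, 220.0)
object_gt = (90.0, 150.0, 170.0, 210.0)

# Coordinates of the shifted human and object boxes; in a real detector these
# would be fed to a learned encoder to produce the HOI query embedding.
pair_coords = shift_box(human_gt, rng=rng) + shift_box(object_gt, rng=rng)
print(len(pair_coords))                             # 8
print(mask_top_scores([0.1, 0.5, 0.3, 0.1], k=1))   # [0.1, 0.0, 0.3, 0.1]
print(pick_query_type(rng))
```

In this sketch the shifted box keeps the size of the GT box and only moves, so the partial overlap comes entirely from the translation. A real training loop would compute the loss on both the learnable queries and the hard-positive queries of the selected type.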
