Paper Title

Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection

Authors

Long, Yanxin; Han, Jianhua; Huang, Runhui; Hang, Xu; Zhu, Yi; Xu, Chunjing; Liang, Xiaodan

Abstract

Inspired by the success of vision-language models (VLMs) in zero-shot classification, recent works attempt to extend this line of work to object detection by leveraging the localization ability of pre-trained VLMs and generating pseudo labels for unseen classes in a self-training manner. However, since current VLMs are usually pre-trained by aligning sentence embeddings with global image embeddings, using them directly lacks the fine-grained alignment for object instances that is the core of detection. In this paper, we propose a simple but effective fine-grained Visual-Text Prompt-driven self-training paradigm for Open-Vocabulary Detection (VTP-OVD) that introduces a fine-grained visual-text prompt adapting stage to enhance the current self-training paradigm with a more powerful fine-grained alignment. During the adapting stage, we enable the VLM to obtain fine-grained alignment by using learnable text prompts to resolve an auxiliary dense pixel-wise prediction task. Furthermore, we propose a visual prompt module that provides prior task information (i.e., the categories that need to be predicted) to the vision branch, to better adapt the pre-trained VLM to downstream tasks. Experiments show that our method achieves state-of-the-art performance for open-vocabulary object detection, e.g., 31.5% mAP on unseen classes of COCO.
