Paper Title

EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement

Paper Authors

Linpu Fang, Hang Xu, Zhili Liu, Sarah Parisot, Zhenguo Li

Paper Abstract

Object detectors trained on fully-annotated data currently yield state-of-the-art performance but require expensive manual annotations. On the other hand, weakly-supervised detectors have much lower performance and cannot be used reliably in a realistic setting. In this paper, we study the hybrid-supervised object detection problem, aiming to train a high-quality detector with only a limited amount of fully-annotated data while fully exploiting cheap data with image-level labels. State-of-the-art methods typically propose an iterative approach, alternating between generating pseudo-labels and updating a detector. This paradigm requires careful manual hyper-parameter tuning for mining good pseudo-labels at each round and is quite time-consuming. To address these issues, we present EHSOD, an end-to-end hybrid-supervised object detection system which can be trained in one shot on both fully and weakly-annotated data. Specifically, based on a two-stage detector, we propose two modules to fully utilize the information from both kinds of labels: 1) the CAM-RPN module aims at finding foreground proposals guided by a class activation heat-map; 2) the hybrid-supervised cascade module further refines the bounding-box position and classification with the help of an auxiliary head compatible with image-level data. Extensive experiments demonstrate the effectiveness of the proposed method: it achieves comparable results on multiple object detection benchmarks with only 30% of the fully-annotated data, e.g. 37.5% mAP on COCO. We will release the code and the trained models.
