Paper Title

EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement

Paper Authors

Linpu Fang, Hang Xu, Zhili Liu, Sarah Parisot, Zhenguo Li

Paper Abstract

Object detectors trained on fully-annotated data currently yield state-of-the-art performance but require expensive manual annotations. On the other hand, weakly-supervised detectors have much lower performance and cannot be used reliably in a realistic setting. In this paper, we study the hybrid-supervised object detection problem, aiming to train a high-quality detector with only a limited amount of fully-annotated data while fully exploiting cheap data with image-level labels. State-of-the-art methods typically propose an iterative approach, alternating between generating pseudo-labels and updating a detector. This paradigm requires careful manual hyper-parameter tuning for mining good pseudo-labels at each round and is quite time-consuming. To address these issues, we present EHSOD, an end-to-end hybrid-supervised object detection system which can be trained in one shot on both fully and weakly-annotated data. Specifically, based on a two-stage detector, we propose two modules to fully utilize the information from both kinds of labels: 1) the CAM-RPN module aims at finding foreground proposals guided by a class activation heat-map; 2) the hybrid-supervised cascade module further refines the bounding-box position and classification with the help of an auxiliary head compatible with image-level data. Extensive experiments demonstrate the effectiveness of the proposed method: it achieves comparable results on multiple object detection benchmarks with only 30% of the fully-annotated data, e.g. 37.5% mAP on COCO. We will release the code and the trained models.
