Paper Title

Tasks Integrated Networks: Joint Detection and Retrieval for Image Search

Paper Authors

Zhang, Lei, He, Zhenwei, Yang, Yi, Wang, Liang, Gao, Xinbo

Abstract

The traditional object retrieval task aims to learn a discriminative feature representation with intra-similarity and inter-dissimilarity, which supposes that the objects in an image have been accurately pre-cropped, either manually or automatically. However, in many real-world search scenarios (e.g., video surveillance), objects (e.g., persons, vehicles) are seldom accurately detected or annotated. Object-level retrieval therefore becomes intractable without bounding-box annotations, which leads to a new but challenging topic, i.e., image-level search. In this paper, to address the image search problem, we first introduce an end-to-end Integrated Net (I-Net), which has three merits: 1) A Siamese architecture and an online pairing strategy are designed for similar and dissimilar objects in the given images. 2) A novel online pairing (OLP) loss is introduced with a dynamic feature dictionary, which alleviates the multi-task training stagnation problem by automatically generating a large number of negative pairs to restrict the positives. 3) A hard example priority (HEP) based softmax loss is proposed to improve the robustness of the classification task by selecting hard categories. Following the philosophy of divide and conquer, we further propose an improved I-Net, called DC-I-Net, which makes two new contributions: 1) Two modules are tailored to handle different tasks separately in the integrated framework, so that task specification is guaranteed. 2) A class-center guided HEP loss (C2HEP) that exploits the stored class centers is proposed, so that intra-similarity and inter-dissimilarity can be captured for the ultimate retrieval. Extensive experiments on well-known image-level search benchmark datasets demonstrate that the proposed DC-I-Net outperforms state-of-the-art tasks-integrated and tasks-separated image search models.
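The HEP idea described above (restricting the softmax to hard categories) can be sketched as follows. This is a speculative reading of the abstract, not the paper's exact formulation: the function name `hep_softmax_loss`, the top-`k` selection rule, and the parameter `k` are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def hep_softmax_loss(logits, labels, k=5):
    """Hard-example-priority softmax (sketch, assumed formulation):
    cross-entropy restricted to the true class plus the k hardest
    (highest-scoring) wrong classes for each sample."""
    batch, num_classes = logits.shape
    # Mask out the true class so it cannot be selected as a hard negative.
    masked = logits.clone()
    masked.scatter_(1, labels.unsqueeze(1), float('-inf'))
    # Indices of the k hardest wrong classes per sample.
    hard_idx = masked.topk(k, dim=1).indices
    # Keep [true class, k hard classes]; the true class sits at position 0.
    keep = torch.cat([labels.unsqueeze(1), hard_idx], dim=1)
    sub_logits = logits.gather(1, keep)
    # Target index 0 points at the true-class logit in each reduced row.
    target = torch.zeros(batch, dtype=torch.long, device=logits.device)
    return F.cross_entropy(sub_logits, target)
```

With `k = num_classes - 1` the reduced softmax covers every class, so the sketch reduces to ordinary cross-entropy; smaller `k` focuses the gradient on the confusable categories.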

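The OLP loss with a dynamic feature dictionary can likewise be sketched as a memory-bank contrastive objective: a FIFO dictionary of stored features supplies negatives that restrict each positive pair. The class name, the FIFO update rule, and the `temperature` parameter are assumptions; the paper's exact pairing and dictionary mechanics may differ.

```python
import torch
import torch.nn.functional as F

class OnlinePairingLoss:
    """Sketch of an OLP-style loss (assumed design): negatives are drawn
    automatically from a fixed-size FIFO feature dictionary."""

    def __init__(self, dim, dict_size=1024, temperature=0.1):
        self.dictionary = torch.empty(0, dim)  # stored features, newest last
        self.dict_size = dict_size
        self.t = temperature

    def __call__(self, anchor, positive):
        anchor = F.normalize(anchor, dim=1)
        positive = F.normalize(positive, dim=1)
        # Positive similarity for each pair (one logit per sample).
        pos = (anchor * positive).sum(dim=1, keepdim=True) / self.t
        if self.dictionary.numel():
            # Dictionary entries act as automatically generated negatives.
            neg = anchor @ self.dictionary.t() / self.t
            logits = torch.cat([pos, neg], dim=1)
        else:
            logits = pos
        # Index 0 is always the positive logit.
        labels = torch.zeros(anchor.size(0), dtype=torch.long)
        loss = F.cross_entropy(logits, labels)
        # Enqueue current features; drop the oldest beyond capacity.
        self.dictionary = torch.cat(
            [self.dictionary, positive.detach()])[-self.dict_size:]
        return loss
```

Because every batch contributes its features as future negatives, the positive pairs are contrasted against a growing pool without any manual negative mining, which matches the abstract's claim of easing multi-task training stagnation.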