Paper Title

A Study on Self-Supervised Object Detection Pretraining

Authors

Trung Dang, Simon Kornblith, Huy Thong Nguyen, Peter Chin, Maryam Khademi

Abstract

In this work, we study different approaches to self-supervised pretraining of object detection models. We first design a general framework to learn a spatially consistent dense representation from an image, by randomly sampling boxes, projecting them into each augmented view, and maximizing the similarity between corresponding box features. We study existing design choices in the literature, such as box generation, feature extraction strategies, and the use of multiple views, inspired by their success in instance-level image representation learning. Our results suggest that the method is robust to different choices of hyperparameters, and that using multiple views is not as effective as it is for instance-level image representation learning. We also design two auxiliary tasks to predict boxes in one view from their features in the other view: (1) predicting boxes from the sampled set using a contrastive loss, and (2) predicting box coordinates using a transformer, which could potentially benefit downstream object detection tasks. We find that these tasks do not lead to better object detection performance when finetuning the pretrained model on labeled data.
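The core pretraining framework described in the abstract (sample random boxes, project them into each augmented view, and maximize the similarity of corresponding box features) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the random dense feature maps stand in for backbone outputs, the average pooling stands in for RoIAlign-style feature extraction, and the crop offsets/scales are hypothetical augmentation parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_boxes(n, img_size):
    """Sample n random (x1, y1, x2, y2) boxes inside an img_size x img_size image."""
    centers = rng.uniform(0.2, 0.8, size=(n, 2)) * img_size
    sizes = rng.uniform(0.1, 0.3, size=(n, 2)) * img_size
    x1y1 = np.clip(centers - sizes / 2, 0, img_size)
    x2y2 = np.clip(centers + sizes / 2, 0, img_size)
    return np.hstack([x1y1, x2y2])

def project_box(box, crop_xy, scale):
    """Project a box from original-image coordinates into a cropped, rescaled view."""
    x1, y1, x2, y2 = box
    cx, cy = crop_xy
    return np.array([(x1 - cx) * scale, (y1 - cy) * scale,
                     (x2 - cx) * scale, (y2 - cy) * scale])

def box_feature(feat_map, box, stride=8):
    """Average-pool the dense feature map over a box (a stand-in for RoIAlign)."""
    h, w, _ = feat_map.shape
    x1, y1, x2, y2 = np.clip(box / stride, 0, [w, h, w, h]).astype(int)
    region = feat_map[y1:max(y2, y1 + 1), x1:max(x2, x1 + 1)]
    return region.mean(axis=(0, 1))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Dense feature maps for two augmented "views" (hypothetical backbone outputs).
feat_a = rng.standard_normal((28, 28, 64))
feat_b = rng.standard_normal((28, 28, 64))

# Sample boxes once in original-image coordinates, then project per view,
# so each pair of projected boxes covers the same image region.
boxes = sample_boxes(4, img_size=224)
boxes_a = np.stack([project_box(b, (10, 10), 1.0) for b in boxes])
boxes_b = np.stack([project_box(b, (30, 20), 0.9) for b in boxes])

# Objective: maximize similarity between corresponding box features across views.
sims = [cosine(box_feature(feat_a, ba), box_feature(feat_b, bb))
        for ba, bb in zip(boxes_a, boxes_b)]
loss = -float(np.mean(sims))
print(round(loss, 4))
```

In an actual training loop the feature maps would come from a shared (or momentum) backbone and the loss would be backpropagated; here the random arrays only demonstrate the box bookkeeping that makes the representation spatially consistent.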
