Multi-Label Visual Analysis and Beyond

Paper Title

Self-Supervised Pyramid Representation Learning for Multi-Label Visual Analysis and Beyond

Authors

Cheng-Yen Hsieh, Chih-Jung Chang, Fu-En Yang, Yu-Chiang Frank Wang

Abstract

While self-supervised learning has been shown to benefit a number of vision tasks, existing techniques mainly focus on image-level manipulation, which may not generalize well to downstream tasks at patch or pixel levels. Moreover, existing SSL methods might not sufficiently describe and associate the above representations within and across image scales. In this paper, we propose a Self-Supervised Pyramid Representation Learning (SS-PRL) framework. The proposed SS-PRL is designed to derive pyramid representations at patch levels via learning proper prototypes, with additional learners to observe and relate inherent semantic information within an image. In particular, we present a cross-scale patch-level correlation learning in SS-PRL, which allows the model to aggregate and associate information learned across patch scales. We show that, with our proposed SS-PRL for model pre-training, one can easily adapt and fine-tune the models for a variety of applications including multi-label classification, object detection, and instance segmentation.
