Paper Title
Learning Affordance Grounding from Exocentric Images
Paper Authors
Paper Abstract
Affordance grounding, the task of grounding (i.e., localizing) the action-possibility regions of objects, faces the challenge of establishing an explicit link with object parts due to the diversity of interactive affordances. Humans have the ability to transform various exocentric interactions into invariant egocentric affordances, thereby countering the impact of interaction diversity. To empower an agent with such ability, this paper proposes the task of affordance grounding from the exocentric view, i.e., given exocentric human-object interaction images and an egocentric object image, learning the affordance knowledge of the object and transferring it to the egocentric image using only the affordance label as supervision. To this end, we devise a cross-view knowledge transfer framework that extracts affordance-specific features from exocentric interactions and enhances the perception of affordance regions by preserving affordance correlations. Specifically, an Affordance Invariance Mining module is devised to extract affordance-specific clues by minimizing the intra-class differences that originate from interaction habits in exocentric images. Besides, an Affordance Co-relation Preserving strategy is presented to perceive and localize affordances by aligning the co-relation matrices of the predicted results between the two views. In addition, an affordance grounding dataset named AGD20K is constructed by collecting and labeling over 20K images from 36 affordance categories. Experimental results demonstrate that our method outperforms representative models in terms of both objective metrics and visual quality. Code: github.com/lhc1224/Cross-View-AG.
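To make the two ideas in the abstract more concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' released code from the repository above): it illustrates (1) minimizing intra-class differences of exocentric affordance features, in the spirit of the Affordance Invariance Mining module, and (2) aligning class co-relation matrices of the two views' predictions, in the spirit of the Affordance Co-relation Preserving strategy. The tensor shapes, the cosine-similarity form of the co-relation matrix, and the function names are assumptions for illustration only.

```python
# Hypothetical sketch of the two abstract ideas; shapes and similarity choices are assumptions.
import torch
import torch.nn.functional as F


def intra_class_variance(exo_feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Pull exocentric features of the same affordance class toward their class mean.

    exo_feats: (N, D) pooled features of exocentric interaction images.
    labels:    (N,) integer affordance labels.
    """
    loss = exo_feats.new_zeros(())
    for c in labels.unique():
        feats_c = exo_feats[labels == c]             # all samples of class c
        center = feats_c.mean(dim=0, keepdim=True)   # class prototype
        loss = loss + ((feats_c - center) ** 2).sum(dim=1).mean()
    return loss / labels.unique().numel()


def co_relation_alignment(exo_logits: torch.Tensor, ego_logits: torch.Tensor) -> torch.Tensor:
    """Align the class co-relation matrices predicted from the two views.

    exo_logits, ego_logits: (N, C) class prediction scores for the same batch.
    The co-relation matrix here is taken as the cosine similarity between class
    score columns across the batch (an assumed instantiation of the idea).
    """
    def corr(logits: torch.Tensor) -> torch.Tensor:
        z = F.normalize(logits, dim=0)               # normalize each class column
        return z.t() @ z                             # (C, C) class co-relation

    return F.mse_loss(corr(exo_logits), corr(ego_logits))


if __name__ == "__main__":
    N, D, C = 8, 128, 36                             # 36 affordance classes, as in AGD20K
    exo_feats = torch.randn(N, D)
    labels = torch.randint(0, C, (N,))
    exo_logits, ego_logits = torch.randn(N, C), torch.randn(N, C)
    total = intra_class_variance(exo_feats, labels) + co_relation_alignment(exo_logits, ego_logits)
    print(total.item())
```

In this reading, the first term encourages exocentric features of one affordance to collapse onto an invariant representation despite different interaction habits, while the second term transfers affordance knowledge by requiring the egocentric branch to reproduce the exocentric branch's inter-class relationships rather than its raw predictions; the exact losses and weights used in the paper may differ.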