Paper Title

Interacting Attention Graph for Single Image Two-Hand Reconstruction

Paper Authors

Mengcheng Li, Liang An, Hongwen Zhang, Lianpeng Wu, Feng Chen, Tao Yu, Yebin Liu

Abstract

Graph convolutional networks (GCNs) have achieved great success in the single-hand reconstruction task, while interacting two-hand reconstruction with GCNs remains unexplored. In this paper, we present Interacting Attention Graph Hand (IntagHand), the first graph-convolution-based network that reconstructs two interacting hands from a single RGB image. To address the occlusion and interaction challenges of two-hand reconstruction, we introduce two novel attention-based modules in each upsampling step of the original GCN. The first module is the pyramid image feature attention (PIFA) module, which utilizes multi-resolution features to implicitly obtain vertex-to-image alignment. The second module is the cross-hand attention (CHA) module, which encodes the coherence of interacting hands by building dense cross-attention between the vertices of the two hands. As a result, our model outperforms all existing two-hand reconstruction methods by a large margin on the InterHand2.6M benchmark. Moreover, ablation studies verify the effectiveness of both the PIFA and CHA modules in improving reconstruction accuracy. Results on in-the-wild images and live video streams further demonstrate the generalization ability of our network. Our code is available at https://github.com/Dw1010/IntagHand.
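
To make the two attention ideas in the abstract concrete, below is a minimal PyTorch sketch of a PIFA-style block (vertex features attending to a multi-resolution image feature pyramid) and a CHA-style block (dense cross-attention from one hand's vertices to the other's). The class names, head counts, feature sizes, and the assumption that image features are already projected to the vertex feature dimension are all illustrative guesses, not the authors' implementation; refer to the linked repository for the real code.

```python
# Hedged sketch of PIFA- and CHA-style attention blocks. All names and
# shapes here are assumptions for illustration; they are NOT the authors'
# implementation (see https://github.com/Dw1010/IntagHand for that).
import torch
import torch.nn as nn


class PyramidImageFeatureAttention(nn.Module):
    """Hypothetical PIFA-style block: vertex features attend to flattened
    multi-resolution image features to pick up vertex-to-image alignment."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, verts: torch.Tensor, pyramid: list) -> torch.Tensor:
        # verts: (B, N, dim); pyramid: feature maps of shape (B, dim, Hi, Wi),
        # assumed already projected to `dim` channels.
        tokens = torch.cat([f.flatten(2).transpose(1, 2) for f in pyramid], dim=1)
        out, _ = self.attn(verts, tokens, tokens)
        return self.norm(verts + out)  # residual update


class CrossHandAttention(nn.Module):
    """Hypothetical CHA-style block: dense cross-attention from one hand's
    vertex features to the other hand's vertex features."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, verts_a: torch.Tensor, verts_b: torch.Tensor) -> torch.Tensor:
        # verts_a queries attend to verts_b as keys/values; residual update.
        out, _ = self.attn(verts_a, verts_b, verts_b)
        return self.norm(verts_a + out)


if __name__ == "__main__":
    B, N, D = 2, 778, 64  # 778 = MANO hand mesh vertex count
    left, right = torch.randn(B, N, D), torch.randn(B, N, D)
    feats = [torch.randn(B, D, s, s) for s in (8, 16, 32)]  # toy pyramid

    left = PyramidImageFeatureAttention(D)(left, feats)
    left = CrossHandAttention(D)(left, right)
    print(left.shape)  # torch.Size([2, 778, 64])
```

Per the abstract, the actual network applies such modules at each GCN upsampling step; the sketch above omits the graph convolutions and operates at a single mesh resolution for brevity.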
