Paper Title
Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion
Paper Authors
Paper Abstract
Accurate 3D reconstruction of hand and object shape from a single hand-object image is important for understanding human-object interaction as well as human daily activities. Unlike bare-hand pose estimation, hand-object interaction imposes strong constraints on both the hand and the manipulated object, which suggests that the hand configuration may provide crucial contextual information for the object, and vice versa. However, current approaches address this task by training a two-branch network that reconstructs the hand and the object separately, with little communication between the two branches. In this work, we propose to consider the hand and object jointly in feature space and explore the reciprocity between the two branches. We extensively investigate cross-branch feature fusion architectures built with MLP or LSTM units. Among the investigated architectures, a variant with LSTM units that enhances the object feature with the hand feature shows the best performance gain. Moreover, we employ an auxiliary depth estimation module to augment the input RGB image with an estimated depth map, which further improves the reconstruction accuracy. Experiments conducted on public datasets demonstrate that our approach significantly outperforms existing approaches in terms of object reconstruction accuracy.
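The abstract does not spell out the fusion architecture, but the best-performing variant it describes (an LSTM unit that enhances the object-branch feature with the hand-branch feature) can be sketched as a single LSTM step in which the hand feature is the input and the object feature seeds the hidden state. All names, dimensions, and the zero-initialized cell state below are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_fuse(hand_feat, obj_feat, W, U, b):
    """One LSTM step that enhances the object feature with the hand feature.

    The hand feature is the LSTM input, the object feature the initial
    hidden state; the updated hidden state is the fused object feature.
    Gate layout in W/U/b: [input, forget, output, candidate].
    """
    d = obj_feat.shape[0]
    c = np.zeros(d)                       # initial cell state (assumption)
    z = W @ hand_feat + U @ obj_feat + b  # all four gates at once, shape (4d,)
    i = sigmoid(z[:d])                    # input gate
    f = sigmoid(z[d:2 * d])               # forget gate
    o = sigmoid(z[2 * d:3 * d])           # output gate
    g = np.tanh(z[3 * d:])                # candidate cell update
    c_new = f * c + i * g
    return o * np.tanh(c_new)             # enhanced object feature

# Toy usage with 8-D branch features and random fusion weights.
rng = np.random.default_rng(0)
d = 8
hand_feat = rng.standard_normal(d)
obj_feat = rng.standard_normal(d)
W = rng.standard_normal((4 * d, d)) * 0.1
U = rng.standard_normal((4 * d, d)) * 0.1
b = np.zeros(4 * d)
fused = lstm_fuse(hand_feat, obj_feat, W, U, b)
print(fused.shape)  # (8,)
```

In a full two-branch network this fused vector would replace the object feature before the object decoder, while the hand branch proceeds unchanged; the abstract reports that this one-directional (hand-to-object) enhancement gave the largest gain among the variants investigated.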