Paper title
RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild
Paper authors
Paper abstract
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object. This task is a core component of classic geometric pipelines such as SfM and SLAM, and also serves as a vital pre-processing requirement for contemporary neural approaches (e.g. NeRF) to object reconstruction and view synthesis. In contrast to existing correspondence-driven methods that do not perform well given sparse views, we propose a top-down prediction based approach for estimating camera viewpoints. Our key technical insight is the use of an energy-based formulation for representing distributions over relative camera rotations, thus allowing us to explicitly represent multiple camera modes arising from object symmetries or views. Leveraging these relative predictions, we jointly estimate a consistent set of camera rotations from multiple images. We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories. Further, our probabilistic approach significantly outperforms directly regressing relative poses, suggesting that modeling multimodality is important for coherent joint reconstruction. We demonstrate that our system can be a stepping stone toward in-the-wild reconstruction from multi-view datasets. The project page with code and videos can be found at https://jasonyzhang.com/relpose.
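To make the energy-based idea in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical hand-written scoring function (`toy_score`) in place of a learned network that would score an image pair together with a candidate rotation. Candidate rotations are sampled, each is scored, and a softmax over the scores gives a discrete distribution that can hold several modes, as would happen for a symmetric object. All function names and the `sigma` parameter are illustrative assumptions.

```python
import numpy as np


def random_rotations(n, rng):
    """Sample n uniform rotation matrices from normalized Gaussian quaternions."""
    q = rng.normal(size=(n, 4))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    w, x, y, z = q[:, 0], q[:, 1], q[:, 2], q[:, 3]
    R = np.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w),
        2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w),
        2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y),
    ], axis=1)
    return R.reshape(n, 3, 3)


def geodesic_angle(Ra, Rb):
    """Angle in radians of the rotation taking Ra to Rb (broadcasts over batches)."""
    rel = np.swapaxes(Ra, -1, -2) @ Rb
    cos = (np.trace(rel, axis1=-2, axis2=-1) - 1.0) / 2.0
    return np.arccos(np.clip(cos, -1.0, 1.0))


def toy_score(candidates, modes, sigma=0.2):
    """Hypothetical stand-in for a learned score: high near any of the modes."""
    angles = np.stack([geodesic_angle(candidates, m) for m in modes], axis=0)
    return (-(angles ** 2) / (2.0 * sigma ** 2)).max(axis=0)


def scores_to_probs(scores):
    """Softmax over candidate scores (numerically stabilized)."""
    p = np.exp(scores - scores.max())
    return p / p.sum()


rng = np.random.default_rng(0)
# Two equally plausible relative rotations, e.g. from an object symmetry.
modes = random_rotations(2, rng)
# Candidate set: the two modes plus random samples covering rotation space.
candidates = np.concatenate([modes, random_rotations(5000, rng)], axis=0)
probs = scores_to_probs(toy_score(candidates, modes))
best = candidates[np.argmax(probs)]
```

A direct regressor would have to commit to a single rotation, and under a symmetric ambiguity it would tend to average the two modes into an answer that matches neither. The distribution over candidates keeps both hypotheses available for a later joint estimation step across multiple images.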