Paper Title

PoseGAN: A Pose-to-Image Translation Framework for Camera Localization

Authors

Liu, Kanglin; Li, Qing; Qiu, Guoping

Abstract

Camera localization is a fundamental requirement in robotics and computer vision. This paper introduces a pose-to-image translation framework to tackle the camera localization problem. We present PoseGANs, a framework based on conditional generative adversarial networks (cGANs) that implements pose-to-image translation. PoseGANs feature a number of innovations, including a distance-metric-based conditional discriminator to conduct camera localization and a pose estimation technique for generated camera images that acts as a stronger constraint to improve camera localization performance. Compared with learning-based regression methods such as PoseNet, PoseGANs can achieve better performance with model sizes that are 70% smaller. In addition, PoseGANs introduce a view synthesis technique to establish the correspondence between 2D images and the scene, i.e., given a pose, PoseGANs are able to synthesize its corresponding camera images. Furthermore, we demonstrate that PoseGANs differ in principle from structure-based localization and learning-based regression for camera localization, and show that PoseGANs exploit geometric structures to accomplish the camera localization task and are therefore more stable than, and superior to, learning-based regression methods, which rely on local texture features instead. In addition to camera localization and view synthesis, we also demonstrate that PoseGANs can be successfully used for other interesting applications, such as moving object elimination and frame interpolation in video sequences.
