Paper Title

Perspective Plane Program Induction from a Single Image

Authors

Yikai Li, Jiayuan Mao, Xiuming Zhang, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

Abstract

We study the inverse graphics problem of inferring a holistic representation for natural images. Given an input image, our goal is to induce a neuro-symbolic, program-like representation that jointly models camera poses, object locations, and global scene structures. Such high-level, holistic scene representations further facilitate low-level image manipulation tasks such as inpainting. We formulate this problem as jointly finding the camera pose and scene structure that best describe the input image. The benefits of such joint inference are twofold: scene regularity serves as a new cue for perspective correction, and in turn, accurate perspective correction leads to a simplified scene structure, similar to how the correct shape leads to the most regular texture in shape-from-texture. Our proposed framework, Perspective Plane Program Induction (P3I), combines search-based and gradient-based algorithms to solve the problem efficiently. P3I outperforms a set of baselines on a collection of Internet images, across tasks including camera pose estimation, global structure inference, and downstream image manipulation.
