Robustifying the Multi-Scale Representation of Neural Radiance Fields

Paper Title

Robustifying the Multi-Scale Representation of Neural Radiance Fields

Authors

Nishant Jain, Suryansh Kumar, Luc Van Gool

Abstract

Neural Radiance Fields (NeRF) recently emerged as a new paradigm for object representation from multi-view (MV) images. Yet, it cannot handle multi-scale (MS) images or camera pose estimation errors, both of which are common in multi-view images captured with an everyday commodity camera. Although the recently proposed Mip-NeRF can handle the multi-scale imaging problem in NeRF, it cannot handle camera pose estimation errors. On the other hand, the newly proposed BARF can solve the camera pose problem in NeRF but fails if the images are multi-scale in nature. This paper presents a robust multi-scale neural radiance field representation approach that overcomes both real-world imaging issues simultaneously. Our method handles multi-scale imaging effects and camera pose estimation problems in NeRF-inspired approaches by leveraging the fundamentals of scene rigidity. To reduce unpleasant aliasing artifacts caused by multi-scale images in the ray space, we leverage the Mip-NeRF multi-scale representation. For joint estimation of robust camera poses, we propose graph neural network-based multiple motion averaging within the neural volume rendering framework. We demonstrate, with examples, that precise camera pose estimates are crucial for an accurate neural representation of an object from everyday multi-view images. Without robustness measures in camera pose estimation, modeling multi-scale aliasing artifacts via conical frustums can be counterproductive. We present extensive experiments on benchmark datasets to demonstrate that our approach provides better results than recent NeRF-inspired approaches in such realistic settings.
