NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

Paper Title

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

Authors

Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, Ping Tan

Abstract

Estimating accurate depth from a single image is challenging because the problem is inherently ambiguous and ill-posed. While recent works design increasingly complicated and powerful networks to directly regress the depth map, we take the path of CRF optimization. Because of the expensive computation, CRFs are usually performed between neighborhoods rather than over the whole graph. To leverage the potential of fully-connected CRFs, we split the input into windows and perform the FC-CRFs optimization within each window, which reduces the computational complexity and makes FC-CRFs feasible. To better capture the relationships between nodes in the graph, we exploit the multi-head attention mechanism to compute a multi-head potential function, which is fed to the networks to output an optimized depth map. We then build a bottom-up-top-down structure, where this neural window FC-CRFs module serves as the decoder and a vision transformer serves as the encoder. The experiments demonstrate that our method significantly improves the performance across all metrics on both the KITTI and NYUv2 datasets, compared to previous methods. Furthermore, the proposed method can be directly applied to panorama images and outperforms all previous panorama methods on the MatterPort3D dataset. Project page: https://weihaosky.github.io/newcrfs.
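The window idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`window_partition`, `window_attention`), the random projection matrices standing in for learned weights, and the toy sizes are all assumptions made for illustration. It only shows how a feature map might be split into non-overlapping windows and how multi-head attention weights, playing the role of pairwise relations between nodes, can be computed inside each window.

```python
import numpy as np

def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping win x win windows.

    Returns an array of shape (num_windows, win * win, C).
    """
    H, W, C = x.shape
    assert H % win == 0 and W % win == 0, "H and W must be multiples of win"
    x = x.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/win, W/win, win, win, C)
    return x.reshape(-1, win * win, C)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(x, win, heads, rng):
    """Multi-head self-attention restricted to each window (illustrative only)."""
    H, W, C = x.shape
    assert C % heads == 0, "C must be divisible by the number of heads"
    d = C // heads
    # Assumption: random projections stand in for learned query/key/value weights.
    Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
    tokens = window_partition(x, win)  # (nW, N, C) with N = win * win

    def split_heads(t):  # (nW, N, C) -> (nW, heads, N, d)
        return t.reshape(t.shape[0], t.shape[1], heads, d).transpose(0, 2, 1, 3)

    q, k, v = split_heads(tokens @ Wq), split_heads(tokens @ Wk), split_heads(tokens @ Wv)
    # Pairwise weights between all nodes of the same window, one set per head.
    weights = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d))  # (nW, heads, N, N)
    out = weights @ v  # (nW, heads, N, d)
    out = out.transpose(0, 2, 1, 3).reshape(tokens.shape)  # merge heads
    return out, weights

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 16))  # toy 8 x 8 map with 16 channels
out, weights = window_attention(features, win=4, heads=2, rng=rng)
print(out.shape, weights.shape)
```

In this toy setting, a fully connected graph over all 64 nodes would need 64 × 64 = 4096 pairwise terms per head, while four 4 × 4 windows need 4 × 16 × 16 = 1024, which is the kind of saving the abstract refers to.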

Paper Title