Towards Better Generalization: Joint Depth-Pose Learning without PoseNet

Paper Title

Towards Better Generalization: Joint Depth-Pose Learning without PoseNet

Authors

Wang Zhao, Shaohui Liu, Yezhi Shu, Yong-Jin Liu

Abstract

In this work, we tackle the essential problem of scale inconsistency in self-supervised joint depth-pose learning. Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples, which makes the learning problem harder, resulting in degraded performance and limited generalization in indoor environments and long-sequence visual odometry applications. To address this issue, we propose a novel system that explicitly disentangles scale from the network estimation. Instead of relying on a PoseNet architecture, our method recovers relative pose by directly solving for the fundamental matrix from dense optical flow correspondences and uses a two-view triangulation module to recover an up-to-scale 3D structure. Then, we align the scale of the depth prediction with the triangulated point cloud and use the transformed depth map for depth error computation and a dense reprojection check. Our whole system can be jointly trained end-to-end. Extensive experiments show that our system not only reaches state-of-the-art performance on KITTI depth and flow estimation, but also significantly improves the generalization ability of existing self-supervised depth-pose learning methods under a variety of challenging scenarios, and achieves state-of-the-art results among self-supervised learning-based methods on the KITTI Odometry and NYUv2 datasets. Furthermore, we present some interesting findings on the limitations of PoseNet-based relative pose estimation methods in terms of generalization ability. Code is available at https://github.com/B1ueber2y/TrianFlow.
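
The two geometric steps named in the abstract, solving for the fundamental matrix from correspondences and aligning the predicted depth scale to triangulated points, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the TrianFlow repository for that). It assumes correspondences are already given as pixel arrays, for example sampled from an optical flow field. It uses the classical normalized eight-point algorithm and a median-ratio scale estimate, both chosen here for illustration only. The function names `normalize_points`, `estimate_fundamental`, and `align_depth_scale` are hypothetical.

```python
import numpy as np


def normalize_points(pts):
    """Hartley normalization: zero mean, mean distance sqrt(2) from the origin.

    pts is an (N, 2) array of pixel coordinates. Returns the normalized
    homogeneous points (N, 3) and the 3x3 transform T that produced them.
    """
    mean = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.linalg.norm(pts - mean, axis=1).mean()
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T


def estimate_fundamental(pts1, pts2):
    """Normalized eight-point estimate of F with x2^T F x1 = 0 (N >= 8).

    Assumption for this sketch: correspondences are outlier-free. A real
    system would wrap this in a robust estimator such as RANSAC.
    """
    x1, T1 = normalize_points(pts1)
    x2, T2 = normalize_points(pts2)
    # Each correspondence gives one linear constraint on the 9 entries of F
    # (row-major order), obtained by expanding x2^T F x1 = 0.
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1)),
    ])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt
    # Undo the normalization and fix the arbitrary overall scale.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)


def align_depth_scale(pred_depth, tri_depth, valid):
    """Rescale a predicted depth map to match sparse triangulated depths.

    Assumption for this sketch: a single global scale, estimated as the
    median ratio over pixels where `valid` is True.
    """
    ratio = np.median(tri_depth[valid] / pred_depth[valid])
    return ratio * pred_depth, ratio
```

In a full pipeline, the relative pose would then be recovered from the estimated matrix using the camera intrinsics, and the sparse depths passed to `align_depth_scale` would come from triangulating the same correspondences.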
