Self-supervised Monocular Trained Depth Estimation using Self-attention and Discrete Disparity Volume

Paper Title

Self-supervised Monocular Trained Depth Estimation using Self-attention and Discrete Disparity Volume

Authors

Adrian Johnston, Gustavo Carneiro

Abstract

Monocular depth estimation has become one of the most studied applications in computer vision, where the most accurate approaches are based on fully supervised learning models. However, acquiring large and accurate ground-truth data sets to train these fully supervised methods is a major challenge for the further development of the area. Self-supervised methods trained with monocular videos constitute one of the most promising approaches to mitigate this challenge, owing to the widespread availability of training data. Consequently, they have been intensively studied, where the main ideas explored consist of different types of model architectures, loss functions, and occlusion masks to address non-rigid motion. In this paper, we propose two new ideas to improve self-supervised monocular trained depth estimation: 1) self-attention, and 2) discrete disparity prediction. Compared with the usual localised convolution operation, self-attention can explore more general contextual information, which allows the inference of similar disparity values at non-contiguous regions of the image. Discrete disparity prediction has been shown by fully supervised methods to provide a more robust and sharper depth estimation than the more common continuous disparity prediction, besides enabling the estimation of depth uncertainty. We show that extending the state-of-the-art self-supervised monocular trained depth estimator Monodepth2 with these two ideas allows us to design a model that produces the best results in the field on KITTI 2015 and Make3D, closing the gap with respect to self-supervised stereo training and fully supervised approaches.
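To make the idea of discrete disparity prediction concrete, the following is a minimal per-pixel sketch and not the authors' implementation. It assumes that the network outputs one logit per disparity bin, that the bins are evenly spaced between hypothetical bounds `d_min` and `d_max`, and that the final disparity is taken as the probability-weighted mean (a soft-argmax). The variance of the same distribution is used here as one possible uncertainty measure. The function names are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_disparity(logits, d_min=0.0, d_max=1.0):
    """Turn per-bin logits for one pixel into (disparity, variance).

    Assumption: bins are evenly spaced on [d_min, d_max]. The abstract
    does not specify the binning scheme or the value range.
    """
    n = len(logits)
    if n < 2:
        raise ValueError("need at least two disparity bins")
    step = (d_max - d_min) / (n - 1)
    levels = [d_min + i * step for i in range(n)]

    probs = softmax(logits)
    # Soft-argmax: probability-weighted mean of the discrete levels.
    mean = sum(p * d for p, d in zip(probs, levels))
    # Spread of the distribution, usable as a per-pixel uncertainty.
    var = sum(p * (d - mean) ** 2 for p, d in zip(probs, levels))
    return mean, var


if __name__ == "__main__":
    # A flat distribution gives high variance; a peaked one gives low variance.
    print(expected_disparity([0.0, 0.0, 0.0]))
    print(expected_disparity([0.0, 100.0, 0.0]))
```

In this sketch a confident prediction concentrates probability in one bin, so the variance approaches zero, while an ambiguous pixel spreads probability across bins and yields a larger variance.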
