Paper Title
Lightweight Monocular Depth Estimation with an Edge Guided Network
Paper Authors
Paper Abstract
Monocular depth estimation is an important task that can be applied to many robotic applications. Existing methods focus on improving depth estimation accuracy by training increasingly deeper and wider networks; however, these methods suffer from large computational complexity. Recent studies have found that edge information is an important cue for convolutional neural networks (CNNs) to estimate depth. Inspired by these observations, we present a novel lightweight Edge Guided Depth Estimation Network (EGD-Net) in this study. In particular, we start from a lightweight encoder-decoder architecture and embed an edge guidance branch, which takes as input image gradients and multi-scale feature maps from the backbone to learn edge attention features. To aggregate the context information and edge attention features, we design a transformer-based feature aggregation module (TRFA). TRFA captures the long-range dependencies between the context information and the edge attention features through a cross-attention mechanism. We perform extensive experiments on the NYU Depth V2 dataset. Experimental results show that the proposed method runs at about 96 fps on an NVIDIA GTX 1080 GPU while achieving state-of-the-art accuracy.
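The core idea behind TRFA's aggregation can be illustrated with a minimal sketch of cross-attention: context features act as queries, while edge attention features supply the keys and values, so each context token gathers information from all edge tokens. This is a simplified toy illustration, not the paper's implementation; the learned projection matrices, multi-head structure, and shapes used here are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(context, edge):
    """Cross-attention sketch: `context` tokens (queries) attend over
    `edge` tokens (keys and values). Learned Q/K/V projections are
    omitted for brevity -- an assumption, not the paper's exact module."""
    d_k = context.shape[-1]
    scores = context @ edge.T / np.sqrt(d_k)   # (N_ctx, N_edge) similarities
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ edge                      # edge features aggregated per context token

# toy example: 4 context tokens and 6 edge tokens, 8-dim features
rng = np.random.default_rng(0)
ctx = rng.standard_normal((4, 8))
edg = rng.standard_normal((6, 8))
out = cross_attention(ctx, edg)
print(out.shape)  # each context token now carries aggregated edge information
```

The output has the same shape as the context input, so the aggregated features can be fed back into the decoder path unchanged, which is one reason cross-attention is a convenient fusion mechanism for auxiliary branches.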