Paper Title
DepthNet Nano: A Highly Compact Self-Normalizing Neural Network for Monocular Depth Estimation
Paper Authors
Abstract
Depth estimation is an active area of research in the field of computer vision, and has garnered significant interest due to its rising demand in a large number of applications ranging from robotics and unmanned aerial vehicles to autonomous vehicles. A particularly challenging problem in this area is monocular depth estimation, where the goal is to infer depth from a single image. An effective strategy that has shown considerable promise in recent years for tackling this problem is the utilization of deep convolutional neural networks. Despite these successes, the memory and computational requirements of such networks have made widespread deployment in embedded scenarios very challenging. In this study, we introduce DepthNet Nano, a highly compact self-normalizing network for monocular depth estimation designed using a human-machine collaborative design strategy, where principled network design prototyping based on encoder-decoder design principles is coupled with machine-driven design exploration. The result is a compact deep neural network with highly customized macroarchitecture and microarchitecture designs, as well as self-normalizing characteristics, that are highly tailored for the task of embedded depth estimation. The proposed DepthNet Nano possesses a highly efficient network architecture (e.g., 24X smaller and 42X fewer MAC operations than Alhashim et al. on KITTI), while still achieving comparable performance with state-of-the-art networks on the NYU-Depth V2 and KITTI datasets. Furthermore, experiments on inference speed and energy efficiency on a Jetson AGX Xavier embedded module further illustrate the efficacy of DepthNet Nano at different resolutions and power budgets (e.g., ~14 FPS and >0.46 images/sec/watt at 384 x 1280 at a 30W power budget on KITTI).
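The reported energy-efficiency figure follows directly from throughput divided by power budget. A minimal sanity check using the numbers quoted in the abstract (~14 FPS at a 30W power budget):

```python
# Energy efficiency as reported in the abstract:
# images/sec/watt = throughput (FPS) / power budget (W).
fps = 14.0             # ~14 FPS on KITTI at 384 x 1280 resolution
power_budget_w = 30.0  # 30W power budget on the Jetson AGX Xavier

efficiency = fps / power_budget_w
print(f"{efficiency:.2f} images/sec/watt")  # ~0.47, consistent with the reported >0.46
```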