Paper Title

Ventral-Dorsal Neural Networks: Object Detection via Selective Attention

Authors

Ebrahimpour, Mohammad K., Li, Jiayun, Yu, Yen-Yun, Reese, Jackson L., Moghtaderi, Azadeh, Yang, Ming-Hsuan, Noelle, David C.

Abstract


Deep Convolutional Neural Networks (CNNs) have been repeatedly proven to perform well on image classification tasks. Object detection methods, however, are still in need of significant improvements. In this paper, we propose a new framework called Ventral-Dorsal Networks (VDNets) which is inspired by the structure of the human visual system. Roughly, the visual input signal is analyzed along two separate neural streams, one in the temporal lobe and the other in the parietal lobe. The coarse functional distinction between these streams is between object recognition -- the "what" of the signal -- and extracting location related information -- the "where" of the signal. The ventral pathway from primary visual cortex, entering the temporal lobe, is dominated by "what" information, while the dorsal pathway, into the parietal lobe, is dominated by "where" information. Inspired by this structure, we propose the integration of a "Ventral Network" and a "Dorsal Network", which are complementary. Information about object identity can guide localization, and location information can guide attention to relevant image regions, improving object recognition. This new dual network framework sharpens the focus of object detection. Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches on PASCAL VOC 2007 by 8% (mAP) and PASCAL VOC 2012 by 3% (mAP). Moreover, a comparison of techniques on Yearbook images displays substantial qualitative and quantitative benefits of VDNet.
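The core idea of the abstract — "what" information from a ventral (classification) network guiding "where" processing by masking out irrelevant image regions before detection — can be illustrated with a minimal NumPy sketch. This is a hedged illustration, not the paper's implementation: it assumes a CAM-style attention map built from classifier feature maps and class weights, and the function names (`ventral_attention`, `mask_irrelevant`) and the threshold value are illustrative choices, not from the paper.

```python
import numpy as np

def ventral_attention(activations, weights):
    """Class-activation-style map from "what" features.

    activations: (C, H, W) feature maps from a classification backbone.
    weights: (C,) class weights for the predicted object class.
    Returns an (H, W) attention map normalized to [0, 1].
    """
    cam = np.tensordot(weights, activations, axes=([0], [0]))  # (H, W)
    cam = np.maximum(cam, 0)  # keep only positive evidence (ReLU)
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

def mask_irrelevant(image, cam, threshold=0.5):
    """Zero out image regions the ventral map deems irrelevant.

    image: (H*f, W*f, 3) input; cam: (H, W) attention map whose spatial
    size divides the image size (an assumption of this sketch).
    """
    fy = image.shape[0] // cam.shape[0]
    fx = image.shape[1] // cam.shape[1]
    # nearest-neighbor upsample of the attention map to image resolution
    up = np.kron(cam, np.ones((fy, fx)))
    keep = (up >= threshold).astype(image.dtype)
    return image * keep[..., None]
```

A "dorsal" detection network would then run only on the masked image, so localization effort concentrates on regions the "what" pathway found informative.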
