Paper Title
Delving into the Scale Variance Problem in Object Detection
Paper Authors
Paper Abstract
Object detection has made substantial progress in the last decade, owing to the capability of convolutions to extract the local context of objects. However, object scales are diverse, while a conventional convolution processes only single-scale input. The capability of traditional convolutions with a fixed receptive field to deal with this scale variance problem is thus limited. Multi-scale feature representation has proven to be an effective way to mitigate the scale variance problem. Recent studies mainly adopt partial connections between certain scales, or aggregate features from all scales and focus on the global information across scales; however, the information across the spatial and depth dimensions is ignored. Motivated by this, we propose the multi-scale convolution (MSConv) to address this problem. By taking scale, spatial, and depth information into account simultaneously, MSConv processes multi-scale input more comprehensively. MSConv is effective and computationally efficient, introducing only a small increase in computational cost. For most single-stage object detectors, replacing the traditional convolutions in the detection head with MSConvs brings more than a 2.5\% improvement in AP (on the COCO 2017 dataset) with only a 3\% increase in FLOPs. MSConv is also flexible and effective for two-stage object detectors: when extended to mainstream two-stage detectors, MSConv brings up to a 3.0\% improvement in AP. Our best model achieves 48.9\% AP on the COCO 2017 \textit{test-dev} split under single-scale testing, surpassing many state-of-the-art methods.
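The abstract does not specify MSConv's internals, so the following is only a minimal sketch, assuming a PyTorch-style module: it aligns feature maps from several pyramid levels to one resolution, projects each along the depth (channel) dimension, and fuses them with per-pixel attention weights over the scales. All names here (this `MSConv` class, `proj`, `scale_attn`, `fuse`) and design choices are hypothetical illustrations of the scale/spatial/depth idea, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MSConv(nn.Module):
    """Hypothetical sketch of a multi-scale convolution.

    Resizes feature maps from several pyramid levels to a common
    resolution, projects each along the channel (depth) dimension, and
    fuses them with attention weights computed per scale and per spatial
    location. Illustrative only; the paper's MSConv may differ.
    """

    def __init__(self, channels: int, num_scales: int):
        super().__init__()
        self.num_scales = num_scales
        # Per-scale 1x1 projection before fusion (depth dimension).
        self.proj = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=1)
            for _ in range(num_scales)
        )
        # Predicts one attention map per scale at every spatial location
        # (covers the scale and spatial dimensions jointly).
        self.scale_attn = nn.Conv2d(
            channels * num_scales, num_scales, kernel_size=3, padding=1
        )
        # Standard convolution applied to the fused feature.
        self.fuse = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # Align every level to the resolution of the first input.
        size = feats[0].shape[-2:]
        aligned = [
            proj(F.interpolate(f, size=size, mode="bilinear",
                               align_corners=False))
            for proj, f in zip(self.proj, feats)
        ]
        # Softmax over the scale axis yields per-pixel scale weights.
        attn = self.scale_attn(torch.cat(aligned, dim=1)).softmax(dim=1)
        fused = sum(
            attn[:, i : i + 1] * aligned[i] for i in range(self.num_scales)
        )
        return self.fuse(fused)


if __name__ == "__main__":
    # Toy pyramid: three levels whose strides differ by a factor of 2.
    feats = [torch.randn(1, 256, s, s) for s in (64, 32, 16)]
    out = MSConv(channels=256, num_scales=3)(feats)
    print(out.shape)  # torch.Size([1, 256, 64, 64])
```

Under these assumptions, the softmax over the scale axis handles the scale dimension, the 3x3 attention predictor varies the weights spatially, and the per-scale 1x1 projections mix information along the depth dimension, which is one plausible reading of "scale, spatial, and depth information at the same time".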