Paper Title

ILCRO: Making Importance Landscapes Flat Again

Authors

Vincent Moens, Simiao Yu, Gholamreza Salimi-Khorshidi

Abstract

Convolutional neural networks have had great success in numerous tasks, including image classification, object detection, sequence modelling, and many more. It is generally assumed that such neural networks are translation invariant, meaning that they can detect a given feature independently of its location in the input image. While this is true for simple cases, where networks are composed of a restricted number of layer classes and where images are fairly simple, complex images processed with common state-of-the-art networks do not usually enjoy this property as one might hope. This paper shows that most existing convolutional architectures define, at initialisation, a specific feature importance landscape that conditions their capacity to attend to different locations of the images later during training or even at test time. We demonstrate how this phenomenon occurs under specific conditions and how it can be adjusted under some assumptions. We derive the P-objective, or PILCRO for Pixel-wise Importance Landscape Curvature Regularised Objective, a simple regularisation technique that favours weight configurations producing smooth, low-curvature importance landscapes that are conditioned on the data rather than on the chosen architecture. Through extensive experiments, we further show that P-regularised versions of popular computer-vision networks have a flat importance landscape, train faster, achieve better accuracy, and are more robust to noise at test time, compared to their original counterparts in common computer-vision classification settings.
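The abstract does not spell out the P-objective itself, so the sketch below is only a guess at its general shape: it treats the pixel-wise importance landscape as an input-gradient saliency map and penalises its spatial curvature with a squared discrete Laplacian added to the task loss. The function names (`importance_landscape`, `curvature_penalty`, `p_regularised_loss`), the weight `lam`, and both modelling choices are illustrative assumptions, not the paper's definitions.

```python
# Hypothetical PILCRO-style regulariser. The importance measure
# (input-gradient saliency) and the curvature penalty (squared spatial
# Laplacian) are assumptions made for illustration only.
import torch
import torch.nn.functional as F


def importance_landscape(model, images, labels):
    """Pixel-wise importance: magnitude of the loss gradient w.r.t. the input."""
    images = images.clone().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    # create_graph=True keeps the graph so the penalty can backprop into the weights.
    (grads,) = torch.autograd.grad(loss, images, create_graph=True)
    return grads.abs().sum(dim=1)  # collapse channels -> (N, H, W)


def curvature_penalty(landscape):
    """Mean squared discrete Laplacian of the importance map (its curvature)."""
    lap_kernel = landscape.new_tensor([[0., 1., 0.],
                                       [1., -4., 1.],
                                       [0., 1., 0.]]).view(1, 1, 3, 3)
    lap = F.conv2d(landscape.unsqueeze(1), lap_kernel, padding=1)
    return lap.pow(2).mean()


def p_regularised_loss(model, images, labels, lam=1e-3):
    """Task loss plus a curvature term that favours flat importance landscapes."""
    task_loss = F.cross_entropy(model(images), labels)
    penalty = curvature_penalty(importance_landscape(model, images, labels))
    return task_loss + lam * penalty
```

Under this reading, minimising the curvature term pushes training toward weight configurations whose importance landscapes are flat across pixel locations, which is the behaviour the abstract attributes to P-regularised networks.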
