Paper Title

On Measuring Excess Capacity in Neural Networks

Paper Authors

Florian Graf, Sebastian Zeng, Bastian Rieck, Marc Niethammer, Roland Kwitt

Paper Abstract

We study the excess capacity of deep networks in the context of supervised classification. That is, given a capacity measure of the underlying hypothesis class - in our case, empirical Rademacher complexity - to what extent can we (a priori) constrain this class while retaining an empirical error on a par with the unconstrained regime? To assess excess capacity in modern architectures (such as residual networks), we extend and unify prior Rademacher complexity bounds to accommodate function composition and addition, as well as the structure of convolutions. The capacity-driving terms in our bounds are the Lipschitz constants of the layers and a (2, 1) group norm distance to the initializations of the convolution weights. Experiments on benchmark datasets of varying task difficulty indicate that (1) there is a substantial amount of excess capacity per task, and (2) capacity can be kept at a surprisingly similar level across tasks. Overall, this suggests a notion of compressibility with respect to weight norms, complementary to classic compression via weight pruning. Source code is available at https://github.com/rkwitt/excess_capacity.
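
To make the two capacity-driving quantities named in the abstract concrete (layer Lipschitz constants and a (2, 1) group norm distance of convolution weights to their initialization), below is a minimal sketch assuming a PyTorch setting. The function names, the grouping-by-output-channel convention, and the use of the flattened kernel's spectral norm as a proxy for the layer's Lipschitz constant are illustrative assumptions, not the paper's exact definitions; see the linked repository for the authors' implementation.

```python
import torch
import torch.nn as nn

def group_norm_21_distance(weight: torch.Tensor, weight_init: torch.Tensor) -> torch.Tensor:
    # (2,1) group norm of (weight - weight_init): sum of per-group l2 norms.
    # Each group is taken here to be one output channel of the convolution
    # (an assumed convention, not necessarily the paper's exact grouping).
    diff = (weight - weight_init).flatten(start_dim=1)   # shape: (out_channels, rest)
    return diff.norm(p=2, dim=1).sum()

def spectral_norm_estimate(conv: nn.Conv2d, n_iter: int = 50) -> torch.Tensor:
    # Power iteration on the flattened kernel matrix; a common proxy for the
    # layer's Lipschitz constant (it ignores stride and padding effects).
    w = conv.weight.detach().flatten(start_dim=1)         # (out_ch, in_ch * k * k)
    v = torch.randn(w.shape[1])
    for _ in range(n_iter):
        u = w @ v
        u = u / u.norm()
        v = w.t() @ u
        v = v / v.norm()
    return u @ (w @ v)                                    # top singular value estimate

# Usage sketch: snapshot a layer at initialization, then track both quantities.
conv = nn.Conv2d(16, 32, kernel_size=3)
w0 = conv.weight.detach().clone()                         # weights at initialization
# ... training would happen here ...
dist = group_norm_21_distance(conv.weight.detach(), w0)
lip = spectral_norm_estimate(conv)
print(f"(2,1) distance to init: {dist.item():.4f}, Lipschitz proxy: {lip.item():.4f}")
```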
