Paper Title

On the Predictability of Pruning Across Scales

Paper Authors

Jonathan S. Rosenfeld, Jonathan Frankle, Michael Carbin, Nir Shavit

Paper Abstract

We show that the error of iteratively magnitude-pruned networks empirically follows a scaling law with interpretable coefficients that depend on the architecture and task. We functionally approximate the error of the pruned networks, showing it is predictable in terms of an invariant tying width, depth, and pruning level, such that networks of vastly different pruned densities are interchangeable. We demonstrate the accuracy of this approximation over orders of magnitude in depth, width, dataset size, and density. We show that the functional form holds (generalizes) for large scale data (e.g., ImageNet) and architectures (e.g., ResNets). As neural networks become ever larger and costlier to train, our findings suggest a framework for reasoning conceptually and analytically about a standard method for unstructured pruning.
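The abstract's central claim, that pruned-network error follows a scaling law with interpretable coefficients, can be made concrete with a small curve-fitting sketch. Note that the saturating power law `pruning_error` below, its parameters (`eps_np`, `d_c`, `gamma`), and the density/error measurements are all illustrative assumptions, not the functional form derived in the paper:

```python
# A minimal sketch, NOT the paper's actual functional approximation: it
# only illustrates fitting a scaling law with interpretable coefficients
# to pruned-network error. All names and data below are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def pruning_error(d, eps_np, d_c, gamma):
    """Generic saturating power law in remaining density d.

    eps_np : error floor of the unpruned network (recovered as d -> 1)
    d_c    : density below which error starts to degrade
    gamma  : exponent governing error growth in the low-density regime
    """
    return eps_np * (1.0 + (d_c / d) ** gamma)

# Hypothetical measurements: fraction of weights kept after each
# iterative magnitude-pruning round vs. observed top-1 error.
density = np.array([1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625])
error = np.array([0.080, 0.081, 0.083, 0.090, 0.110, 0.160, 0.280])

params, _ = curve_fit(pruning_error, density, error,
                      p0=[0.08, 0.01, 1.0], bounds=(0.0, np.inf))
eps_np, d_c, gamma = params
print(f"eps_np={eps_np:.3f}  d_c={d_c:.4f}  gamma={gamma:.2f}")
```

Here `eps_np` plays the role of the unpruned error floor, while `d_c` and `gamma` locate and shape the low-density regime where error degrades. The paper's actual approximation goes further, tying width, depth, and pruning level into a single invariant so that networks of vastly different pruned densities become interchangeable.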
