Paper Title


Layer-Wise Data-Free CNN Compression

Paper Authors

Maxwell Horton, Yanzi Jin, Ali Farhadi, Mohammad Rastegari

Paper Abstract


We present a computationally efficient method for compressing a trained neural network without using real data. We break the problem of data-free network compression into independent layer-wise compressions. We show how to efficiently generate layer-wise training data using only a pretrained network. We use this data to perform independent layer-wise compressions on the pretrained network. We also show how to precondition the network to improve the accuracy of our layer-wise compression method. We present results for layer-wise compression using quantization and pruning. When quantizing, we compress with higher accuracy than related works while using orders of magnitude less compute. When compressing MobileNetV2 and evaluating on ImageNet, our method outperforms existing methods for quantization at all bit-widths, achieving a $+0.34\%$ improvement in $8$-bit quantization, and a stronger improvement at lower bit-widths (up to a $+28.50\%$ improvement at $5$ bits). When pruning, we outperform baselines of a similar compute envelope, achieving $1.5$ times the sparsity rate at the same accuracy. We also show how to combine our efficient method with high-compute generative methods to improve upon their results.
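To illustrate the layer-wise idea described in the abstract, here is a minimal PyTorch sketch that fits a quantized copy of a single linear layer so that it reproduces the original layer's outputs on synthetic inputs, with no real data involved. The `fake_quantize` and `compress_layer` helpers, the uniform quantization scheme, the straight-through estimator, and the random inputs standing in for generated layer-wise data are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w, num_bits=8):
    # Illustrative uniform affine quantization with a straight-through
    # gradient estimator (not necessarily the paper's exact scheme).
    qmax = 2 ** num_bits - 1
    scale = (w.max() - w.min()).clamp(min=1e-8) / qmax
    zero_point = torch.round(-w.min() / scale)
    q = torch.clamp(torch.round(w / scale + zero_point), 0, qmax)
    w_q = (q - zero_point) * scale
    return w + (w_q - w).detach()  # identity gradient w.r.t. w

def compress_layer(layer, synthetic_inputs, num_bits=8, steps=200, lr=1e-3):
    # Fit a quantized copy of `layer` so it matches the original layer's
    # outputs on synthetic inputs, independently of all other layers.
    compressed = nn.Linear(layer.in_features, layer.out_features)
    compressed.load_state_dict(layer.state_dict())
    with torch.no_grad():
        targets = layer(synthetic_inputs)  # outputs of the full-precision layer
    optimizer = torch.optim.Adam(compressed.parameters(), lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        w_q = fake_quantize(compressed.weight, num_bits)
        outputs = F.linear(synthetic_inputs, w_q, compressed.bias)
        F.mse_loss(outputs, targets).backward()
        optimizer.step()
    return compressed

# Toy usage: random inputs stand in for the layer-wise training data that the
# paper generates from the pretrained network itself.
layer = nn.Linear(128, 64)
synthetic_inputs = torch.randn(256, 128)
compressed = compress_layer(layer, synthetic_inputs, num_bits=5)
```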
