Paper Title

HALO: Learning to Prune Neural Networks with Shrinkage

Paper Authors

Skyler Seto, Martin T. Wells, Wenyu Zhang

Paper Abstract

Deep neural networks achieve state-of-the-art performance in a variety of tasks by extracting a rich set of features from unstructured data; however, this performance is closely tied to model size. Modern techniques for inducing sparsity and reducing model size are (1) network pruning, (2) training with a sparsity-inducing penalty, and (3) training a binary mask jointly with the weights of the network. We study different sparsity-inducing penalties from the perspective of Bayesian hierarchical models and present a novel penalty called Hierarchical Adaptive Lasso (HALO), which learns to adaptively sparsify weights of a given network via trainable parameters. When used to train over-parametrized networks, our penalty yields small subnetworks with high accuracy without fine-tuning. Empirically, on image recognition tasks, we find that HALO is able to learn highly sparse networks (only 5% of the parameters) with significant gains in performance over state-of-the-art magnitude pruning methods at the same level of sparsity. Code is available at https://github.com/skyler120/sparsity-halo.
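The exact HALO penalty is defined in the paper; the following is only a minimal, illustrative PyTorch sketch of the general idea the abstract describes: an adaptive-lasso-style penalty whose per-weight scales are trained jointly with the network weights. The toy data, the -log|a| regularizer that keeps the learned scales from collapsing to zero, and all hyperparameter values are assumptions for illustration, not the authors' formulation.

```python
# Illustrative sketch (NOT the paper's exact HALO penalty): train a linear layer with a
# penalty lam * sum_j |a_j| * |w_j| - lam * sum_j log|a_j|, where the per-weight scales
# a_j are learned jointly with the weights w_j. All names and values are hypothetical.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data: the target depends on only 3 of 20 features, so sparse weights suffice.
X = torch.randn(256, 20)
true_w = torch.zeros(20)
true_w[:3] = torch.tensor([2.0, -3.0, 1.5])
y = X @ true_w + 0.1 * torch.randn(256)

model = nn.Linear(20, 1, bias=False)
# One trainable penalty scale per weight, initialized to 1.
scales = nn.Parameter(torch.ones_like(model.weight))

lam = 1e-2  # global sparsity strength (illustrative value)
optimizer = torch.optim.SGD([model.weight, scales], lr=1e-2)

for step in range(2000):
    optimizer.zero_grad()
    mse = ((model(X).squeeze(-1) - y) ** 2).mean()
    # Adaptive-lasso-style term: each weight has its own learned scale.
    # The -log|a| term prevents the scales from shrinking to zero and removing the penalty.
    penalty = lam * (scales.abs() * model.weight.abs()).sum() - lam * scales.abs().log().sum()
    loss = mse + penalty
    loss.backward()
    optimizer.step()

# Weights on the irrelevant features should end up close to zero.
print(model.weight.data.round(decimals=2))
```

In a sketch like this, weights on irrelevant features are pushed toward zero while their learned scales grow, which mirrors the adaptive, per-weight sparsification the abstract attributes to HALO; the hierarchical (Bayesian) treatment of the scales in the paper is more involved than this toy regularizer.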
