Paper Title
Soft Threshold Weight Reparameterization for Learnable Sparsity
Paper Authors
Paper Abstract
Sparsity in Deep Neural Networks (DNNs) is studied extensively with the focus of maximizing prediction accuracy given an overall parameter budget. Existing methods rely on uniform or heuristic non-uniform sparsity budgets which have sub-optimal layer-wise parameter allocation, resulting in a) lower prediction accuracy or b) higher inference cost (FLOPs). This work proposes Soft Threshold Reparameterization (STR), a novel use of the soft-threshold operator on DNN weights. STR smoothly induces sparsity while learning pruning thresholds, thereby obtaining a non-uniform sparsity budget. Our method achieves state-of-the-art accuracy for unstructured sparsity in CNNs (ResNet50 and MobileNetV1 on ImageNet-1K), and, additionally, learns non-uniform budgets that empirically reduce the FLOPs by up to 50%. Notably, STR boosts the accuracy over existing results by up to 10% in the ultra-sparse (99%) regime and can also be used to induce low-rank (structured) sparsity in RNNs. In short, STR is a simple mechanism that learns effective sparsity budgets, in contrast with popular heuristics. Code, pretrained models, and sparsity budgets are available at https://github.com/RAIVNLab/STR.
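To make the idea concrete, below is a minimal PyTorch sketch of how a soft-threshold weight reparameterization like the one described in the abstract could be wired into a layer: a dense weight and a learnable threshold parameter are trained jointly, and the effective weight used in the forward pass is sign(W) * relu(|W| - g(s)). The module name STRLinear, the choice of sigmoid for g, and the initialization values are illustrative assumptions; the authors' reference implementation is in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class STRLinear(nn.Module):
    """Sketch of a linear layer with a soft-threshold weight reparameterization.

    The effective (sparse) weight is sign(W) * relu(|W| - g(s)), where s is a
    learnable per-layer threshold parameter and g is taken here as the sigmoid.
    Hypothetical illustration, not the authors' reference code.
    """

    def __init__(self, in_features: int, out_features: int, s_init: float = -10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # One learnable threshold per layer. A very negative init keeps g(s)
        # near zero, so the layer starts almost dense; sparsity grows as the
        # threshold increases during training.
        self.s = nn.Parameter(torch.tensor(s_init))

    def sparse_weight(self) -> torch.Tensor:
        threshold = torch.sigmoid(self.s)  # g(s)
        return torch.sign(self.weight) * F.relu(self.weight.abs() - threshold)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The forward pass always uses the soft-thresholded weight, so the
        # sparsity pattern is learned jointly with the weights.
        return F.linear(x, self.sparse_weight(), self.bias)


if __name__ == "__main__":
    layer = STRLinear(64, 32)
    out = layer(torch.randn(8, 64))
    sparsity = (layer.sparse_weight() == 0).float().mean().item()
    print(out.shape, f"sparsity={sparsity:.2%}")
```

Because each layer carries its own threshold parameter, the per-layer sparsity levels that emerge from training form the non-uniform sparsity budget the abstract refers to, rather than being fixed by a uniform or heuristic schedule.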