Paper Title
A flexible, extensible software framework for model compression based on the LC algorithm
Paper Authors
Paper Abstract
We propose a software framework based on the ideas of the Learning-Compression (LC) algorithm, which allows a user to compress a neural network or other machine learning model using different compression schemes with minimal effort. Currently, the supported compression schemes include pruning, quantization, low-rank methods (including automatically learning the layer ranks), and combinations of these, and the user can choose different compression types for different parts of a neural network. The LC algorithm alternates two types of steps until convergence: a learning (L) step, which trains a model on a dataset (using an algorithm such as SGD); and a compression (C) step, which compresses the model parameters (using a compression scheme such as low-rank or quantization). This decoupling of the "machine learning" aspect from the "signal compression" aspect means that changing the model or the compression type amounts to calling the corresponding subroutine in the L or C step, respectively. The library fully supports this by design, which makes it flexible and extensible. This does not come at the expense of performance: the runtime needed to compress a model is comparable to that of training the model in the first place; and the compressed model is competitive in terms of prediction accuracy and compression ratio with other algorithms (which are often specialized for specific models or compression schemes). The library is written in Python and PyTorch and available on GitHub.
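To make the L/C alternation described in the abstract concrete, below is a minimal conceptual sketch in PyTorch. It is not the library's actual API: `lc_compress`, `compress_fn`/`decompress_fn`, and the low-rank helpers are hypothetical names invented here for illustration, and the penalty schedule is a placeholder. The sketch assumes the standard quadratic-penalty form of LC: the L step runs SGD on the task loss plus a penalty pulling the weights w toward their decompressed form Δ(θ), and the C step refits θ to the current weights without touching the training data.

```python
import torch

def lc_compress(model, loss_fn, data_loader, compress_fn, decompress_fn,
                mu_schedule=(1e-3, 1e-2, 1e-1, 1.0), l_steps=200, lr=1e-2):
    """Conceptual LC loop (hypothetical helper, not the library's API).
    Only weight matrices/tensors (dim >= 2) are compressed here."""
    weights = {n: p for n, p in model.named_parameters() if p.dim() >= 2}
    # theta: compressed representation of each weight tensor
    thetas = {n: compress_fn(p.detach()) for n, p in weights.items()}
    for mu in mu_schedule:  # the penalty parameter mu grows over LC iterations
        # L step: SGD on loss(w) + (mu/2) * ||w - Delta(theta)||^2
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _, (x, y) in zip(range(l_steps), data_loader):
            opt.zero_grad()
            penalty = sum(0.5 * mu * ((p - decompress_fn(thetas[n])) ** 2).sum()
                          for n, p in weights.items())
            (loss_fn(model(x), y) + penalty).backward()
            opt.step()
        # C step: refit the compressed parameters to the current weights,
        # independently of the training data
        thetas = {n: compress_fn(p.detach()) for n, p in weights.items()}
    return thetas

def lowrank_compress(w, rank=8):
    # Example C step: best rank-r approximation of a weight matrix via SVD
    # (conv weights are flattened to 2-D first)
    u, s, vh = torch.linalg.svd(w.reshape(w.shape[0], -1), full_matrices=False)
    return u[:, :rank] * s[:rank], vh[:rank], w.shape

def lowrank_decompress(theta):
    us, vh, shape = theta
    return (us @ vh).reshape(shape)
```

The decoupling the abstract emphasizes shows up directly in this sketch: swapping `lowrank_compress`/`lowrank_decompress` for, say, a k-means quantization or magnitude-pruning pair changes the compression scheme without modifying the L step, and changing the model or dataset changes only the L step.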