Paper Title

Searching for Low-Bit Weights in Quantized Neural Networks

Paper Authors

Zhaohui Yang, Yunhe Wang, Kai Han, Chunjing Xu, Chao Xu, Dacheng Tao, Chang Xu

Paper Abstract

Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators. However, the quantization functions used in most conventional quantization methods are non-differentiable, which increases the optimization difficulty of quantized networks. Compared with full-precision parameters (i.e., 32-bit floating-point numbers), low-bit values are selected from a much smaller set; for example, there are only 16 possibilities in the 4-bit space. Thus, we propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables, and utilize a differentiable method to search them accurately. In particular, each weight is represented as a probability distribution over the discrete value set. The probabilities are optimized during training, and the value with the highest probability is selected to establish the desired quantized network. Experimental results on benchmarks demonstrate that the proposed method produces quantized neural networks with higher performance than state-of-the-art methods on both image classification and super-resolution tasks.
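To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of representing each weight as a learnable probability distribution over a small candidate set: a softmax-weighted expectation keeps training differentiable, and an argmax picks the discrete value at inference. The class name, the uniform level set in [-1, 1], and the expectation-based forward pass are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SearchableQuantLinear(nn.Module):
    """Sketch: each weight carries a probability distribution
    over 2^bits candidate low-bit values (hypothetical module)."""

    def __init__(self, in_features, out_features, bits=4):
        super().__init__()
        # Candidate value set: 2^bits evenly spaced levels in [-1, 1]
        # (an assumed choice; the paper may define the set differently).
        self.register_buffer("levels", torch.linspace(-1.0, 1.0, 2 ** bits))
        # One logit per (output, input, candidate) triple; a softmax over
        # the last dim yields the per-weight probability distribution.
        self.logits = nn.Parameter(
            torch.randn(out_features, in_features, 2 ** bits) * 0.01
        )

    def forward(self, x):
        if self.training:
            # Soft weight: expectation under the distribution keeps the
            # whole computation differentiable, so SGD updates the logits.
            probs = F.softmax(self.logits, dim=-1)
            weight = (probs * self.levels).sum(dim=-1)
        else:
            # Discrete weight: select the highest-probability candidate.
            idx = self.logits.argmax(dim=-1)
            weight = self.levels[idx]
        return F.linear(x, weight)

# Usage: a drop-in stand-in for nn.Linear(16, 8) with 4-bit weights.
layer = SearchableQuantLinear(16, 8, bits=4)
y = layer(torch.randn(2, 16))  # shape (2, 8)
```

Switching the module to eval mode freezes the search: every weight collapses to its most probable candidate, giving the final low-bit network described in the abstract.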
