Paper Title
Where Should We Begin? A Low-Level Exploration of Weight Initialization Impact on Quantized Behaviour of Deep Neural Networks
Paper Authors
Paper Abstract
With the proliferation of deep convolutional neural network (CNN) algorithms for mobile processing, limited-precision quantization has become an essential tool for efficient CNN inference. Consequently, various works have sought to design fixed-precision quantization algorithms and quantization-focused optimization techniques that minimize quantization-induced performance degradation. However, there is little concrete understanding of how various CNN design decisions and best practices affect quantized inference behaviour. Weight initialization strategies are often associated with solving issues such as vanishing or exploding gradients, but an often-overlooked aspect is their impact on the final trained weight distributions of each layer. We present an in-depth, fine-grained ablation study of the effect of different weight initializations on the final distributions of weights and activations across different CNN architectures. This fine-grained, layerwise analysis yields insights into how initial weight distributions affect final accuracy and quantized behaviour. To the best of our knowledge, we are the first to perform such a low-level, in-depth quantitative analysis of weight initialization and its effect on quantized behaviour.
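To make the abstract's setup concrete, below is a minimal sketch (not the authors' protocol) of the kind of layerwise analysis it describes: weights are drawn from two common initializers, passed through symmetric uniform fixed-point quantization, and compared by quantization error. The 8-bit width, the Glorot/He initializer choices, the layer shape, and the MSE metric are all illustrative assumptions.

```python
# Illustrative sketch: how a weight initializer's spread interacts with
# fixed-precision (here, 8-bit symmetric uniform) quantization of a layer.
import numpy as np

def quantize_symmetric(w, num_bits=8):
    """Symmetric uniform per-tensor quantization, returned dequantized
    ("fake-quant") so it is directly comparable to the original weights."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax          # map max |w| to the top level
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 256                    # hypothetical layer shape

# Two standard weight initializations for a fan_in x fan_out layer.
inits = {
    "glorot_normal": rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)),
                                (fan_in, fan_out)),
    "he_normal":     rng.normal(0.0, np.sqrt(2.0 / fan_in),
                                (fan_in, fan_out)),
}

for name, w in inits.items():
    w_q = quantize_symmetric(w, num_bits=8)
    mse = np.mean((w - w_q) ** 2)
    print(f"{name:14s}  std={w.std():.4f}  8-bit quant MSE={mse:.3e}")
```

In an actual study of this kind, the same comparison would be run on the *trained* per-layer weight and activation distributions rather than on freshly initialized tensors, since the paper's point is that the choice of initializer shapes those final distributions and, through them, quantized behaviour.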