Paper Title

Understanding Generalization in Deep Learning via Tensor Methods

Paper Authors

Jingling Li, Yanchao Sun, Jiahao Su, Taiji Suzuki, Furong Huang

Paper Abstract

Deep neural networks generalize well on unseen data though the number of parameters often far exceeds the number of training examples. Recently proposed complexity measures have provided insights to understanding the generalizability in neural networks from perspectives of PAC-Bayes, robustness, overparametrization, compression and so on. In this work, we advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective. Using tensor analysis, we propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks; thus, in practice, our generalization bound outperforms the previous compression-based ones, especially for neural networks using tensors as their weight kernels (e.g. CNNs). Moreover, these intuitive measurements provide further insights into designing neural network architectures with properties favorable for better/guaranteed generalizability. Our experimental results demonstrate that through the proposed measurable properties, our generalization error bound matches the trend of the test error well. Our theoretical analysis further provides justifications for the empirical success and limitations of some widely-used tensor-based compression approaches. We also discover the improvements to the compressibility and robustness of current neural networks when incorporating tensor operations via our proposed layer-wise structure.
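To make the notion of tensor-based compression of weight kernels concrete, the sketch below is a minimal illustration (with assumed shapes and an assumed rank, not the construction or bound used in the paper): it unfolds a hypothetical 4D convolutional kernel along its output-channel mode and applies a truncated SVD, the simplest low-rank compression of a weight tensor, and compares the parameter counts before and after.

```python
import numpy as np

# Hypothetical 4D convolution kernel (out_channels, in_channels, kh, kw); shapes are illustrative.
out_c, in_c, kh, kw = 64, 32, 3, 3
kernel = np.random.randn(out_c, in_c, kh, kw)

# Mode-1 matricization: unfold the tensor along the output-channel mode.
unfolded = kernel.reshape(out_c, -1)          # shape (64, 288)

# Truncated SVD gives a low-rank factorization of the unfolded kernel.
rank = 8                                      # assumed rank, purely for illustration
U, s, Vt = np.linalg.svd(unfolded, full_matrices=False)
U_r, s_r, Vt_r = U[:, :rank], s[:rank], Vt[:rank, :]

# The compressed layer stores two small factors instead of one dense kernel.
orig_params = kernel.size                     # 64 * 32 * 3 * 3 = 18432
compressed_params = U_r.size + Vt_r.size      # 64*8 + 8*288 = 2816

# Reconstruct the approximation to check how much accuracy the compression loses.
approx = (U_r * s_r) @ Vt_r
rel_err = np.linalg.norm(unfolded - approx) / np.linalg.norm(unfolded)
print(f"params: {orig_params} -> {compressed_params}, relative error: {rel_err:.3f}")
```

How small the relative error stays at a given rank is exactly the kind of data-dependent compressibility property the abstract refers to: kernels that admit accurate low-rank approximations can be compressed aggressively, which is what the paper's generalization bound exploits.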
