Paper Title
Towards Understanding Hierarchical Learning: Benefits of Neural Representations
Paper Authors
Paper Abstract
Deep neural networks can empirically perform efficient hierarchical learning, in which the layers learn useful representations of the data. However, how they make use of the intermediate representations is not explained by recent theories that relate them to "shallow learners" such as kernels. In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks and can be advantageous over raw inputs. We consider a fixed, randomly initialized neural network as a representation function fed into another trainable network. When the trainable network is the quadratic Taylor model of a wide two-layer network, we show that neural representation can achieve improved sample complexities compared with the raw input: For learning a low-rank degree-$p$ polynomial ($p \geq 4$) in $d$ dimensions, neural representation requires only $\tilde{O}(d^{\lceil p/2 \rceil})$ samples, while the best-known sample complexity upper bound for the raw input is $\tilde{O}(d^{p-1})$. We contrast our result with a lower bound showing that neural representations do not improve over the raw input (in the infinite width limit), when the trainable network is instead a neural tangent kernel. Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
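The following is a minimal JAX sketch (not the authors' code) of the setup described in the abstract: a fixed, randomly initialized layer produces the representation h(x), and the trainable model on top is the second-order Taylor expansion of a wide two-layer network around its random initialization. The activation choices (ReLU for the fixed representation, softplus for the trainable network so the second-order term is non-trivial), the 1/sqrt(width) scalings, and all toy dimensions are illustrative assumptions rather than the paper's exact construction.

```python
import jax
import jax.numpy as jnp

def neural_representation(U, x):
    """Fixed, randomly initialized representation h(x) = ReLU(U x) / sqrt(D) (assumed form)."""
    return jax.nn.relu(U @ x) / jnp.sqrt(U.shape[0])

def two_layer(w, a, h):
    """Trainable wide two-layer network on top of the representation: f(h; W) = a . softplus(W h) / sqrt(m)."""
    return a @ jax.nn.softplus(w @ h) / jnp.sqrt(w.shape[0])

def quadratic_taylor(w0, a, h, dw):
    """Quadratic Taylor model: f(W0) + Df(W0)[dW] + 0.5 D^2 f(W0)[dW, dW], with dW = W - W0 trainable."""
    f = lambda w: two_layer(w, a, h)
    y0, first = jax.jvp(f, (w0,), (dw,))                        # value and first directional derivative
    _, second = jax.jvp(lambda w: jax.jvp(f, (w,), (dw,))[1],   # differentiate the JVP once more in direction dw
                        (w0,), (dw,))
    return y0 + first + 0.5 * second

# Toy usage with hypothetical sizes: input dim d, representation width D, trainable width m.
key = jax.random.PRNGKey(0)
k1, k2, k3, k4, k5 = jax.random.split(key, 5)
d, D, m = 16, 64, 128
x = jax.random.normal(k1, (d,))
U = jax.random.normal(k2, (D, d))                   # fixed representation weights (never trained)
h = neural_representation(U, x)

w0 = jax.random.normal(k3, (m, D))                  # random initialization of the trainable network
a = jax.random.choice(k4, jnp.array([-1.0, 1.0]), (m,))
dw = 0.01 * jax.random.normal(k5, (m, D))           # the trainable displacement W - W0
print(quadratic_taylor(w0, a, h, dw))
```

In this sketch only dw (and possibly a) would be optimized during training, while U and w0 stay fixed, mirroring the abstract's setup of a fixed random representation feeding a trainable quadratic Taylor model.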