Paper Title

Deep Networks and the Multiple Manifold Problem

Paper Authors

Sam Buchanan, Dar Gilboa, John Wright

Paper Abstract

We study the multiple manifold problem, a binary classification task modeled on applications in machine vision, in which a deep fully-connected neural network is trained to separate two low-dimensional submanifolds of the unit sphere. We provide an analysis of the one-dimensional case, proving for a simple manifold configuration that when the network depth $L$ is large relative to certain geometric and statistical properties of the data, the network width $n$ grows as a sufficiently large polynomial in $L$, and the number of i.i.d. samples from the manifolds is polynomial in $L$, randomly-initialized gradient descent rapidly learns to classify the two manifolds perfectly with high probability. Our analysis demonstrates concrete benefits of depth and width in the context of a practically-motivated model problem: the depth acts as a fitting resource, with larger depths corresponding to smoother networks that can more readily separate the class manifolds, and the width acts as a statistical resource, enabling concentration of the randomly-initialized network and its gradients. The argument centers around the neural tangent kernel and its role in the nonasymptotic analysis of training overparameterized neural networks; to this literature, we contribute essentially optimal rates of concentration for the neural tangent kernel of deep fully-connected networks, requiring width $n \gtrsim L\,\mathrm{poly}(d_0)$ to achieve uniform concentration of the initial kernel over a $d_0$-dimensional submanifold of the unit sphere $\mathbb{S}^{n_0-1}$, and a nonasymptotic framework for establishing generalization of networks trained in the NTK regime with structured data. The proof makes heavy use of martingale concentration to optimally treat statistical dependencies across layers of the initial random network. This approach should be of use in establishing similar results for other network architectures.
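As a rough illustration of the setup the abstract describes, the sketch below samples two one-dimensional curves on the unit sphere $\mathbb{S}^{n_0-1}$, builds a depth-$L$, width-$n$ fully-connected ReLU network, and trains it from random initialization with plain gradient descent. The specific curves, dimensions, loss, initialization, and step size are placeholder choices for illustration only, not the configuration or the NTK-regime scaling analyzed in the paper.

```python
# Minimal sketch of a one-dimensional "multiple manifold" classification task:
# two disjoint curves on the unit sphere, a deep fully-connected ReLU network,
# and gradient descent from random initialization. All hyperparameters below
# are illustrative placeholders, not the values required by the paper's theory.
import torch

torch.manual_seed(0)

n0, L, n = 20, 8, 512                 # input dim, depth, width (illustrative)
num_samples, steps, lr = 200, 500, 0.1

def sample_curve(offset, num):
    """Sample points from a smooth closed curve, then project onto the sphere."""
    t = 2 * torch.pi * torch.rand(num)
    x = torch.zeros(num, n0)
    x[:, 0] = torch.cos(t)
    x[:, 1] = torch.sin(t)
    x[:, 2] = offset                          # separates the two curves
    return x / x.norm(dim=1, keepdim=True)    # projection onto S^{n0-1}

X = torch.cat([sample_curve(-0.5, num_samples), sample_curve(0.5, num_samples)])
y = torch.cat([-torch.ones(num_samples), torch.ones(num_samples)])

# Deep fully-connected ReLU network with a scalar output.
dims = [n0] + [n] * (L - 1) + [1]
layers = []
for d_in, d_out in zip(dims[:-1], dims[1:]):
    layers += [torch.nn.Linear(d_in, d_out), torch.nn.ReLU()]
model = torch.nn.Sequential(*layers[:-1])     # drop the final ReLU

opt = torch.optim.SGD(model.parameters(), lr=lr)
for step in range(steps):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(X).squeeze(-1), y)
    loss.backward()
    opt.step()

with torch.no_grad():
    acc = (model(X).squeeze(-1).sign() == y).float().mean()
print(f"final loss {loss.item():.4f}, training accuracy {acc.item():.3f}")
```

In the regime the paper studies, the depth $L$ would be chosen large relative to the geometry of the class manifolds and the width $n$ as a sufficiently large polynomial in $L$, so that the network's initial kernel concentrates uniformly over the manifolds and gradient descent fits the two classes perfectly with high probability.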
