Paper Title
Sharp asymptotics on the compression of two-layer neural networks
Paper Authors

Paper Abstract
In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M<N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the population L_2 loss between the outputs of the target and of the compressed network, under the assumption of Gaussian inputs. By using tools from high-dimensional probability, we show that this non-convex problem can be simplified when the target network is sufficiently over-parameterized, and provide the error rate of this approximation as a function of the input dimension and N. In this mean-field limit, the simplified objective, as well as the optimal weights of the compressed network, does not depend on the realization of the target network, but only on expected scaling factors. Furthermore, for networks with ReLU activation, we conjecture that the optimum of the simplified optimization problem is achieved by taking weights on the Equiangular Tight Frame (ETF), while the scaling of the weights and the orientation of the ETF depend on the parameters of the target network. Numerical evidence is provided to support this conjecture.
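The following is a minimal numerical sketch of the setting described in the abstract: a target two-layer ReLU network with i.i.d. Gaussian (hence sub-Gaussian) first-layer weights is compared against a compressed network whose M weight vectors lie on a scaled simplex Equiangular Tight Frame, and the population L_2 loss under Gaussian inputs is estimated by Monte Carlo. The output normalization (averaging the hidden units), the choice of scaling factor, and the random orientation of the ETF are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, M = 50, 1000, 8        # input dimension, target width, compressed width (illustrative)
n_mc = 20_000                # Monte Carlo samples for the population L_2 loss

relu = lambda z: np.maximum(z, 0.0)

# Target two-layer network: i.i.d. Gaussian first-layer weights; the output is
# taken as the average of the N ReLU units (an assumed normalization).
W_target = rng.standard_normal((N, d))
f_target = lambda X: relu(X @ W_target.T).mean(axis=1)

# Compressed network with M units whose weights lie on a simplex ETF:
# M equiangular unit vectors in R^M, rotated into R^d by a random partial
# orthogonal matrix, then multiplied by a common scaling factor.
simplex = np.eye(M) - np.ones((M, M)) / M                 # M centered vectors
simplex /= np.linalg.norm(simplex, axis=1, keepdims=True)  # pairwise inner product -1/(M-1)
Q, _ = np.linalg.qr(rng.standard_normal((d, M)))           # random orientation in R^d
scale = np.sqrt(d)                                         # assumed scaling, for illustration only
W_etf = scale * simplex @ Q.T                              # (M, d) ETF-structured weights
g_compressed = lambda X: relu(X @ W_etf.T).mean(axis=1)

# Monte Carlo estimate of the population L_2 loss E_x[(f(x) - g(x))^2] for x ~ N(0, I_d).
X = rng.standard_normal((n_mc, d))
loss = np.mean((f_target(X) - g_compressed(X)) ** 2)
print(f"estimated population L_2 loss: {loss:.4f}")
```

In the paper's conjecture, the scaling of the ETF weights and the orientation matrix are optimized as functions of the target network's parameters rather than fixed arbitrarily as above; the sketch only makes the objective and the ETF structure concrete.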