Paper Title

Finite Sample Identification of Wide Shallow Neural Networks with Biases

Authors

Massimo Fornasier, Timo Klock, Marco Mondelli, Michael Rauchensteiner

Abstract

Artificial neural networks are functions depending on a finite number of parameters typically encoded as weights and biases. The identification of the parameters of the network from finite samples of input-output pairs is often referred to as the teacher-student model, and this model has represented a popular framework for understanding training and generalization. Even though the problem is NP-complete in the worst case, a rapidly growing literature -- after adding suitable distributional assumptions -- has established finite sample identification of two-layer networks with a number of neurons $m=\mathcal O(D)$, $D$ being the input dimension. For the range $D<m<D^2$ the problem becomes harder, and very little is known for networks parametrized by biases as well. This paper fills the gap by providing constructive methods and theoretical guarantees of finite sample identification for such wider shallow networks with biases. Our approach is based on a two-step pipeline: first, we recover the direction of the weights by exploiting second-order information; next, we identify the signs by suitable algebraic evaluations, and we recover the biases by empirical risk minimization via gradient descent. Numerical results demonstrate the effectiveness of our approach.
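
To make the two-step pipeline concrete, below is a minimal NumPy sketch for a hypothetical teacher $f(x)=\sum_i \tanh(\langle w_i,x\rangle + b_i)$ in the regime $D<m<D^2$. Hessians estimated by finite differences supply the second-order information, a projected power iteration (in the spirit of the subspace power method) extracts the weight directions up to sign, and gradient descent on the empirical risk recovers the biases. The finite-difference estimator, the power iteration, the oracle sign fix, and all parameter choices are illustrative assumptions, not the authors' exact algorithm; in particular, the paper determines the signs by algebraic evaluations, which this sketch replaces with an oracle for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
D, m = 8, 12  # input dimension and number of neurons, in the regime D < m < D^2

# Ground-truth teacher (hypothetical instance): f(x) = sum_i tanh(<w_i, x> + b_i).
W = rng.standard_normal((m, D))
W /= np.linalg.norm(W, axis=1, keepdims=True)       # unit-norm weight directions
b = 0.3 * rng.standard_normal(m)
f = lambda x: np.tanh(W @ x + b).sum()

def hessian_fd(x, eps=1e-4):
    """Finite-difference Hessian of f at x, using input-output queries only."""
    I = np.eye(D)
    H = np.array([[(f(x + eps * (I[i] + I[j])) - f(x + eps * I[i])
                    - f(x + eps * I[j]) + f(x)) / eps**2
                   for j in range(D)] for i in range(D)])
    return 0.5 * (H + H.T)

# Step 1a: each Hessian equals sum_i tanh''(<w_i,x>+b_i) w_i w_i^T, so Hessians
# at generic points span the m-dimensional space spanned by the w_i w_i^T.
X = rng.standard_normal((3 * m, D))
_, _, Vt = np.linalg.svd(np.stack([hessian_fd(x).ravel() for x in X]),
                         full_matrices=False)
B = Vt[:m]                                          # orthonormal basis of that span

# Step 1b: projected power iteration on the sphere; near-maximizers of
# u -> ||P_span(u u^T)||_F sit at the rank-one elements, i.e. at +/- w_i.
def power_iter(u, iters=300):
    for _ in range(iters):
        P = (B.T @ (B @ np.outer(u, u).ravel())).reshape(D, D)
        u = np.linalg.eigh(0.5 * (P + P.T))[1][:, -1]   # top eigenvector
    return u

cands = np.array([power_iter(rng.standard_normal(D)) for _ in range(6 * m)])
print("worst |cos| alignment:", np.abs(cands @ W.T).max(axis=0).min())

# Step 2: the paper fixes the signs by algebraic evaluations of the network;
# this sketch substitutes an oracle sign fix, then recovers the biases by plain
# gradient descent on the empirical risk (squared loss over fresh samples).
W_hat = np.array([cands[np.argmax(np.abs(cands @ w))] for w in W])
W_hat *= np.sign(np.einsum('ij,ij->i', W_hat, W))[:, None]  # oracle signs

Xtr = rng.standard_normal((50 * m, D))
y = np.tanh(Xtr @ W.T + b).sum(axis=1)              # teacher outputs
b_hat = np.zeros(m)
for _ in range(5000):
    Z = np.tanh(Xtr @ W_hat.T + b_hat)
    r = Z.sum(axis=1) - y                           # residuals
    b_hat -= 0.1 * ((1 - Z**2) * r[:, None]).mean(axis=0)  # grad of 0.5*mean r^2
print("max bias error:", np.abs(b_hat - b).max())
```

The printed alignment and bias errors let the sketch self-check: if the Hessian span is well estimated and the power iteration finds all $m$ rank-one elements, both quantities should be close to $1$ and $0$ respectively.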
