Paper Title
Non-Vacuous Generalisation Bounds for Shallow Neural Networks
Paper Authors
Paper Abstract
We focus on a specific class of shallow neural networks with a single hidden layer, namely those with $L_2$-normalised data and either a sigmoid-shaped Gaussian error function ("erf") activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through PAC-Bayesian theory; unlike most existing such bounds, they apply to neural networks with deterministic rather than randomised parameters. Our bounds are empirically non-vacuous when the network is trained with vanilla stochastic gradient descent on MNIST and Fashion-MNIST.
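To make the setting concrete, here is a minimal sketch of the network class the abstract describes: inputs projected onto the unit $L_2$ sphere feeding a single hidden layer with either an erf or a GELU activation. The class name, layer widths, bias-free parameterisation, and Gaussian initialisation scale are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.special import erf

def l2_normalise(X):
    # Project each row of X onto the unit L2 sphere
    # (the data normalisation the abstract assumes).
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def gelu(z):
    # GELU(z) = z * Phi(z), with the Gaussian CDF Phi written via erf.
    return 0.5 * z * (1.0 + erf(z / np.sqrt(2.0)))

class ShallowNet:
    """Hypothetical one-hidden-layer network with an erf or GELU activation.

    Shapes, initialisation, and the absence of biases are assumptions
    made for illustration, not specifics from the paper.
    """
    def __init__(self, d_in, d_hidden, d_out, activation="erf", seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_in, d_hidden))
        self.W2 = rng.normal(scale=1.0 / np.sqrt(d_hidden), size=(d_hidden, d_out))
        self.act = erf if activation == "erf" else gelu

    def forward(self, X):
        # X: (n_samples, d_in) -> (n_samples, d_out) class scores.
        return self.act(l2_normalise(X) @ self.W1) @ self.W2

# Usage, e.g. on flattened 28x28 MNIST images with 10 classes:
# scores = ShallowNet(784, 100, 10, activation="gelu").forward(X)
```

Both activations are smooth and bounded in slope, which is what makes this class amenable to the kind of PAC-Bayesian analysis the abstract refers to; the GELU is expressed through erf so the two variants differ only in the elementwise nonlinearity.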