Paper title
A law of robustness for two-layers neural networks
Paper authors
Paper abstract
We initiate the study of the inherent tradeoffs between the size of a neural network and its robustness, as measured by its Lipschitz constant. We make a precise conjecture that, for any Lipschitz activation function and for most datasets, any two-layers neural network with $k$ neurons that perfectly fits the data must have its Lipschitz constant larger (up to a constant) than $\sqrt{n/k}$, where $n$ is the number of datapoints. In particular, this conjecture implies that overparametrization is necessary for robustness, since it means that one needs roughly one neuron per datapoint to ensure a $O(1)$-Lipschitz network, while mere data fitting of $d$-dimensional data requires only one neuron per $d$ datapoints. We prove a weaker version of this conjecture when the Lipschitz constant is replaced by an upper bound on it based on the spectral norm of the weight matrix. We also prove the conjecture in the high-dimensional regime $n \approx d$ (which we also refer to as the undercomplete case, since only $k \leq d$ is relevant here). Finally, we prove the conjecture for polynomial activation functions of degree $p$ when $n \approx d^p$. We complement these findings with experimental evidence supporting the conjecture.
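For concreteness, the conjectured bound can be written in display form as follows. The parametrization below is our own notational assumption (with $\psi$ a $1$-Lipschitz activation, inner weights $w_1,\dots,w_k \in \mathbb{R}^d$, and outer weights $a_1,\dots,a_k \in \mathbb{R}$) and is a schematic restatement rather than a verbatim quote of the paper's setup: if the two-layer network
$$ f(x) \;=\; \sum_{i=1}^{k} a_i \, \psi\big(\langle w_i, x\rangle\big) $$
perfectly fits the dataset, i.e. $f(x_j) = y_j$ for all $j \in \{1,\dots,n\}$, then for most datasets $(x_j, y_j)_{j=1}^{n}$ the conjecture asserts
$$ \mathrm{Lip}(f) \;\geq\; c\,\sqrt{\frac{n}{k}} $$
for some universal constant $c > 0$. In this form, taking $k$ on the order of $n$ is exactly what is needed to allow an $O(1)$-Lipschitz interpolating network.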