Paper Title


On the Sample Complexity of Two-Layer Networks: Lipschitz vs. Element-Wise Lipschitz Activation

Paper Authors

Amit Daniely, Elad Granot

Paper Abstract


We investigate the sample complexity of bounded two-layer neural networks with different activation functions. In particular, we consider the class $$ \mathcal{H} = \left\{\textbf{x}\mapsto \langle \textbf{v}, \sigma \circ W\textbf{x} + \textbf{b} \rangle : \textbf{b}\in\mathbb{R}^d, W \in \mathbb{R}^{\mathcal{T}\times d}, \textbf{v} \in \mathbb{R}^{\mathcal{T}}\right\} $$ where the spectral norms of $W$ and $\textbf{v}$ are bounded by $O(1)$, the Frobenius norm of $W$ is bounded by $R > 0$ from its initialization, and $\sigma$ is a Lipschitz activation function. We prove that if $\sigma$ is element-wise, then the sample complexity of $\mathcal{H}$ has only a logarithmic dependence on the width, and that this complexity is tight up to logarithmic factors. We further show that the element-wise property of $\sigma$ is essential for a logarithmic dependence on the width, in the sense that there exist non-element-wise activation functions whose sample complexity is linear in the width, for widths that can be up to exponential in the input dimension. For the upper bound, we use the recent approach to norm-based bounds known as Approximate Description Length (ADL), introduced in arXiv:1910.05697. We further develop new techniques and tools for this approach that will hopefully inspire future work.
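To make the class $\mathcal{H}$ and the element-wise distinction concrete, here is a minimal numpy sketch of one hypothesis $\textbf{x} \mapsto \langle \textbf{v}, \sigma \circ W\textbf{x} + \textbf{b} \rangle$. All names and the specific sizes are illustrative, not from the paper; for dimensional consistency the bias is taken with the same dimension $\mathcal{T}$ as $W\textbf{x}$. ReLU stands in for an element-wise Lipschitz activation, and coordinate-wise sorting illustrates a map that is Lipschitz but not element-wise (the kind of activation for which the paper's lower bound applies).

```python
import numpy as np

def two_layer(x, W, b, v, sigma):
    """Evaluate f(x) = <v, sigma(W x) + b> for a Lipschitz activation sigma."""
    return float(np.dot(v, sigma(W @ x) + b))

# Hypothetical small instance: input dimension d = 3, width T = 4.
rng = np.random.default_rng(0)
d, T = 3, 4
x = rng.standard_normal(d)
W = rng.standard_normal((T, d))
b = rng.standard_normal(T)  # bias in R^T, matching the dimension of W x
v = rng.standard_normal(T)

relu = lambda z: np.maximum(z, 0.0)  # element-wise, 1-Lipschitz
sort_act = lambda z: np.sort(z)      # 1-Lipschitz, but NOT element-wise

y_elementwise = two_layer(x, W, b, v, relu)
y_non_elementwise = two_layer(x, W, b, v, sort_act)
```

The paper's result says that over hypotheses of the first kind (element-wise $\sigma$), the sample complexity grows only logarithmically in $T$, whereas activations like the sorting map can force a linear dependence on $T$.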
