Paper Title
Memorizing Gaussians with no over-parameterization via gradient descent on neural networks
Paper Authors
Paper Abstract
We prove that a single step of gradient descent over a depth-two network with $q$ hidden neurons, starting from orthogonal initialization, can memorize $\Omega\left(\frac{dq}{\log^4(d)}\right)$ independent and randomly labeled Gaussians in $\mathbb{R}^d$. The result is valid for a large class of activation functions, which includes the absolute value.
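To make the setting concrete, below is a minimal NumPy sketch of the procedure the abstract describes: a depth-two network with absolute-value activation, orthogonally initialized first-layer weights, one full-batch gradient step, and a check of how many randomly labeled Gaussian points are fit. The dimensions, the squared loss, the fixed second layer, and the step size are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

d, q, n = 200, 100, 500          # input dim, hidden neurons, sample count (assumed)
X = rng.standard_normal((n, d))  # independent Gaussian inputs in R^d
y = rng.choice([-1.0, 1.0], n)   # random +/-1 labels

# Orthogonal initialization: take q orthonormal rows from a QR factorization.
A = rng.standard_normal((d, d))
Q_mat, _ = np.linalg.qr(A)
W = Q_mat[:q, :]                     # shape (q, d), orthonormal rows
v = rng.choice([-1.0, 1.0], q) / q   # fixed second layer (a common simplification)

def forward(W):
    # f(x) = sum_i v_i * |<w_i, x>|, i.e. absolute-value activation
    return np.abs(X @ W.T) @ v

# One full-batch gradient step on the squared loss w.r.t. the first layer.
pred = forward(W)
residual = pred - y                  # (n,)
signs = np.sign(X @ W.T)             # subgradient of |.| at each hidden unit
grad_W = (signs * (residual[:, None] * v[None, :])).T @ X / n
lr = 1.0                             # step size: an assumption for illustration
W_new = W - lr * grad_W

# Memorization check: does sign(f(x_j)) match the random label y_j?
acc = np.mean(np.sign(forward(W_new)) == y)
print(f"fraction of points memorized after one step: {acc:.2f}")
```

This sketch only mirrors the training setup; the paper's guarantee concerns the regime $n = \Omega\left(\frac{dq}{\log^4(d)}\right)$ and a specific analysis of the single gradient step, which the toy parameters above are not tuned to reproduce.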