Paper Title
Frequency Bias in Neural Networks for Input of Non-Uniform Density
Paper Authors
Paper Abstract
Recent works have partly attributed the generalization ability of over-parameterized neural networks to frequency bias -- networks trained with gradient descent on data drawn from a uniform distribution find a low-frequency fit before high-frequency ones. As realistic training sets are not drawn from a uniform distribution, we here use the Neural Tangent Kernel (NTK) model to explore the effect of variable density on training dynamics. Our results, which combine analytic and empirical observations, show that when learning a pure harmonic function of frequency $\kappa$, convergence at a point $\mathbf{x} \in \mathbb{S}^{d-1}$ occurs in time $O(\kappa^d / p(\mathbf{x}))$, where $p(\mathbf{x})$ denotes the local density at $\mathbf{x}$. Specifically, for data in $\mathbb{S}^1$ we analytically derive the eigenfunctions of the kernel associated with the NTK for two-layer networks. We further prove convergence results for deep, fully connected networks with respect to the spectral decomposition of the NTK. Our empirical study highlights similarities and differences between deep and shallow networks in this model.
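To make the claimed density effect concrete, the following is a minimal numerical sketch (not taken from the paper): it trains a wide two-layer ReLU network by full-batch gradient descent on data drawn non-uniformly from $\mathbb{S}^1$ with a pure harmonic target, and reports the squared error separately on the dense and sparse halves of the circle. The 80/20 density split, frequency $\kappa = 4$, width, learning rate, and step count are illustrative assumptions, not the paper's experimental settings; under the $O(\kappa^d / p(\mathbf{x}))$ prediction, the error on the dense half should decay noticeably faster.

```python
# Illustrative sketch only: non-uniform data on S^1, pure harmonic target,
# wide two-layer ReLU network trained with full-batch gradient descent.
import numpy as np

rng = np.random.default_rng(0)

# Non-uniform data on the circle S^1: 80% of samples fall in the "dense" half.
n, kappa = 512, 4
dense = rng.random(n) < 0.8
theta = np.where(dense,
                 rng.uniform(-np.pi / 2, np.pi / 2, n),      # dense half-circle
                 rng.uniform(np.pi / 2, 3 * np.pi / 2, n))   # sparse half-circle
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)         # unit vectors in R^2
y = np.cos(kappa * theta)                                     # pure harmonic target

# Wide two-layer ReLU network in NTK parameterization; only the first layer is trained.
m = 4096
W = rng.standard_normal((m, 2))           # hidden-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m)       # output weights (frozen)

def forward(W):
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

f0 = forward(W)                           # subtract the output at init (standard NTK centering)
lr, steps = 1.0, 3000
for step in range(steps + 1):
    pre = X @ W.T                                          # (n, m) pre-activations
    resid = np.maximum(pre, 0.0) @ a / np.sqrt(m) - f0 - y
    if step % 500 == 0:
        err = resid ** 2
        print(f"step {step:5d}  dense MSE {err[dense].mean():.4f}  "
              f"sparse MSE {err[~dense].mean():.4f}")
    # Gradient of 0.5 * mean squared error with respect to W.
    grad = ((resid[:, None] * (pre > 0.0) * a).T @ X) / (np.sqrt(m) * n)
    W -= lr * grad
```

The output weights are kept frozen and the initial prediction is subtracted so the dynamics stay close to the linearized (NTK) regime that the abstract analyzes; exact error values depend on the illustrative hyperparameters above, only the dense-versus-sparse gap is the point of the sketch.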