Paper Title

On the Generalization Power of the Overfitted Three-Layer Neural Tangent Kernel Model

Paper Authors

Peizhong Ju, Xiaojun Lin, Ness B. Shroff

Paper Abstract

In this paper, we study the generalization performance of overparameterized 3-layer NTK models. We show that, for a specific set of ground-truth functions (which we refer to as the "learnable set"), the test error of the overfitted 3-layer NTK is upper bounded by an expression that decreases with the number of neurons in the two hidden layers. Unlike the 2-layer NTK, which has only one hidden layer, the 3-layer NTK involves interactions between the two hidden layers. Our upper bound reveals that, between the two hidden layers, the test error descends faster with respect to the number of neurons in the second hidden layer (the one closer to the output) than with respect to that in the first hidden layer (the one closer to the input). We also show that the learnable set of the 3-layer NTK without bias is no smaller than that of 2-layer NTK models with various choices of bias in the neurons. However, in terms of actual generalization performance, our results suggest that the 3-layer NTK is much less sensitive to the choice of bias than the 2-layer NTK, especially when the input dimension is large.
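
As a concrete illustration of the model studied here, the sketch below (not the authors' code; the widths, sample size, and toy target function are all illustrative assumptions) linearizes a bias-free 3-layer ReLU network at a random initialization and fits the minimum ℓ2-norm interpolator of the training data in the resulting NTK feature space, i.e., the kind of overfitted NTK solution the abstract refers to:

```python
# A minimal sketch, assuming a bias-free 3-layer ReLU network
# f(x) = v^T relu(W2 relu(W1 x)) with hypothetical sizes d, p1, p2, n.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

d, p1, p2, n = 5, 64, 64, 20            # input dim, hidden widths, samples
k1, k2, k3, kx = jax.random.split(jax.random.PRNGKey(0), 4)

# Random initialization of the 3-layer network (no bias terms).
params = {
    "W1": jax.random.normal(k1, (p1, d)) / jnp.sqrt(d),
    "W2": jax.random.normal(k2, (p2, p1)) / jnp.sqrt(p1),
    "v":  jax.random.normal(k3, (p2,)) / jnp.sqrt(p2),
}

def f(params, x):
    h1 = jax.nn.relu(params["W1"] @ x)
    h2 = jax.nn.relu(params["W2"] @ h1)
    return params["v"] @ h2

def ntk_feature(x):
    # NTK feature of x: gradient of f w.r.t. all parameters, flattened.
    g = jax.grad(lambda p: f(p, x))(params)
    return ravel_pytree(g)[0]

X = jax.random.normal(kx, (n, d))
X = X / jnp.linalg.norm(X, axis=1, keepdims=True)   # unit-sphere inputs
y = jnp.sin(X @ jnp.ones(d))                        # toy ground truth

Phi = jax.vmap(ntk_feature)(X)          # n x (#params) NTK features
K = Phi @ Phi.T                         # empirical 3-layer NTK Gram matrix
alpha = jnp.linalg.solve(K, y)          # min-norm interpolating solution

def predict(x_test):
    return ntk_feature(x_test) @ Phi.T @ alpha

print(float(predict(X[0])), float(y[0]))  # interpolates the training point
```

Varying the two hidden widths p1 and p2 in this sketch and measuring the error on held-out points is one way to probe the abstract's claim that the test error descends faster in the width of the second hidden layer than in that of the first.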
