Paper Title

On the Generalization Power of the Overfitted Three-Layer Neural Tangent Kernel Model

Paper Authors

Peizhong Ju, Xiaojun Lin, Ness B. Shroff

Paper Abstract

In this paper, we study the generalization performance of overparameterized 3-layer NTK models. We show that, for a specific set of ground-truth functions (which we refer to as the "learnable set"), the test error of the overfitted 3-layer NTK is upper bounded by an expression that decreases with the number of neurons in the two hidden layers. Unlike the 2-layer NTK, which has only one hidden layer, the 3-layer NTK involves interactions between the two hidden layers. Our upper bound reveals that, between the two hidden layers, the test error descends faster with respect to the number of neurons in the second hidden layer (the one closer to the output) than with respect to that in the first hidden layer (the one closer to the input). We also show that the learnable set of the 3-layer NTK without bias is no smaller than that of 2-layer NTK models with various choices of bias in the neurons. However, in terms of actual generalization performance, our results suggest that the 3-layer NTK is much less sensitive to the choice of bias than the 2-layer NTK, especially when the input dimension is large.
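
As a concrete illustration of the model studied here, the sketch below (not the authors' code; the widths, sample size, and toy target function are all illustrative assumptions) linearizes a bias-free 3-layer ReLU network at a random initialization and fits the minimum ℓ2-norm interpolator of the training data in the resulting NTK feature space, i.e., the kind of overfitted NTK solution the abstract refers to:

```python
# A minimal sketch, assuming a bias-free 3-layer ReLU network
# f(x) = v^T relu(W2 relu(W1 x)) with hypothetical sizes d, p1, p2, n.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

d, p1, p2, n = 5, 64, 64, 20            # input dim, hidden widths, samples
k1, k2, k3, kx = jax.random.split(jax.random.PRNGKey(0), 4)

# Random initialization of the 3-layer network (no bias terms).
params = {
    "W1": jax.random.normal(k1, (p1, d)) / jnp.sqrt(d),
    "W2": jax.random.normal(k2, (p2, p1)) / jnp.sqrt(p1),
    "v":  jax.random.normal(k3, (p2,)) / jnp.sqrt(p2),
}

def f(params, x):
    h1 = jax.nn.relu(params["W1"] @ x)
    h2 = jax.nn.relu(params["W2"] @ h1)
    return params["v"] @ h2

def ntk_feature(x):
    # NTK feature of x: gradient of f w.r.t. all parameters, flattened.
    g = jax.grad(lambda p: f(p, x))(params)
    return ravel_pytree(g)[0]

X = jax.random.normal(kx, (n, d))
X = X / jnp.linalg.norm(X, axis=1, keepdims=True)   # unit-sphere inputs
y = jnp.sin(X @ jnp.ones(d))                        # toy ground truth

Phi = jax.vmap(ntk_feature)(X)          # n x (#params) NTK features
K = Phi @ Phi.T                         # empirical 3-layer NTK Gram matrix
alpha = jnp.linalg.solve(K, y)          # min-norm interpolating solution

def predict(x_test):
    return ntk_feature(x_test) @ Phi.T @ alpha

print(float(predict(X[0])), float(y[0]))  # interpolates the training point
```

Varying the two hidden widths p1 and p2 in this sketch and measuring the error on held-out points is one way to probe the abstract's claim that the test error descends faster in the width of the second hidden layer than in that of the first.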
