Paper Title
Fast Finite Width Neural Tangent Kernel
Paper Authors
Paper Abstract
The Neural Tangent Kernel (NTK), defined as $Θ_θ^f(x_1, x_2) = \left[\partial f(θ, x_1)\big/\partial θ\right] \left[\partial f(θ, x_2)\big/\partial θ\right]^T$ where $\left[\partial f(θ, \cdot)\big/\partial θ\right]$ is a neural network (NN) Jacobian, has emerged as a central object of study in deep learning. In the infinite width limit, the NTK can sometimes be computed analytically and is useful for understanding training and generalization of NN architectures. At finite widths, the NTK is also used to better initialize NNs, compare the conditioning across models, perform architecture search, and do meta-learning. Unfortunately, the finite width NTK is notoriously expensive to compute, which severely limits its practical utility. We perform the first in-depth analysis of the compute and memory requirements for NTK computation in finite width networks. Leveraging the structure of neural networks, we further propose two novel algorithms that change the exponent of the compute and memory requirements of the finite width NTK, dramatically improving efficiency. Our algorithms can be applied in a black box fashion to any differentiable function, including those implementing neural networks. We open-source our implementations within the Neural Tangents package (arXiv:1912.02803) at https://github.com/google/neural-tangents.
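As a concrete illustration of the definition above, the following is a minimal JAX sketch that computes the finite width NTK of a small MLP by explicit Jacobian contraction, i.e. the straightforward baseline rather than either of the paper's optimized algorithms. The network architecture, widths, and helper names are illustrative assumptions; the Neural Tangents package referenced in the abstract provides tuned implementations.

```python
# Minimal sketch: finite width NTK via explicit Jacobian contraction,
# Theta(x1, x2) = [df/dtheta](x1) [df/dtheta](x2)^T summed over all parameters.
import jax
import jax.numpy as jnp

def init_params(key, sizes=(8, 16, 1)):
    # Toy ReLU MLP parameters (illustrative widths, not from the paper).
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in),
                       jnp.zeros(d_out)))
    return params

def f(params, x):
    # Forward pass; output shape (batch, out_dim).
    for W, b in params[:-1]:
        x = jax.nn.relu(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def ntk(params, x1, x2):
    # jax.jacobian differentiates w.r.t. the first argument (params) and
    # returns a pytree of per-parameter Jacobians with leaves shaped
    # (batch, out_dim, *param_shape).
    j1 = jax.jacobian(f)(params, x1)
    j2 = jax.jacobian(f)(params, x2)

    def contract(a, b):
        # Flatten parameter dimensions and contract over them.
        a = a.reshape(a.shape[0], a.shape[1], -1)
        b = b.reshape(b.shape[0], b.shape[1], -1)
        return jnp.einsum('iap,jbp->iajb', a, b)

    leaves = jax.tree_util.tree_map(contract, j1, j2)
    # Sum contributions of all parameter tensors: (batch1, out, batch2, out).
    return sum(jax.tree_util.tree_leaves(leaves))

key = jax.random.PRNGKey(0)
params = init_params(key)
x1 = jax.random.normal(key, (4, 8))
x2 = jax.random.normal(key, (3, 8))
print(ntk(params, x1, x2).shape)  # (4, 1, 3, 1)
```

This baseline materializes the full Jacobians, which is exactly the compute and memory cost the paper's two algorithms are designed to reduce.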