Paper Title

Demystify Optimization and Generalization of Over-parameterized PAC-Bayesian Learning

Paper Authors

Wei Huang, Chunrui Liu, Yilan Chen, Tianyu Liu, Richard Yi Da Xu

Paper Abstract

PAC-Bayesian is an analysis framework in which the training error is expressed as a weighted average of the hypotheses in the posterior distribution while incorporating prior knowledge. Beyond being a pure generalization-analysis tool, a PAC-Bayesian bound can also be incorporated into an objective function to train a probabilistic neural network, making it a powerful and relevant framework that can numerically provide a tight generalization bound for supervised learning. For simplicity, we refer to training a probabilistic neural network with an objective derived from a PAC-Bayesian bound as "PAC-Bayesian learning". Despite its empirical success, theoretical analysis of PAC-Bayesian learning for neural networks has rarely been explored. This paper proposes a new class of convergence and generalization analyses for PAC-Bayesian learning when it is used to train over-parameterized neural networks by gradient descent. For a wide probabilistic neural network, we show that PAC-Bayesian learning converges to the solution of a kernel ridge regression whose kernel is the probabilistic neural tangent kernel (PNTK). Building on this finding, we further characterize a uniform PAC-Bayesian generalization bound that improves over the Rademacher-complexity-based bound for non-probabilistic neural networks. Finally, drawing insight from our theoretical results, we propose a proxy measure for efficient hyperparameter selection, which is shown to be time-saving.
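To make the training-objective idea concrete, below is a minimal sketch of PAC-Bayesian learning on a toy probabilistic linear model. This is an illustration of the general recipe, not the paper's implementation: the posterior over weights is a diagonal Gaussian trained by gradient descent on empirical risk plus a KL(posterior || prior)/n penalty, the usual shape of an objective derived from a McAllester-style PAC-Bayes bound. The toy data, variable names, and the 1/n weighting are all illustrative assumptions.

```python
# Minimal sketch of "PAC-Bayesian learning" on a toy probabilistic linear
# model -- an illustration of the idea, NOT the paper's implementation.
# Objective: empirical risk + KL(posterior || prior) / n, the usual shape
# of a training objective derived from a McAllester-style PAC-Bayes bound.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d = 200, 10
X = torch.randn(n, d)
y = (X[:, 0] > 0).float()                 # toy binary labels (assumption)

# Posterior q = N(mu, diag(sigma^2)); prior p = N(0, prior_std^2 I).
mu = torch.zeros(d, requires_grad=True)
rho = torch.full((d,), -3.0, requires_grad=True)   # sigma = softplus(rho)
prior_std = 1.0

opt = torch.optim.Adam([mu, rho], lr=1e-2)
for step in range(2000):
    sigma = F.softplus(rho)
    w = mu + sigma * torch.randn(d)       # reparameterized sample w ~ q
    emp_risk = F.binary_cross_entropy_with_logits(X @ w, y)
    # Closed-form KL between the diagonal Gaussians q and p.
    kl = (torch.log(prior_std / sigma)
          + (sigma ** 2 + mu ** 2) / (2 * prior_std ** 2) - 0.5).sum()
    loss = emp_risk + kl / n              # PAC-Bayes-style objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```

On the convergence side, the abstract's claim is that training a wide probabilistic network this way converges to the solution of kernel ridge regression with the PNTK as the kernel. For reference, kernel ridge regression with kernel $\Theta$, training set $(X, y)$, and ridge parameter $\lambda$ yields the closed-form predictor

$$f(x) = \Theta(x, X)\,\bigl(\Theta(X, X) + \lambda I\bigr)^{-1} y,$$

where, under the paper's correspondence, $\Theta$ is the PNTK and the KL term plays the role of the ridge regularizer; the exact scaling of $\lambda$ is not stated in the abstract, so it should be read as a placeholder here.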
