Paper Title
Variational Inference of overparameterized Bayesian Neural Networks: a theoretical and empirical study
Paper Authors
Paper Abstract
This paper studies Variational Inference (VI) for training Bayesian Neural Networks (BNNs) in the overparameterized regime, i.e., when the number of neurons tends to infinity. More specifically, we consider overparameterized two-layer BNNs and point out a critical issue in mean-field VI training. This issue arises from the decomposition of the evidence lower bound (ELBO) into two terms: one corresponding to the likelihood function of the model and the other to the Kullback-Leibler (KL) divergence between the prior distribution and the variational posterior. In particular, we show both theoretically and empirically that a trade-off between these two terms exists in the overparameterized regime only when the KL is appropriately re-scaled with respect to the ratio between the number of observations and the number of neurons. We also illustrate our theoretical results with numerical experiments that highlight the critical choice of this ratio.
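As a minimal sketch of the decomposition described above (the notation here is assumed rather than taken from the paper: q denotes the mean-field variational posterior over the weights w, \pi the prior, \mathcal{D} a data set of N observations, and n the number of neurons), the standard ELBO and a re-scaled variant can be written as

\mathrm{ELBO}(q) = \mathbb{E}_{q}\!\left[\log p(\mathcal{D} \mid w)\right] - \mathrm{KL}\!\left(q \,\|\, \pi\right),
\qquad
\mathrm{ELBO}_{\lambda}(q) = \mathbb{E}_{q}\!\left[\log p(\mathcal{D} \mid w)\right] - \lambda\, \mathrm{KL}\!\left(q \,\|\, \pi\right),

where the re-scaling factor \lambda is a function of the ratio N/n. In this notation, the abstract's claim is that the likelihood and KL terms remain in balance as n \to \infty only for an appropriate choice of this factor.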