Paper title
Bayesian Neural Network Inference via Implicit Models and the Posterior Predictive Distribution
Paper authors
Paper abstract
We propose a novel approach to perform approximate Bayesian inference in complex models such as Bayesian neural networks. The approach scales better to large datasets than Markov chain Monte Carlo, it admits more expressive posterior models than variational inference, and it does not rely on adversarial training (or density ratio estimation). We adopt the recent approach of constructing two models: (1) a primary model, tasked with performing regression or classification; and (2) a secondary, expressive (e.g. implicit) model that defines an approximate posterior distribution over the parameters of the primary model. However, we optimise the parameters of the posterior model via gradient descent according to a Monte Carlo estimate of the posterior predictive distribution, which is our only approximation (other than the posterior model itself). Only a likelihood needs to be specified, and it can take various forms such as loss functions and synthetic likelihoods, thus providing a form of likelihood-free approach. Furthermore, we formulate the approach such that the posterior samples can be either independent of, or conditionally dependent upon, the inputs to the primary model. The latter formulation is shown to be capable of increasing the apparent complexity of the primary model, which we expect to be useful in applications such as surrogate and physics-based models. To show that the Bayesian paradigm offers more than just uncertainty quantification, we demonstrate uncertainty quantification, multi-modality, and an application with a recent deep forecasting neural network architecture.
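A minimal sketch of the training objective as we read it from the abstract, under loudly stated assumptions: the posterior model is a generator θ_s = g_φ(ε_s) with noise ε_s ~ N(0, I) (or θ_s = g_φ(ε_s, x) in the input-conditional variant), and φ is fitted by maximising the Monte Carlo estimate (1/N) Σ_n log[(1/S) Σ_s p(y_n | x_n, θ_s)], which is the same as gradient descent on its negative. The gradient of this log-mean-exp is a weighted average of per-sample log-likelihood gradients, with weights given by a softmax over the S log-likelihoods. Everything concrete here is hypothetical and not the authors' code: the straight-line primary model y ≈ w x + b, the Gaussian likelihood with fixed σ = 0.3, the tiny generator θ = c + A tanh(ε) (+ u x), the hand-derived gradients, and the helper names `sample_theta`, `predictive_objective`, `train` and `evaluate`.

```python
import math
import random

SIGMA = 0.3  # hypothetical fixed scale of the assumed Gaussian likelihood


def log_normal(y, mean, sigma):
    """Log density of N(y; mean, sigma^2)."""
    z = (y - mean) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def new_phi():
    """Parameters of a hypothetical implicit posterior model theta = c + A tanh(eps) (+ u x)."""
    return {"c": [0.0, 0.0], "A": [[0.5, 0.0], [0.0, 0.5]], "u": [0.0, 0.0]}


def sample_theta(phi, eps, x, conditional):
    """Push noise eps (and optionally the input x) through the generator to get theta = (w, b)."""
    t = [math.tanh(e) for e in eps]
    theta = [phi["c"][i] + phi["A"][i][0] * t[0] + phi["A"][i][1] * t[1] for i in range(2)]
    if conditional:  # input-conditional posterior samples
        theta = [theta[i] + phi["u"][i] * x for i in range(2)]
    return theta, t


def predictive_objective(phi, xs, ys, epss, conditional, sigma=SIGMA):
    """Mean over data of log (1/S) sum_s p(y | x, theta_s), and its gradient w.r.t. phi."""
    grad = {"c": [0.0, 0.0], "A": [[0.0, 0.0], [0.0, 0.0]], "u": [0.0, 0.0]}
    total = 0.0
    for x, y in zip(xs, ys):
        lls, dlls, ts = [], [], []
        for eps in epss:
            (w, b), t = sample_theta(phi, eps, x, conditional)
            err = y - (w * x + b)  # toy primary model: a straight line
            lls.append(log_normal(y, w * x + b, sigma))
            dlls.append([err * x / sigma**2, err / sigma**2])  # d log p / d(w, b)
            ts.append(t)
        m = max(lls)  # log-sum-exp trick for numerical stability
        weights = [math.exp(ll - m) for ll in lls]
        z = sum(weights)
        total += m + math.log(z / len(epss))
        for wgt, d, t in zip(weights, dlls, ts):
            r = wgt / z  # softmax weight of sample s for this data point
            for i in range(2):
                grad["c"][i] += r * d[i]
                for j in range(2):
                    grad["A"][i][j] += r * d[i] * t[j]
                if conditional:
                    grad["u"][i] += r * d[i] * x
    n = len(xs)
    for i in range(2):
        grad["c"][i] /= n
        grad["u"][i] /= n
        for j in range(2):
            grad["A"][i][j] /= n
    return total / n, grad


def train(xs, ys, conditional, steps=300, samples=16, lr=0.03, seed=0):
    """Gradient ascent using a fresh Monte Carlo noise draw at every step."""
    rng = random.Random(seed)
    phi = new_phi()
    for _ in range(steps):
        epss = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(samples)]
        _, grad = predictive_objective(phi, xs, ys, epss, conditional)
        for i in range(2):
            phi["c"][i] += lr * grad["c"][i]
            phi["u"][i] += lr * grad["u"][i]
            for j in range(2):
                phi["A"][i][j] += lr * grad["A"][i][j]
    return phi


def evaluate(phi, xs, ys, conditional, samples=256, seed=123):
    """Objective on a fixed, larger noise draw so different models are compared fairly."""
    rng = random.Random(seed)
    epss = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(samples)]
    return predictive_objective(phi, xs, ys, epss, conditional)[0]
```

For example, on the toy data `xs = [-1.0 + 0.05 * k for k in range(41)]` and `ys = [x * x for x in xs]`, `evaluate(train(xs, ys, True), xs, ys, True)` should exceed the unconditional `evaluate(train(xs, ys, False), xs, ys, False)`: letting θ depend on x turns the straight-line primary model into an effectively quadratic one, a toy analogue of the abstract's claim about increasing the apparent complexity of the primary model.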