Paper Title

Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

Authors

Michael W. Dusenberry, Ghassen Jerfel, Yeming Wen, Yi-An Ma, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan, Dustin Tran

Abstract

Bayesian neural networks (BNNs) demonstrate promising success in improving the robustness and uncertainty quantification of modern deep learning. However, they generally struggle with underfitting at scale and parameter efficiency. On the other hand, deep ensembles have emerged as alternatives for uncertainty quantification that, while outperforming BNNs on certain problems, also suffer from efficiency issues. It remains unclear how to combine the strengths of these two approaches and remediate their common issues. To tackle this challenge, we propose a rank-1 parameterization of BNNs, where each weight matrix involves only a distribution on a rank-1 subspace. We also revisit the use of mixture approximate posteriors to capture multiple modes, where unlike typical mixtures, this approach admits a significantly smaller memory increase (e.g., only a 0.4% increase for a ResNet-50 mixture of size 10). We perform a systematic empirical study on the choices of prior, variational posterior, and methods to improve training. For ResNet-50 on ImageNet, Wide ResNet 28-10 on CIFAR-10/100, and an RNN on MIMIC-III, rank-1 BNNs achieve state-of-the-art performance across log-likelihood, accuracy, and calibration on the test sets and out-of-distribution variants.
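
To make the rank-1 parameterization above concrete, the following is a minimal NumPy sketch (not the authors' implementation) of a single dense layer whose shared, deterministic weight matrix W is perturbed elementwise by a rank-1 factor s r^T, with Gaussian variational posteriors placed only on the vectors r and s. The layer sizes, variable names, mixture size K, and the choice of per-component posteriors are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, K = 8, 4, 10  # toy layer sizes and mixture size (illustrative)

# Shared deterministic weight matrix: a single point estimate used by all components.
W = rng.normal(scale=0.1, size=(d_in, d_out))

# Variational Gaussian parameters for the rank-1 factors, one set per mixture component.
# Only these vectors carry distributions, so the Bayesian overhead is 2*K*(d_in + d_out)
# parameters instead of K copies of the full d_in*d_out matrix.
r_mu, r_logstd = rng.normal(scale=0.1, size=(K, d_out)), np.full((K, d_out), -3.0)
s_mu, s_logstd = rng.normal(scale=0.1, size=(K, d_in)), np.full((K, d_in), -3.0)

def rank1_dense(x, k):
    """One Monte Carlo forward pass for mixture component k.

    The effective weight is W * outer(s, r), i.e. an elementwise rank-1
    perturbation of the shared matrix W; ((x * s) @ W) * r computes the
    same result without materializing the perturbed matrix.
    """
    r = r_mu[k] + np.exp(r_logstd[k]) * rng.normal(size=d_out)
    s = s_mu[k] + np.exp(s_logstd[k]) * rng.normal(size=d_in)
    return ((x * s) @ W) * r

x = rng.normal(size=(2, d_in))                                  # toy input batch
outputs = np.stack([rank1_dense(x, k) for k in range(K)])       # (K, 2, d_out)
mixture_mean = outputs.mean(axis=0)                             # average over components
```

Because each of the K mixture components adds only the vectors r and s on top of the shared d_in x d_out matrix, the memory overhead stays small as layers grow, which is why the abstract can report only a 0.4% parameter increase for a ResNet-50 mixture of size 10.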
