Paper Title

Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice

Authors

Jonas Rothfuss, Martin Josifoski, Vincent Fortuin, Andreas Krause

Abstract

Meta-learning aims to speed up the learning process on new tasks by acquiring useful inductive biases from datasets of related learning tasks. While, in practice, the number of related tasks available is often small, most existing approaches assume an abundance of tasks, making them unrealistic and prone to overfitting. A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks. In this work, we provide a theoretical analysis using PAC-Bayesian theory and present a generalization bound for meta-learning, first derived by Rothfuss et al. (2021a). Crucially, the bound allows us to derive the closed form of the optimal hyper-posterior, referred to as PACOH, which yields the best performance guarantees. We provide a theoretical analysis and an empirical case study of the conditions under which, and the extent to which, these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds. The closed-form PACOH inspires a practical meta-learning approach that avoids reliance on bi-level optimization, giving rise to a stochastic optimization problem that is amenable to standard variational methods that scale well. Our experiments show that, when instantiating PACOH with Gaussian process and Bayesian neural network models, the resulting methods are more scalable and yield state-of-the-art performance, both in terms of predictive accuracy and the quality of uncertainty estimates.
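The abstract's key algorithmic claim is that the closed-form hyper-posterior turns meta-learning into a single-level stochastic objective (per-task log marginal likelihoods plus a KL term to a hyper-prior) rather than a bi-level problem. The sketch below is a hypothetical toy illustration of that idea in a conjugate Gaussian setting, not the paper's implementation; all constants and the grid-search optimizer are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (NOT the paper's method): a toy Gaussian setting showing
# meta-learning as ONE flat stochastic objective -- per-task log marginal
# likelihoods plus a KL penalty to a hyper-prior -- instead of bi-level
# optimization. All names and constants are hypothetical.

rng = np.random.default_rng(0)

# Each task i: observations y_ij ~ N(theta_i, sig2), with theta_i ~ N(mu_P, tau2).
# The learnable "prior" is N(mu_P, tau2); we place a hyper-posterior Q = N(m, v)
# over its mean mu_P and a hyper-prior Pi = N(0, hp_var).
sig2, tau2, v, hp_var = 0.5, 1.0, 0.1, 10.0
task_means = np.array([-1.0, 0.0, 1.5, 2.0])
tasks = [mu + np.sqrt(sig2) * rng.standard_normal(20) for mu in task_means]

def log_marginal(y, mu_P):
    # ln Z(S, P): Gaussian marginal of y ~ N(mu_P * 1, sig2*I + tau2 * 1 1^T),
    # with the quadratic form computed via the Sherman-Morrison identity.
    n, r = len(y), y - mu_P
    quad = r @ r / sig2 - tau2 * r.sum() ** 2 / (sig2 * (sig2 + n * tau2))
    logdet = (n - 1) * np.log(sig2) + np.log(sig2 + n * tau2)
    return -0.5 * (quad + logdet + n * np.log(2 * np.pi))

def objective(m, eps):
    # Monte-Carlo estimate of E_{mu_P ~ N(m, v)}[sum_i ln Z(S_i, mu_P)]
    # minus KL(N(m, v) || N(0, hp_var)): a single-level stochastic objective.
    mus = m + np.sqrt(v) * eps  # reparameterized samples of mu_P
    mc = np.mean([sum(log_marginal(y, mu) for y in tasks) for mu in mus])
    kl = 0.5 * (v / hp_var + m ** 2 / hp_var - 1 + np.log(hp_var / v))
    return mc - kl

eps = rng.standard_normal(64)  # common random numbers across evaluations
grid = np.linspace(-3.0, 3.0, 121)
best_m = grid[np.argmax([objective(m, eps) for m in grid])]
print(best_m)  # hyper-posterior mean lands near the pooled task mean
```

Because the objective is a plain expectation over hyper-posterior samples, any standard stochastic variational optimizer could replace the grid search here; that is the scalability point the abstract makes.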
