Paper Title

Functional Ensemble Distillation

Authors

Coby Penso, Idan Achituve, Ethan Fetaya

Abstract

Bayesian models have many desirable properties, most notably their ability to generalize from limited data and to properly estimate the uncertainty in their predictions. However, these benefits come at a steep computational cost, as Bayesian inference is, in most cases, computationally intractable. One popular approach to alleviating this problem is Monte Carlo estimation with an ensemble of models sampled from the posterior. This approach still carries a significant computational cost, since one needs to store and run multiple models at test time. In this work, we investigate how best to distill an ensemble's predictions using an efficient model. First, we argue that current approaches, which simply return a distribution over predictions, cannot compute important properties such as the covariance between predictions, which can be valuable for further processing. Second, in many limited-data settings, all ensemble members achieve nearly zero training loss; that is, they produce near-identical predictions on the training set, which results in sub-optimal distilled models. To address both problems, we propose a novel and general distillation approach, named Functional Ensemble Distillation (FED), and we investigate how best to distill an ensemble in this setting. We find that learning the distilled model via a simple augmentation scheme, in the form of mixup augmentation, significantly boosts performance. We evaluate our method on several tasks and show that it achieves superior results in both accuracy and uncertainty estimation compared to current approaches.
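Two ideas from the abstract can be illustrated with a small NumPy sketch: mixup-style augmentation of inputs, and the covariance between predictions that becomes available once per-member predictions are kept instead of a single averaged distribution. This is not the authors' implementation. The function names `mixup_inputs` and `prediction_covariance`, the toy linear "ensemble", and the Beta(alpha, alpha) mixing weights are all assumptions made for illustration.

```python
import numpy as np


def mixup_inputs(x, alpha=1.0, rng=None):
    """Convexly combine each input with a randomly chosen partner from the batch.

    Hypothetical helper: mixing weights are drawn from Beta(alpha, alpha),
    the usual mixup choice; the abstract does not fix these details.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    # One mixing weight per example, broadcast over the feature dimensions.
    lam = rng.beta(alpha, alpha, size=(n,) + (1,) * (x.ndim - 1))
    perm = rng.permutation(n)
    return lam * x + (1.0 - lam) * x[perm]


def prediction_covariance(member_preds):
    """Sample covariance between predictions at different inputs.

    member_preds has shape (n_members, n_inputs): one scalar prediction per
    ensemble member and input. The result has shape (n_inputs, n_inputs).
    """
    return np.cov(member_preds, rowvar=False)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))                  # 8 toy inputs with 3 features
x_mix = mixup_inputs(x, alpha=1.0, rng=rng)  # augmented inputs

# Toy stand-in for an ensemble: 5 random linear models (an assumption).
weights = rng.normal(size=(5, 3))
preds = weights @ x_mix.T                    # shape (5, 8)
cov = prediction_covariance(preds)           # shape (8, 8)
```

Averaging `preds` over the member axis would give a single prediction per input and discard the off-diagonal entries of `cov`, which is the kind of information the abstract argues a plain distribution over predictions cannot provide.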
