Paper Title

Few-Shot Learning by Dimensionality Reduction in Gradient Space

Paper Authors

Martin Gauch, Maximilian Beck, Thomas Adler, Dmytro Kotsur, Stefan Fiel, Hamid Eghbal-zadeh, Johannes Brandstetter, Johannes Kofler, Markus Holzleitner, Werner Zellinger, Daniel Klotz, Sepp Hochreiter, Sebastian Lehner

Paper Abstract

We introduce SubGD, a novel few-shot learning method which is based on the recent finding that stochastic gradient descent updates tend to live in a low-dimensional parameter subspace. In experimental and theoretical analyses, we show that models confined to a suitable predefined subspace generalize well for few-shot learning. A suitable subspace fulfills three criteria across the given tasks: it (a) allows the training error to be reduced by gradient flow, (b) leads to models that generalize well, and (c) can be identified by stochastic gradient descent. SubGD identifies these subspaces from an eigendecomposition of the auto-correlation matrix of update directions across different tasks. Demonstrably, we can identify low-dimensional suitable subspaces for few-shot learning of dynamical systems, which have varying properties described by one or a few parameters of the analytical system description. Such systems are ubiquitous among real-world applications in science and engineering. We experimentally corroborate the advantages of SubGD on three distinct dynamical systems problem settings, significantly outperforming popular few-shot learning methods both in terms of sample efficiency and performance.
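
The abstract compresses the method into two steps: form the auto-correlation matrix of update directions gathered while fine-tuning on the training tasks, take its dominant eigenvectors as a basis, and then restrict few-shot gradient updates to the span of that basis. The NumPy sketch below is a minimal illustration of that idea only; the flattened-parameter view, the helper names `identify_subspace` and `subspace_gd`, and the plain projected-gradient loop are assumptions for exposition, not the authors' reference implementation.

```python
import numpy as np

def identify_subspace(update_directions: np.ndarray, subspace_dim: int) -> np.ndarray:
    """Return an orthonormal basis (n_params x subspace_dim) spanned by the
    dominant eigenvectors of the updates' auto-correlation matrix.

    update_directions: one row per recorded fine-tuning update
    (shape: n_updates x n_params). How updates are recorded (per step,
    per task) is an assumption here, not taken from the abstract.
    """
    # Uncentered second-moment ("auto-correlation") matrix of the updates.
    autocorr = update_directions.T @ update_directions / len(update_directions)
    # eigh handles the symmetric case; eigenvalues come back in ascending order.
    _, eigvecs = np.linalg.eigh(autocorr)
    # Keep the eigenvectors belonging to the largest eigenvalues.
    return eigvecs[:, -subspace_dim:]

def subspace_gd(params, grad_fn, basis, lr=1e-2, steps=100):
    """Gradient descent confined to span(basis): each gradient is projected
    onto the subspace before the update, so parameters never leave it."""
    for _ in range(steps):
        g = grad_fn(params)
        params = params - lr * (basis @ (basis.T @ g))
    return params
```

Projecting via `basis @ (basis.T @ g)` keeps every step inside the identified subspace at a cost of O(n_params × subspace_dim) per update; equivalently, one could optimize the subspace_dim low-dimensional coordinates directly and map back through `basis`.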
