Paper Title
One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares
Paper Authors
Paper Abstract
While deep neural networks are capable of achieving state-of-the-art performance in various domains, their training typically requires iterating for many passes over the dataset. However, due to computational and memory constraints and potential privacy concerns, storing and accessing all the data is impractical in many real-world scenarios where the data arrives in a stream. In this paper, we investigate the problem of one-pass learning, in which a model is trained on sequentially arriving data without retraining on previous datapoints. Motivated by the increasing use of overparameterized models, we develop Orthogonal Recursive Fitting (ORFit), an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints. By doing so, we bridge two seemingly distinct algorithms in adaptive filtering and machine learning, namely the recursive least-squares (RLS) algorithm and orthogonal gradient descent (OGD). Our algorithm uses memory efficiently by exploiting the structure of the streaming data via incremental principal component analysis (IPCA). Further, we show that, for overparameterized linear models, the parameter vector obtained by our algorithm is what stochastic gradient descent (SGD) would converge to in the standard multi-pass setting. Finally, we generalize the results to the nonlinear setting for highly overparameterized models, which is relevant for deep learning. Our experiments show the effectiveness of the proposed method compared to the baselines.
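
Below is a minimal, illustrative Python sketch of the idea described in the abstract, restricted to an overparameterized linear model y = w·x: each new streaming datapoint is fit exactly, while the parameter update is confined to the orthogonal complement of previously seen feature directions so that predictions on past datapoints are unchanged. The function name orfit_linear and the plain Gram-Schmidt bookkeeping of past directions are assumptions made for illustration; the paper's actual algorithm additionally compresses these stored directions with incremental PCA (IPCA) for memory efficiency.

import numpy as np

def orfit_linear(stream, dim):
    # Sketch of one-pass fitting for a linear model y = w @ x (illustrative only).
    w = np.zeros(dim)   # parameter vector
    basis = []          # orthonormal directions spanned by past feature vectors

    for x, y in stream:
        # Project the new feature vector onto the orthogonal complement of the
        # span of previously seen directions (Gram-Schmidt step).
        v = x.astype(float).copy()
        for u in basis:
            v -= (u @ v) * u

        err = y - w @ x          # residual on the new datapoint
        if np.linalg.norm(v) > 1e-10:
            # Step size chosen so the new point is fit exactly:
            # (w + eta*v) @ x = y, using v @ x = v @ v since v is orthogonal
            # to the component of x lying in the span of past directions.
            eta = err / (v @ v)
            w = w + eta * v
            basis.append(v / np.linalg.norm(v))
        # If v is (numerically) zero, x lies in the span of past data; an exact
        # fit would disturb past predictions, so this sketch skips the update.
    return w

# Toy usage: three datapoints in a 10-dimensional (overparameterized) model.
rng = np.random.default_rng(0)
data = [(rng.standard_normal(10), float(t)) for t in range(3)]
w = orfit_linear(data, dim=10)
print([w @ x for x, _ in data])   # approximately [0.0, 1.0, 2.0]

In this linear sketch the update direction is orthogonal to every previously seen feature vector, so predictions on earlier datapoints are left exactly unchanged, mirroring the abstract's description of changing parameters in the direction that least disturbs past predictions.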