Title

What learning algorithm is in-context learning? Investigations with linear models

Authors

Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou

Abstract

Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples $(x, f(x))$ presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context. Using linear regression as a prototypical problem, we offer three sources of evidence for this hypothesis. First, we prove by construction that transformers can implement learning algorithms for linear models based on gradient descent and closed-form ridge regression. Second, we show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary, and converging to Bayesian estimators for large widths and depths. Third, we present preliminary evidence that in-context learners share algorithmic features with these predictors: learners' late layers non-linearly encode weight vectors and moment matrices. These results suggest that in-context learning is understandable in algorithmic terms, and that (at least in the linear case) learners may rediscover standard estimation algorithms. Code and reference implementations are released at https://github.com/ekinakyurek/google-research/blob/master/incontext.
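
To make the abstract's reference predictors concrete, below is a minimal numpy sketch of the estimators that the trained in-context learners are compared against: closed-form ridge regression (with ordinary least-squares as the λ→0 limit) and gradient descent on the squared loss. This is an illustration under assumed toy hyperparameters (d, n, noise, lr are arbitrary choices here), not the authors' implementation; their reference code is at the repository linked above.

    # Reference estimators for linear regression: the baselines the paper
    # compares in-context learners against. Illustrative sketch only.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n, noise = 8, 16, 0.1

    # Sample a random linear task f(x) = w^T x and an in-context dataset of
    # labeled pairs (x, f(x)), mirroring the setup in the abstract.
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w_true + noise * rng.normal(size=n)

    def ridge(X, y, lam):
        """Closed-form ridge estimator: (X^T X + lam * I)^{-1} X^T y."""
        return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

    def gradient_descent(X, y, steps, lr):
        """Plain gradient descent on mean squared loss, starting from w = 0."""
        w = np.zeros(X.shape[1])
        for _ in range(steps):
            w -= lr * (X.T @ (X @ w - y)) / len(y)
        return w

    # Predictions on a held-out query point, one per estimator.
    x_query = rng.normal(size=d)
    estimates = [("ridge (lam=0.1)", ridge(X, y, 0.1)),
                 ("least squares (lam -> 0)", ridge(X, y, 1e-8)),
                 ("gradient descent", gradient_descent(X, y, 200, 0.1))]
    for name, w_hat in estimates:
        print(f"{name:>24}: prediction {w_hat @ x_query:+.3f}")
    print(f"{'true function':>24}: prediction {w_true @ x_query:+.3f}")

An in-context learner's prediction for x_query, given the same (x, f(x)) pairs in its prompt, can then be checked against each of these baselines; the paper's second line of evidence is that the match is close, with the best-matching baseline shifting as depth and dataset noise vary.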
