Paper Title

A Theoretical Framework for Inference Learning

Paper Authors

Nick Alonso, Beren Millidge, Jeff Krichmar, Emre Neftci

Paper Abstract

Backpropagation (BP) is the most successful and widely used algorithm in deep learning. However, the computations required by BP are challenging to reconcile with known neurobiology. This difficulty has stimulated interest in more biologically plausible alternatives to BP. One such algorithm is the inference learning algorithm (IL). IL has close connections to neurobiological models of cortical function and has achieved equal performance to BP on supervised learning and auto-associative tasks. In contrast to BP, however, the mathematical foundations of IL are not well-understood. Here, we develop a novel theoretical framework for IL. Our main result is that IL closely approximates an optimization method known as implicit stochastic gradient descent (implicit SGD), which is distinct from the explicit SGD implemented by BP. Our results further show how the standard implementation of IL can be altered to better approximate implicit SGD. Our novel implementation considerably improves the stability of IL across learning rates, which is consistent with our theory, as a key property of implicit SGD is its stability. We provide extensive simulation results that further support our theoretical interpretations and also demonstrate IL achieves quicker convergence when trained with small mini-batches while matching the performance of BP for large mini-batches.
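
For readers unfamiliar with the distinction the abstract draws, the standard update rules for the two methods are summarized below. This is a general illustration of explicit versus implicit SGD, not a derivation taken from the paper itself.

Explicit SGD (the update BP implements), with learning rate $\eta$ and loss $L$:
$$\theta_{t+1} = \theta_t - \eta \, \nabla L(\theta_t)$$

Implicit SGD evaluates the gradient at the updated parameters, so each step requires solving an implicit equation:
$$\theta_{t+1} = \theta_t - \eta \, \nabla L(\theta_{t+1}), \qquad \text{equivalently} \qquad \theta_{t+1} = \arg\min_{\theta} \left[ L(\theta) + \tfrac{1}{2\eta} \lVert \theta - \theta_t \rVert^2 \right].$$

Because the gradient in the implicit update is taken at the new parameters (a proximal step), the update remains well-behaved even for large $\eta$, which is the stability property the abstract attributes to implicit SGD and, by approximation, to IL.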
