On Reinforcement Learning, Effect Handlers, and the State Monad

Paper title

On Reinforcement Learning, Effect Handlers, and the State Monad

Authors

Ugo Dal Lago, Francesco Gavazzo, Alexis Ghyselen

Abstract

We study algebraic effects and handlers as a way to support decision-making abstractions in functional programs, whereby a user can ask a learning algorithm to resolve choices without implementing the underlying selection mechanism, and give feedback by way of rewards. Unlike a recently proposed approach to the problem based on the selection monad [Abadi and Plotkin, LICS 2021], we express the underlying intelligence as a reinforcement learning algorithm implemented as a set of handlers for some of these algebraic operations, including those for choices and rewards. We show how algebraic operations and handlers, as available in the programming language EFF, can be used in practice to cleanly separate the learning algorithm from its environment, thus allowing for a good level of modularity. We then show how the host language can be taken to be a lambda-calculus with handlers, thereby identifying its essential linguistic features. We conclude by hinting at how type and effect systems could ensure safety properties, and we point to some directions for further work.
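The abstract's central idea can be illustrated with a minimal sketch. A program raises "choose" and "reward" operations, and a separate handler supplies the learning algorithm. The sketch assumes Python generators as a stand-in for EFF's effect handlers: the program yields an operation, and the handler resumes it with `send`. It also swaps in a simple epsilon-greedy bandit learner for the paper's reinforcement learning algorithm. The names `Choose`, `Reward`, `environment`, and `EpsilonGreedy`, and the payout values, are hypothetical and do not come from the paper.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Generator

# Sketch only: Python generators stand in for EFF's effect handlers
# (assumption). All names below are hypothetical, not from the paper.

@dataclass(frozen=True)
class Choose:
    """Operation: ask the handler to pick one of `options`."""
    options: tuple

@dataclass(frozen=True)
class Reward:
    """Operation: report a reward for the choices made since the last one."""
    value: float

Program = Generator[Any, Any, Any]

def environment() -> Program:
    """Hypothetical environment: it never says *how* an arm is chosen."""
    arm = yield Choose((0, 1, 2))
    payout = {0: 0.2, 1: 0.5, 2: 0.8}[arm]  # made-up, deterministic payouts
    yield Reward(payout)
    return arm

class EpsilonGreedy:
    """Handler implementing an epsilon-greedy bandit learner (a stand-in for
    the paper's RL algorithm). Its estimates persist across runs."""

    def __init__(self, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.values: dict = {}  # option -> running mean of observed rewards
        self.counts: dict = {}  # option -> number of rewards observed

    def pick(self, options: tuple) -> Any:
        # Explore with probability epsilon, otherwise exploit current estimates.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(options)
        return max(options, key=lambda o: self.values.get(o, 0.0))

    def learn(self, option: Any, reward: float) -> None:
        # Incremental running-mean update of the option's value estimate.
        n = self.counts.get(option, 0) + 1
        self.counts[option] = n
        v = self.values.get(option, 0.0)
        self.values[option] = v + (reward - v) / n

    def run(self, make_program: Callable[[], Program]) -> Any:
        prog = make_program()
        pending = []  # choices still waiting for reward feedback
        response = None
        while True:
            try:
                op = prog.send(response)  # resume the program with the answer
            except StopIteration as stop:
                return stop.value
            if isinstance(op, Choose):
                response = self.pick(op.options)
                pending.append(response)
            elif isinstance(op, Reward):
                for choice in pending:
                    self.learn(choice, op.value)
                pending.clear()
                response = None
            else:
                raise TypeError(f"unhandled operation: {op!r}")

if __name__ == "__main__":
    learner = EpsilonGreedy(epsilon=0.1, seed=0)
    arms = [learner.run(environment) for _ in range(500)]
    print(max(learner.values, key=learner.values.get), learner.counts)
```

`environment` only yields operations. Replacing `EpsilonGreedy` with any other object that offers a compatible `run` method, such as a uniformly random chooser, therefore changes the decision strategy without touching the environment code. Python generators are one-shot, so this sketch cannot express handlers that resume a continuation more than once. Here the learner's state also lives in a mutable object rather than being threaded explicitly.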
