Paper Title
Making Sense of Reinforcement Learning and Probabilistic Inference
Paper Authors
Paper Abstract
Reinforcement learning (RL) combines a control problem with statistical estimation: The system dynamics are not known to the agent, but can be learned through experience. A recent line of research casts `RL as inference' and suggests a particular framework to generalize the RL problem as probabilistic inference. Our paper surfaces a key shortcoming in that approach, and clarifies the sense in which RL can be coherently cast as an inference problem. In particular, an RL agent must consider the effects of its actions upon future rewards and observations: The exploration-exploitation tradeoff. In all but the most simple settings, the resulting inference is computationally intractable so that practical RL algorithms must resort to approximation. We demonstrate that the popular `RL as inference' approximation can perform poorly in even very basic problems. However, we show that with a small modification the framework does yield algorithms that can provably perform well, and we show that the resulting algorithm is equivalent to the recently proposed K-learning, which we further connect with Thompson sampling.
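To make the exploration-exploitation tradeoff and the role of Thompson sampling concrete, the following is a minimal, illustrative sketch only, not the paper's experiment, its `RL as inference' formulation, or K-learning: a two-armed Bernoulli bandit in which Thompson sampling acts greedily on a posterior sample, while a Boltzmann (softmax) policy acts on posterior-mean rewards alone. The names run_bandit, TRUE_MEANS, HORIZON, and TEMPERATURE, and all numeric values, are hypothetical choices made for this sketch.

import numpy as np

# Illustrative sketch (not the paper's construction): a two-armed Bernoulli
# bandit with Beta posteriors over each arm's success probability.
rng = np.random.default_rng(0)
TRUE_MEANS = np.array([0.5, 0.6])  # hypothetical success probabilities
HORIZON = 2000
TEMPERATURE = 0.1                  # made-up softmax temperature


def run_bandit(policy: str) -> float:
    """Run one bandit episode and return the total reward collected."""
    alpha = np.ones(2)  # Beta(1, 1) prior: successes per arm
    beta = np.ones(2)   # Beta(1, 1) prior: failures per arm
    total_reward = 0.0
    for _ in range(HORIZON):
        if policy == "thompson":
            # Exploration comes from sampling plausible arm means from the posterior.
            arm = int(np.argmax(rng.beta(alpha, beta)))
        else:
            # Boltzmann/softmax policy on posterior-mean rewards (point estimates).
            means = alpha / (alpha + beta)
            logits = means / TEMPERATURE
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            arm = int(rng.choice(2, p=probs))
        reward = float(rng.random() < TRUE_MEANS[arm])
        total_reward += reward
        alpha[arm] += reward        # update posterior with observed success
        beta[arm] += 1.0 - reward   # or failure
    return total_reward


if __name__ == "__main__":
    for policy in ("thompson", "softmax"):
        print(f"{policy}: total reward = {run_bandit(policy):.0f}")

The contrast is schematic: the Thompson-sampling policy's randomness comes from the posterior and therefore shrinks as data accumulate, while the softmax policy's randomness is set by a fixed temperature over point estimates, independent of how much each arm has been observed.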