Paper Title
HTMRL: Biologically Plausible Reinforcement Learning with Hierarchical Temporal Memory
Paper Authors
Paper Abstract
Building Reinforcement Learning (RL) algorithms that can adapt to continuously evolving tasks is an open research challenge. One technology known to inherently handle such non-stationary input patterns well is Hierarchical Temporal Memory (HTM), a general and biologically plausible computational model of the human neocortex. As the RL paradigm is inspired by human learning, HTM is a natural framework for an RL algorithm supporting non-stationary environments. In this paper, we present HTMRL, the first strictly HTM-based RL algorithm. We show empirically and statistically that HTMRL scales to many states and actions, and demonstrate that HTM's ability to adapt to changing patterns extends to RL. Specifically, HTMRL performs well on a 10-armed bandit after 750 steps, but needs only a third of that to adapt when the bandit suddenly shuffles its arms. HTMRL is the first iteration of a novel RL approach, with the potential to extend to a capable algorithm for Meta-RL.
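To make the bandit experiment in the abstract more concrete, the following is a minimal sketch of a 10-armed bandit whose arms are shuffled partway through a run. It is not HTMRL: the learner is a plain epsilon-greedy baseline with a constant step size, swapped in only to show the shape of the task. The class `ShufflingBandit`, the function `run_epsilon_greedy`, the Bernoulli reward model, and all parameter values are assumptions for illustration; only the 10 arms, the 750 steps, and the "a third of that" (250 steps) come from the abstract.

```python
import random


class ShufflingBandit:
    """k-armed Bernoulli bandit whose arm order can be shuffled (hypothetical setup)."""

    def __init__(self, k=10, seed=0):
        self.rng = random.Random(seed)
        # Assumed reward probabilities; the abstract does not specify them.
        self.probs = [self.rng.random() for _ in range(k)]

    def pull(self, arm):
        """Return reward 1 with the arm's probability, else 0."""
        return 1 if self.rng.random() < self.probs[arm] else 0

    def shuffle(self):
        """Permute the arms: same reward probabilities, new positions."""
        self.rng.shuffle(self.probs)


def run_epsilon_greedy(bandit, steps, values, epsilon=0.1, step_size=0.1, seed=1):
    """Epsilon-greedy baseline (not HTMRL); updates `values` in place, returns total reward."""
    rng = random.Random(seed)
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(values))  # explore
        else:
            arm = max(range(len(values)), key=lambda a: values[a])  # exploit
        reward = bandit.pull(arm)
        # A constant step size weights recent rewards more heavily,
        # so stale estimates decay after the arms are shuffled.
        values[arm] += step_size * (reward - values[arm])
        total += reward
    return total


bandit = ShufflingBandit(k=10)
values = [0.0] * 10
before = run_epsilon_greedy(bandit, 750, values)
bandit.shuffle()  # the sudden non-stationary event described in the abstract
after = run_epsilon_greedy(bandit, 250, values, seed=2)
print(before, after)
```

The value estimates are deliberately carried across the shuffle, so the second phase starts from outdated beliefs; how quickly a learner recovers from that state is the kind of adaptation the abstract refers to.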