Show me the Way: Intrinsic Motivation from Demonstrations

Paper title

Show me the Way: Intrinsic Motivation from Demonstrations

Authors

Léonard Hussenot, Robert Dadashi, Matthieu Geist, Olivier Pietquin

Abstract

The study of exploration in the domain of decision making has a long history but remains actively debated. From the vast literature that has addressed this topic for decades from various points of view (e.g., developmental psychology, experimental design, artificial intelligence), intrinsic motivation emerged as a concept that can practically be transferred to artificial agents. In particular, in the recent field of Deep Reinforcement Learning (RL), agents implement this concept (mainly using a novelty argument) in the form of an exploration bonus, added to the task reward, that encourages visiting the whole environment. This approach is supported by a large body of RL theory in which convergence to optimality assumes exhaustive exploration. Yet humans and other mammals do not exhaustively explore the world, and their motivation is based not only on novelty but also on various other factors (e.g., curiosity, fun, style, pleasure, safety, competition). They optimize for life-long learning and train to learn transferable skills in playgrounds without obvious goals. They also apply innate or learned priors to save time and stay safe. For these reasons, we propose to learn an exploration bonus from demonstrations that could transfer these motivations to an artificial agent with few assumptions about their rationale. Using an inverse RL approach, we show that complex exploration behaviors, reflecting different motivations, can be learnt and efficiently used by RL agents to solve tasks for which exhaustive exploration is prohibitive.
