Paper Title


Action and Perception as Divergence Minimization

Authors

Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess

Abstract


To learn directed behaviors in complex environments, intelligent agents need to optimize objective functions. Various objectives are known for designing artificial agents, including task rewards and intrinsic motivation. However, it is unclear how the known objectives relate to each other, which objectives remain yet to be discovered, and which objectives better describe the behavior of humans. We introduce the Action Perception Divergence (APD), an approach for categorizing the space of possible objective functions for embodied agents. We show a spectrum that reaches from narrow to general objectives. While the narrow objectives correspond to domain-specific rewards as typical in reinforcement learning, the general objectives maximize information with the environment through latent variable models of input sequences. Intuitively, these agents use perception to align their beliefs with the world and use actions to align the world with their beliefs. They infer representations that are informative of past inputs, explore future inputs that are informative of their representations, and select actions or skills that maximally influence future inputs. This explains a wide range of unsupervised objectives from a single principle, including representation learning, information gain, empowerment, and skill discovery. Our findings suggest leveraging powerful world models for unsupervised exploration as a path toward highly adaptive agents that seek out large niches in their environments, rendering task rewards optional.
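The abstract's central idea of aligning beliefs and world can be sketched as a joint divergence minimization. The notation below is illustrative (our own, not verbatim from the paper): `x` denotes sensory inputs, `z` latent representations, `p` the agent's actual distribution, and `tau` a target distribution.

```latex
% Illustrative sketch of the divergence-minimization objective:
% the agent jointly shapes its actual distribution p over inputs x
% and latents z (via perception and action) to match a target tau.
\min_{p} \; \mathrm{KL}\!\left[\, p(x, z) \;\big\|\; \tau(x, z) \,\right]
```

Under this reading, a narrow target concentrated on rewarding inputs recovers task-reward maximization as in reinforcement learning, while a broad target with an expressive latent variable model recovers the information-maximizing objectives (representation learning, information gain, empowerment, skill discovery) named in the abstract.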
