Paper Title

Value Function Decomposition for Iterative Design of Reinforcement Learning Agents

Authors

James MacGlashan, Evan Archer, Alisa Devlic, Takuma Seno, Craig Sherstan, Peter R. Wurman, Peter Stone

Abstract

Designing reinforcement learning (RL) agents is typically a difficult process that requires numerous design iterations. Learning can fail for a multitude of reasons, and standard RL methods provide too few tools to provide insight into the exact cause. In this paper, we show how to integrate value decomposition into a broad class of actor-critic algorithms and use it to assist in the iterative agent-design process. Value decomposition separates a reward function into distinct components and learns value estimates for each. These value estimates provide insight into an agent's learning and decision-making process and enable new training methods to mitigate common problems. As a demonstration, we introduce SAC-D, a variant of soft actor-critic (SAC) adapted for value decomposition. SAC-D maintains similar performance to SAC, while learning a larger set of value predictions. We also introduce decomposition-based tools that exploit this information, including a new reward influence metric, which measures each reward component's effect on agent decision-making. Using these tools, we provide several demonstrations of decomposition's use in identifying and addressing problems in the design of both environments and agents. Value decomposition is broadly applicable and easy to incorporate into existing algorithms and workflows, making it a powerful tool in an RL practitioner's toolbox.
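The core idea of value decomposition described above — splitting the reward into components and learning a separate value estimate for each, with the full value recovered as their sum — can be sketched in a few lines. This is a tabular TD(0) toy with illustrative names and numbers, not the paper's SAC-D implementation:

```python
import numpy as np

# Value decomposition sketch: the reward function is split into components
# (e.g. a "progress" reward and a "collision" penalty), and a separate value
# estimate is learned for each component. Component names, gamma, and alpha
# here are illustrative assumptions, not taken from the paper.
n_components = 2
gamma, alpha = 0.9, 0.5

# One value estimate per reward component for a single state-action pair
# (a tabular toy standing in for the per-component critic heads).
q_components = np.zeros(n_components)

def td_update(q_components, reward_components, q_next_components):
    """Apply one TD(0) update independently to each component's estimate."""
    targets = reward_components + gamma * q_next_components
    return q_components + alpha * (targets - q_components)

# Example transition: observed component rewards, terminal next state.
rewards = np.array([1.0, -0.5])
q_next = np.zeros(n_components)

q_components = td_update(q_components, rewards, q_next)

# The full value used for decision-making is the sum of the components,
# while the individual q_components remain inspectable for diagnosis.
total_q = q_components.sum()
```

Because each component's estimate is visible on its own, a practitioner can see, for example, that a collision penalty dominates the total value, which is the kind of insight the paper's influence metric builds on.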
