Title

Logic-based Reward Shaping for Multi-Agent Reinforcement Learning

Authors

Ingy ElSayed-Aly, Lu Feng

Abstract

Reinforcement learning (RL) relies heavily on exploration to learn from its environment and maximize observed rewards. Therefore, it is essential to design a reward function that guarantees optimal learning from the received experience. Previous work has combined automata and logic-based reward shaping with environment assumptions to provide an automatic mechanism to synthesize the reward function based on the task. However, there is limited work on how to extend logic-based reward shaping to Multi-Agent Reinforcement Learning (MARL). If a task requires cooperation, the environment must consider the joint state in order to keep track of the other agents, and thus suffers from the curse of dimensionality with respect to the number of agents. This project explores how logic-based reward shaping for MARL can be designed for different scenarios and tasks. We present a novel method for semi-centralized logic-based MARL reward shaping that is scalable in the number of agents, and evaluate it in multiple scenarios.
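As a rough illustration of the general idea (a minimal sketch, not the paper's own method or implementation): logic-based reward shaping typically compiles the task specification into a finite-state automaton, often called a reward machine, that tracks progress over high-level events and pays a shaping reward whenever the automaton advances toward an accepting state. In the Python sketch below, the event names (`a_done`, `b_done`), the state labels, and the reward values are all hypothetical.

```python
# Sketch of automaton-based reward shaping: a finite-state "reward machine"
# advances on high-level events and returns a shaping reward on progress.
# Events, states, and reward values here are illustrative assumptions.

class RewardMachine:
    def __init__(self, transitions, initial, accepting, step_reward=1.0):
        self.transitions = transitions  # {(state, event): next_state}
        self.state = initial
        self.accepting = accepting
        self.step_reward = step_reward

    def shape(self, event):
        """Advance on an observed event; return the shaping reward."""
        nxt = self.transitions.get((self.state, event))
        if nxt is None:
            return 0.0  # event does not advance the task: no extra reward
        self.state = nxt
        # full reward on reaching acceptance, partial reward for progress
        return self.step_reward if nxt == self.accepting else 0.5 * self.step_reward


# Example task: "agent A reaches its goal, then agent B reaches its goal",
# encoded as the automaton u0 --a_done--> u1 --b_done--> u2 (accepting).
rm = RewardMachine(
    transitions={("u0", "a_done"): "u1", ("u1", "b_done"): "u2"},
    initial="u0",
    accepting="u2",
)

for event in ["b_done", "a_done", "b_done"]:  # an example event trace
    print(f"{event}: reward={rm.shape(event)}, state={rm.state}")
```

Because the automaton state summarizes task progress, each agent's learner can condition on this compact state instead of the full joint state of all agents, which is the kind of scalability argument the abstract alludes to.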
