Paper Title
Distributed Control using Reinforcement Learning with Temporal-Logic-Based Reward Shaping
Paper Authors
Abstract
We present a computational framework for synthesis of distributed control strategies for a heterogeneous team of robots in a partially observable environment. The goal is to cooperatively satisfy specifications given as Truncated Linear Temporal Logic (TLTL) formulas. Our approach formulates the synthesis problem as a stochastic game and employs a policy graph method to find a control strategy with memory for each agent. We construct the stochastic game on the product between the team transition system and a finite state automaton (FSA) that tracks the satisfaction of the TLTL formula. We use the quantitative semantics of TLTL as the reward of the game, and further reshape it using the FSA to guide and accelerate the learning process. Simulation results demonstrate the efficacy of the proposed solution under demanding task specifications and the effectiveness of reward shaping in significantly accelerating the speed of learning.
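The FSA-based reward shaping mentioned above can be illustrated with a minimal sketch. The idea, under common potential-based shaping assumptions (Ng et al. style), is to use progress through the automaton as a potential: states closer to acceptance get a higher potential, so transitions that advance the task earn a shaping bonus on top of the base quantitative-semantics reward. The 3-state FSA, its transitions, and all function names here are hypothetical placeholders, not the paper's actual construction.

```python
from collections import deque

# Hypothetical FSA tracking a TLTL task: q0 -> q1 -> q2, with q2 accepting.
# Edges and the accepting set are illustrative only.
FSA_EDGES = {"q0": ["q1"], "q1": ["q2"], "q2": []}
ACCEPTING = {"q2"}

def distance_to_acceptance(fsa_edges, accepting):
    """BFS over reversed edges: graph distance from each FSA state to acceptance."""
    reverse = {q: [] for q in fsa_edges}
    for q, succs in fsa_edges.items():
        for s in succs:
            reverse[s].append(q)
    dist = {q: float("inf") for q in fsa_edges}
    queue = deque()
    for q in accepting:
        dist[q] = 0
        queue.append(q)
    while queue:
        q = queue.popleft()
        for p in reverse[q]:
            if dist[p] == float("inf"):
                dist[p] = dist[q] + 1
                queue.append(p)
    return dist

def shaped_reward(base_reward, q, q_next, dist, gamma=0.99):
    """Potential-based shaping: r' = r + gamma * phi(q') - phi(q),
    with potential phi = negative distance to an accepting FSA state."""
    phi = lambda s: -dist[s]
    return base_reward + gamma * phi(q_next) - phi(q)

dist = distance_to_acceptance(FSA_EDGES, ACCEPTING)
# An FSA transition q0 -> q1 moves closer to acceptance, so the
# shaping term is positive even when the base reward is zero.
bonus = shaped_reward(0.0, "q0", "q1", dist)
```

Because the shaping term is potential-based, it densifies the sparse task reward without changing which policies are optimal, which is one standard way such FSA guidance can accelerate learning.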