Paper Title
Soft Hindsight Experience Replay
Paper Authors
Paper Abstract
Efficient learning in environments with sparse rewards is one of the most important challenges in Deep Reinforcement Learning (DRL). In continuous DRL environments such as robotic arm control, Hindsight Experience Replay (HER) has been shown to be an effective solution. However, due to the brittleness of deterministic methods, HER and its variants typically suffer from instability and poor convergence, which significantly affects final performance. This challenge severely limits the applicability of such methods to complex real-world domains. To tackle this challenge, in this paper we propose Soft Hindsight Experience Replay (SHER), a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL) that combines the reuse of failed experiences with a maximum entropy probabilistic inference model. We evaluate SHER on OpenAI Robotics manipulation tasks with sparse rewards. Experimental results show that, in contrast to HER and its variants, our proposed SHER achieves state-of-the-art performance, especially on the difficult HandManipulation tasks. Furthermore, SHER is more stable, achieving very similar performance across different random seeds.
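The two ingredients named in the abstract, hindsight goal relabeling (HER) and an entropy-regularized learning objective (MERL), can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (Transition, her_relabel, soft_q_target), the temperature alpha, the discount gamma, and the success threshold eps are all hypothetical choices for the sketch.

```python
# Minimal sketch of the idea behind SHER (not the paper's code):
# relabel failed transitions with goals that were actually achieved (HER),
# and train the critic against an entropy-regularized ("soft") Bellman
# target in the style of maximum entropy RL (MERL).
import random
from dataclasses import dataclass, replace

import numpy as np


@dataclass(frozen=True)
class Transition:
    obs: np.ndarray
    action: np.ndarray
    reward: float
    next_obs: np.ndarray
    achieved_goal: np.ndarray   # goal state actually reached at next_obs
    desired_goal: np.ndarray    # goal the agent was asked to reach


def sparse_reward(achieved_goal, desired_goal, eps=0.05):
    """Sparse reward as in the OpenAI robotics tasks: 0 on success, -1 otherwise."""
    return 0.0 if np.linalg.norm(achieved_goal - desired_goal) < eps else -1.0


def her_relabel(episode, k=4):
    """HER 'future' strategy: for each transition, additionally store k copies
    whose desired goal is an achieved goal from a later step of the episode,
    so failed trajectories still yield successful (reward 0) experience."""
    relabeled = []
    for t, tr in enumerate(episode):
        relabeled.append(tr)
        for _ in range(k):
            future = random.choice(episode[t:])
            g = future.achieved_goal
            relabeled.append(replace(tr,
                                     desired_goal=g,
                                     reward=sparse_reward(tr.achieved_goal, g)))
    return relabeled


def soft_q_target(reward, next_q, next_log_pi, alpha=0.2, gamma=0.98):
    """Entropy-regularized Bellman target used in maximum entropy RL:
    the next-state value is Q minus the temperature-scaled log-policy,
    which rewards stochastic (high-entropy) policies."""
    return reward + gamma * (next_q - alpha * next_log_pi)
```

In a full agent, the relabeled transitions would fill an off-policy replay buffer and the soft target would drive a SAC-style critic update; per the abstract, it is this stochastic, entropy-regularized objective that avoids the brittleness of the deterministic policies used by HER and its variants.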