Paper Title


USHER: Unbiased Sampling for Hindsight Experience Replay

Paper Authors

Liam Schramm, Yunfu Deng, Edgar Granados, Abdeslam Boularias

Abstract


Dealing with sparse rewards is a long-standing challenge in reinforcement learning (RL). Hindsight Experience Replay (HER) addresses this problem by reusing failed trajectories for one goal as successful trajectories for another. This allows both a minimum density of reward and generalization across multiple goals. However, this strategy is known to result in a biased value function, as the update rule underestimates the likelihood of bad outcomes in a stochastic environment. We propose an asymptotically unbiased importance-sampling-based algorithm to address this problem without sacrificing performance on deterministic environments. We show its effectiveness on a range of robotic systems, including challenging high-dimensional stochastic environments.
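The abstract's core idea, reusing a failed trajectory for one goal as a successful trajectory for another, is HER's goal relabeling. The sketch below illustrates the standard "future" relabeling strategy from the HER literature, not USHER's importance-sampling correction; the function names, tuple layout, and `reward_fn` signature are illustrative assumptions.

```python
import random

def her_relabel(trajectory, reward_fn, k=4):
    """Hindsight relabeling ('future' strategy): for each transition,
    also store copies whose goal is a state actually achieved later in
    the same trajectory, so a failed rollout yields successful examples.

    trajectory: list of (state, action, next_state, goal) tuples.
    reward_fn(next_state, goal): sparse reward, e.g. 1.0 on goal reached.
    (Both are illustrative assumptions, not the paper's exact interface.)
    """
    relabeled = []
    for t, (s, a, s_next, g) in enumerate(trajectory):
        # Keep the original transition with its original (likely 0) reward.
        relabeled.append((s, a, s_next, g, reward_fn(s_next, g)))
        # Sample up to k achieved goals from this step onward.
        future = trajectory[t:]
        for _ in range(min(k, len(future))):
            _, _, achieved, _ = random.choice(future)
            # Recompute the reward as if `achieved` had been the goal.
            relabeled.append((s, a, s_next, achieved,
                              reward_fn(s_next, achieved)))
    return relabeled
```

In a stochastic environment this relabeling is exactly where the bias the abstract describes enters: conditioning on goals that happened to be reached over-represents lucky outcomes, which is what the paper's importance-sampling weights are designed to correct.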
