Paper Title
Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-person Simulated 3D Environment
Paper Authors
Paper Abstract
First-person object-interaction tasks in high-fidelity, simulated 3D environments such as the AI2Thor virtual home environment pose significant sample-efficiency challenges for reinforcement learning (RL) agents learning from sparse task rewards. To alleviate these challenges, prior work has provided extensive supervision via a combination of reward shaping, ground-truth object information, and expert demonstrations. In this work, we show that one can learn object-interaction tasks from scratch without such supervision by having an object-centric relational RL agent learn an attentive object-model as an auxiliary task during task learning. Our key insight is that an object-model which incorporates object attention into forward prediction provides a dense learning signal for unsupervised representation learning of both objects and their relationships. This, in turn, enables faster policy learning for the object-centric relational RL agent. We demonstrate our agent by introducing a set of challenging object-interaction tasks in the AI2Thor environment, where learning with our attentive object-model is key to strong performance. Specifically, we compare our agent, and relational RL agents with alternative auxiliary tasks, against a relational RL agent equipped with ground-truth object information, and show that learning with our object-model best closes the performance gap in terms of both learning speed and maximum success rate. Additionally, we find that incorporating object attention into an object-model's forward predictions is key to learning representations that capture object category and object state.
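To make the core idea concrete, below is a minimal sketch, assuming a PyTorch implementation, of an attentive object-model used as an auxiliary forward-prediction task. This is not the authors' implementation: the module names, feature dimensions, attention layout, and the simple L2 objective are all illustrative assumptions. The action queries an attention layer over per-object features (so the model can focus on the objects the action is likely to affect), and the attended context conditions per-object next-step predictions.

```python
# A minimal sketch (assumptions, not the paper's code) of an "attentive
# object-model": attention over per-object features, conditioned on the
# action, feeding a forward prediction of next-step object features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveObjectModel(nn.Module):
    def __init__(self, obj_dim=64, act_dim=8, n_heads=4):
        super().__init__()
        self.act_proj = nn.Linear(act_dim, obj_dim)        # map action into query space
        self.attn = nn.MultiheadAttention(obj_dim, n_heads, batch_first=True)
        self.predictor = nn.Sequential(                    # per-object next-step predictor
            nn.Linear(2 * obj_dim, obj_dim), nn.ReLU(),
            nn.Linear(obj_dim, obj_dim),
        )

    def forward(self, obj_feats, action):
        # obj_feats: (B, N_objects, obj_dim); action: (B, act_dim)
        query = self.act_proj(action).unsqueeze(1)         # (B, 1, obj_dim)
        # Attention selects which objects the action is likely to affect.
        attended, attn_weights = self.attn(query, obj_feats, obj_feats)
        context = attended.expand(-1, obj_feats.size(1), -1)
        # Predict each object's next features, conditioned on the attended context.
        pred_next = self.predictor(torch.cat([obj_feats, context], dim=-1))
        return pred_next, attn_weights

def object_model_aux_loss(model, obj_feats, action, next_obj_feats):
    # Dense, unsupervised learning signal: forward-prediction error on
    # object features, available at every step even when task reward is sparse.
    pred_next, _ = model(obj_feats, action)
    return F.mse_loss(pred_next, next_obj_feats)
```

During training, such an auxiliary loss would simply be added to the RL objective (e.g. total_loss = rl_loss + lam * aux_loss, with lam a hypothetical weighting coefficient), so the dense prediction error shapes object representations even while task rewards remain rare.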