Paper Title

Ask Your Humans: Using Human Instructions to Improve Generalization in Reinforcement Learning

Paper Authors

Valerie Chen, Abhinav Gupta, Kenneth Marino

Paper Abstract

Complex, multi-task problems have proven to be difficult to solve efficiently in a sparse-reward reinforcement learning setting. In order to be sample efficient, multi-task learning requires reuse and sharing of low-level policies. To facilitate the automatic decomposition of hierarchical tasks, we propose the use of step-by-step human demonstrations in the form of natural language instructions and action trajectories. We introduce a dataset of such demonstrations in a crafting-based grid world. Our model consists of a high-level language generator and low-level policy, conditioned on language. We find that human demonstrations help solve the most complex tasks. We also find that incorporating natural language allows the model to generalize to unseen tasks in a zero-shot setting and to learn quickly from a few demonstrations. Generalization is not only reflected in the actions of the agent, but also in the generated natural language instructions in unseen tasks. Our approach also gives our trained agent interpretable behaviors because it is able to generate a sequence of high-level descriptions of its actions.
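The abstract describes a hierarchical setup: a high-level module that generates a natural-language instruction from the current state, and a low-level policy that selects actions conditioned on both the state and that instruction. The sketch below illustrates this two-level structure in PyTorch. It is a minimal, hypothetical example: the module names, GRU-based design, and dimensions are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a language-conditioned hierarchical agent.
# NOT the paper's implementation; architecture details are assumed.
import torch
import torch.nn as nn


class InstructionGenerator(nn.Module):
    """High-level module: maps a state encoding to word logits for an instruction."""

    def __init__(self, state_dim: int, vocab_size: int, hidden_dim: int = 128):
        super().__init__()
        self.state_encoder = nn.Linear(state_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, state: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # Initialize the decoder with the encoded state, then decode word logits
        # (teacher-forced here for simplicity).
        h0 = torch.tanh(self.state_encoder(state)).unsqueeze(0)   # (1, B, H)
        emb = self.embed(tokens)                                  # (B, T, H)
        out, _ = self.rnn(emb, h0)
        return self.out(out)                                      # (B, T, V)


class LanguageConditionedPolicy(nn.Module):
    """Low-level policy: maps (state, instruction) to action logits."""

    def __init__(self, state_dim: int, vocab_size: int, n_actions: int,
                 hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.lang_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.policy = nn.Sequential(
            nn.Linear(state_dim + hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_actions),
        )

    def forward(self, state: torch.Tensor, instruction: torch.Tensor) -> torch.Tensor:
        # Summarize the instruction with the final GRU hidden state, then
        # combine it with the state features to score actions.
        emb = self.embed(instruction)
        _, h = self.lang_rnn(emb)
        lang_feat = h.squeeze(0)                                   # (B, H)
        return self.policy(torch.cat([state, lang_feat], dim=-1))  # (B, A)


if __name__ == "__main__":
    B, state_dim, vocab, n_actions, T = 4, 32, 100, 8, 6
    gen = InstructionGenerator(state_dim, vocab)
    policy = LanguageConditionedPolicy(state_dim, vocab, n_actions)

    state = torch.randn(B, state_dim)
    tokens = torch.randint(0, vocab, (B, T))
    word_logits = gen(state, tokens)       # high level: generate instruction words
    action_logits = policy(state, tokens)  # low level: act given an instruction
    print(word_logits.shape, action_logits.shape)
```

In this kind of design, the generated instruction serves both as the conditioning signal for the low-level policy and as an interpretable trace of the agent's plan, which is the property the abstract highlights.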
