Paper Title
Learning to Query Internet Text for Informing Reinforcement Learning Agents
Paper Authors
Paper Abstract
Generalization to out-of-distribution tasks in reinforcement learning is a challenging problem. One successful approach improves generalization by conditioning policies on task or environment descriptions that provide information about the current transition or reward functions. Previously, these descriptions were often expressed as generated or crowd-sourced text. In this work, we begin to tackle the problem of extracting useful information from natural language found in the wild (e.g. internet forums, documentation, and wikis). These natural, pre-existing sources are noisy and large, and they present novel challenges compared to the sources used in previous approaches. We propose to address these challenges by training reinforcement learning agents to learn to query these sources as a human would, and we experiment with how and when an agent should query. To address the \textit{how}, we demonstrate that pretrained QA models perform well at executing zero-shot queries in our target domain. Using information retrieved by a QA model, we train an agent to learn \textit{when} it should execute queries. We show that our method correctly learns to execute queries to maximize reward in a reinforcement learning setting.
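The \textit{when}-to-query idea can be illustrated with a minimal sketch (not the paper's actual method): a toy environment where a query action, standing in for a call to a pretrained QA model, reveals which of two doors is rewarding at a small cost, and a tabular Q-learning agent learns that paying the query cost beats guessing. The environment, its parameters, and the stubbed QA answer are all hypothetical.

```python
import random

random.seed(0)  # fix the seed so the toy training run is reproducible

class DoorEnv:
    """Toy task: one of two doors hides a reward of 1.0.

    Action QUERY stands in for asking a pretrained QA model which door
    is correct; it returns the answer but costs -0.1, so the agent must
    learn *when* querying is worth it.
    """
    QUERY, DOOR0, DOOR1 = 0, 1, 2

    def reset(self):
        self.good_door = random.randint(0, 1)  # hidden task parameter
        return "unknown"

    def step(self, action):
        if action == self.QUERY:
            # Stubbed QA model: reveals the rewarding door for a small cost.
            return f"know{self.good_door}", -0.1, False
        reward = 1.0 if (action - 1) == self.good_door else 0.0
        return "done", reward, True

def train(episodes=5000, eps=0.1, alpha=0.2):
    """Undiscounted epsilon-greedy Q-learning over the three info-states."""
    Q = {s: [0.0, 0.0, 0.0] for s in ("unknown", "know0", "know1")}
    env = DoorEnv()
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:
                a = random.randrange(3)
            else:
                a = max(range(3), key=lambda i: Q[s][i])
            s2, r, done = env.step(a)
            target = r if done else r + max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = train()
policy = {s: max(range(3), key=lambda i: Q[s][i]) for s in Q}
```

Querying yields an expected return of about 0.9 (1.0 reward minus the 0.1 query cost) while guessing a door yields 0.5 on average, so the learned greedy policy queries from the "unknown" state and then opens the revealed door.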