Paper Title
Empirically Verifying Hypotheses Using Reinforcement Learning
Paper Authors
Paper Abstract
This paper formulates hypothesis verification as an RL problem. Specifically, we aim to build an agent that, given a hypothesis about the dynamics of the world, can take actions to generate observations which can help predict whether the hypothesis is true or false. Existing RL algorithms fail to solve this task, even for simple environments. In order to train the agents, we exploit the underlying structure of many hypotheses, factorizing them as {pre-condition, action sequence, post-condition} triplets. By leveraging this structure we show that RL agents are able to succeed at the task. Furthermore, subsequent fine-tuning of the policies allows the agent to correctly verify hypotheses not amenable to the above factorization.
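To make the {pre-condition, action sequence, post-condition} factorization concrete, below is a minimal Python sketch of how a hypothesis could be represented and checked against an environment rollout. The `Hypothesis` class, `verify` function, and toy door environment are illustrative assumptions for exposition, not the paper's actual code, environments, or training setup.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Illustrative sketch of the {pre-condition, action sequence, post-condition}
# triplet described in the abstract. All names here are assumptions.

State = dict    # e.g. {"agent_pos": (1, 1), "door_open": False}
Action = str    # e.g. "toggle_door"


@dataclass
class Hypothesis:
    precondition: Callable[[State], bool]    # must hold before acting
    action_sequence: Sequence[Action]        # actions the agent executes
    postcondition: Callable[[State], bool]   # predicted to hold afterwards


def verify(hypothesis: Hypothesis, state: State,
           step: Callable[[State, Action], State]) -> bool:
    """Roll out the action sequence and check whether the post-condition holds.

    `step` stands in for the environment's transition function; in the paper's
    setting a trained agent would first take actions to reach a state where
    the pre-condition is satisfied before executing the sequence.
    """
    if not hypothesis.precondition(state):
        raise ValueError("pre-condition not satisfied in the current state")
    for action in hypothesis.action_sequence:
        state = step(state, action)
    return hypothesis.postcondition(state)


# Toy usage: "if the agent is at the door (pre), toggling it (action)
# leaves the door open (post)".
if __name__ == "__main__":
    def toy_step(state: State, action: Action) -> State:
        new_state = dict(state)
        if action == "toggle_door" and state["agent_pos"] == state["door_pos"]:
            new_state["door_open"] = not state["door_open"]
        return new_state

    h = Hypothesis(
        precondition=lambda s: s["agent_pos"] == s["door_pos"],
        action_sequence=["toggle_door"],
        postcondition=lambda s: s["door_open"],
    )
    start = {"agent_pos": (1, 1), "door_pos": (1, 1), "door_open": False}
    print(verify(h, start, toy_step))  # True if the hypothesis holds in this toy world
```

In this sketch the triplet structure is what makes the RL task tractable: the pre-condition defines which states the agent must reach, the action sequence defines what it must execute, and the post-condition defines the observation that labels the hypothesis true or false.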