A data-driven approach for learning to control computers

Paper title

A data-driven approach for learning to control computers

Authors

Humphreys, Peter C.; Raposo, David; Pohlen, Toby; Thornton, Gregory; Chhaparia, Rachita; Muldal, Alistair; Abramson, Josh; Georgiev, Petko; Goldin, Alex; Santoro, Adam; Lillicrap, Timothy

Abstract

It would be useful for machines to use computers as humans do so that they can aid us in everyday tasks. This is a setting in which there is also the potential to leverage large-scale expert demonstrations and human judgements of interactive behaviour, which are two ingredients that have driven much recent success in AI. Here we investigate the setting of computer control using keyboard and mouse, with goals specified via natural language. Instead of focusing on hand-designed curricula and specialized action spaces, we focus on developing a scalable method centered on reinforcement learning combined with behavioural priors informed by actual human-computer interactions. We achieve state-of-the-art and human-level mean performance across all tasks within the MiniWob++ benchmark, a challenging suite of computer control problems, and find strong evidence of cross-task transfer. These results demonstrate the usefulness of a unified human-agent interface when training machines to use computers. Altogether our results suggest a formula for achieving competency beyond MiniWob++ and towards controlling computers, in general, as a human would.
