Paper Title

Imitating Interactive Intelligence

Authors

Josh Abramson, Arun Ahuja, Iain Barr, Arthur Brussee, Federico Carnevale, Mary Cassin, Rachita Chhaparia, Stephen Clark, Bogdan Damoc, Andrew Dudzik, Petko Georgiev, Aurelia Guy, Tim Harley, Felix Hill, Alden Hung, Zachary Kenton, Jessica Landon, Timothy Lillicrap, Kory Mathewson, Soňa Mokrá, Alistair Muldal, Adam Santoro, Nikolay Savinov, Vikrant Varma, Greg Wayne, Duncan Williams, Nathaniel Wong, Chen Yan, Rui Zhu

Abstract

A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. This setting nevertheless integrates a number of the central challenges of artificial intelligence (AI) research: complex visual perception and goal-directed physical control, grounded language comprehension and production, and multi-agent social interaction. To build agents that can robustly interact with humans, we would ideally train them while they interact with humans. However, this is presently impractical. Therefore, we approximate the role of the human with another learned agent, and use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour. Rigorously evaluating our agents poses a great challenge, so we develop a variety of behavioural tests, including evaluation by humans who watch videos of agents or interact directly with them. These evaluations convincingly demonstrate that interactive training and auxiliary losses improve agent behaviour beyond what is achieved by supervised learning of actions alone. Further, we demonstrate that agent capabilities generalise beyond literal experiences in the dataset. Finally, we train evaluation models whose ratings of agents agree well with human judgement, thus permitting the evaluation of new agent models without additional effort. Taken together, our results in this virtual environment provide evidence that large-scale human behavioural imitation is a promising tool to create intelligent, interactive agents, and the challenge of reliably evaluating such agents is possible to surmount.
