Paper Title

VisualHints: A Visual-Lingual Environment for Multimodal Reinforcement Learning

Authors

Thomas Carta, Subhajit Chaudhury, Kartik Talamadupula, Michiaki Tatsubori

Abstract

We present VisualHints, a novel environment for multimodal reinforcement learning (RL) that combines text-based interaction with visual hints obtained from the environment. Real-life problems often demand that agents interact with the environment using both natural language information and visual perception to reach a goal. However, most traditional RL environments either pose purely vision-based tasks, such as Atari games or video-based robotic manipulation, or rely entirely on natural language as the mode of interaction, as in text-based games and dialog systems. In this work, we aim to bridge this gap and unify the two approaches in a single environment for multimodal RL. We introduce an extension of the TextWorld cooking environment with visual clues interspersed throughout the environment. The goal is to force an RL agent to use both text and visual features to predict natural language action commands for solving the final task of cooking a meal. We enable variations and difficulty levels in our environment to emulate various interactive real-world scenarios. We present a baseline multimodal agent for such problems that uses CNN-based feature extraction for the visual hints and LSTMs for textual feature extraction. We believe that our proposed visual-lingual environment will facilitate novel problem settings for the RL community.
