Paper Title

Spatial Action Maps for Mobile Manipulation

Paper Authors

Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Johnny Lee, Szymon Rusinkiewicz, Thomas Funkhouser

Paper Abstract

Typical end-to-end formulations for learning robotic navigation involve predicting a small set of steering command actions (e.g., step forward, turn left, turn right, etc.) from images of the current state (e.g., a bird's-eye view of a SLAM reconstruction). Instead, we show that it can be advantageous to learn with dense action representations defined in the same domain as the state. In this work, we present "spatial action maps," in which the set of possible actions is represented by a pixel map (aligned with the input image of the current state), where each pixel represents a local navigational endpoint at the corresponding scene location. Using ConvNets to infer spatial action maps from state images, action predictions are thereby spatially anchored on local visual features in the scene, enabling significantly faster learning of complex behaviors for mobile manipulation tasks with reinforcement learning. In our experiments, we task a robot with pushing objects to a goal location, and find that policies learned with spatial action maps achieve much better performance than traditional alternatives.
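
To make the "spatial action map" idea concrete, below is a minimal PyTorch sketch of the kind of fully-convolutional network the abstract describes: it maps an overhead state image to a dense Q-value map aligned with that image, and the greedy action is the pixel (scene location) chosen as the local navigation endpoint. The layer sizes, image resolution, and helper names here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SpatialActionMapNet(nn.Module):
    """Minimal fully-convolutional Q-network sketch (illustrative only).

    Takes an overhead state image and outputs a dense Q-value map at the
    same spatial resolution, where each pixel scores the action of driving
    to (and pushing through) the corresponding scene location.
    """

    def __init__(self, in_channels=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, state_image):
        # state_image: (B, C, H, W) overhead view of the scene
        # returns:     (B, 1, H, W) per-pixel Q-values (the spatial action map)
        return self.decoder(self.encoder(state_image))


def select_action(q_map):
    """Greedy action: the pixel with the highest Q-value becomes the
    local navigation endpoint the robot moves toward."""
    b, _, h, w = q_map.shape
    flat_idx = q_map.view(b, -1).argmax(dim=1)
    rows = torch.div(flat_idx, w, rounding_mode="floor")
    cols = flat_idx % w
    return torch.stack([rows, cols], dim=1)  # (B, 2) pixel coordinates


if __name__ == "__main__":
    net = SpatialActionMapNet()
    state = torch.rand(1, 3, 96, 96)   # dummy overhead state image
    q_map = net(state)                 # spatial action map, aligned with the state
    print(q_map.shape)                 # torch.Size([1, 1, 96, 96])
    print(select_action(q_map))        # chosen (row, col) endpoint
```

Because every Q-value is tied to a specific pixel of the state image, action predictions stay anchored on local visual features, which is what the abstract credits for faster reinforcement learning on the object-pushing task.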
