Paper title

Coarse-to-Fine Q-attention with Learned Path Ranking

Authors

Stephen James, Pieter Abbeel

Abstract

We propose Learned Path Ranking (LPR), a method that accepts an end-effector goal pose and learns to rank a set of goal-reaching paths generated by an array of path-generating methods, including path planning, Bezier curve sampling, and a learned policy. The core idea is that each path generation module will be useful in different tasks, or at different stages of a task. When LPR is added as an extension to C2F-ARM, our new system, C2F-ARM+LPR, retains the sample efficiency of its predecessor while also being able to accomplish a larger set of tasks; in particular, tasks that require very specific motions (e.g. opening a toilet seat) that need to be inferred from both demonstrations and exploration data. In addition to benchmarking our approach across 16 RLBench tasks, we also learn real-world tasks, tabula rasa, in 10-15 minutes, with only 3 demonstrations.
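The generate-then-rank idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes position-only paths in 3D (no orientation), uses only a straight-line path and quadratic Bezier curves as candidate generators, and replaces the learned ranking with a hand-written placeholder score (negative path length). All function names and parameters below are hypothetical.

```python
import numpy as np

def linear_path(start, goal, n_points=20):
    """Straight-line path from start to goal (stand-in for a path planner)."""
    t = np.linspace(0.0, 1.0, n_points)[:, None]
    return (1.0 - t) * start + t * goal

def bezier_path(start, goal, control, n_points=20):
    """Quadratic Bezier curve from start to goal through one control point."""
    t = np.linspace(0.0, 1.0, n_points)[:, None]
    return (1.0 - t) ** 2 * start + 2.0 * (1.0 - t) * t * control + t ** 2 * goal

def sample_candidates(start, goal, rng, n_bezier=4, n_points=20):
    """Build a candidate set: one straight path plus randomly sampled Bezier curves."""
    paths = [linear_path(start, goal, n_points)]
    midpoint = 0.5 * (start + goal)
    for _ in range(n_bezier):
        # Assumption: control points are sampled around the midpoint.
        control = midpoint + rng.normal(scale=0.2, size=3)
        paths.append(bezier_path(start, goal, control, n_points))
    return paths

def path_length(path):
    """Total length of the polyline through the path's waypoints."""
    return float(np.linalg.norm(np.diff(path, axis=0), axis=1).sum())

def rank_paths(paths, score_fn):
    """Return candidate indices sorted from highest to lowest score, plus the scores."""
    scores = np.array([score_fn(p) for p in paths])
    order = np.argsort(-scores)
    return order, scores

start = np.array([0.0, 0.0, 0.0])
goal = np.array([0.5, 0.2, 0.3])
rng = np.random.default_rng(0)

candidates = sample_candidates(start, goal, rng)
# Placeholder score: shorter paths rank higher. In LPR the ranking is learned
# from demonstrations and exploration data rather than hand-written.
order, scores = rank_paths(candidates, lambda p: -path_length(p))
best_path = candidates[order[0]]
```

Every candidate ends at the same goal pose, so the ranking only decides how to get there. Swapping the placeholder score for a learned scoring function is where a task-specific motion preference would enter.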
