Paper Title

From Demonstrations to Task-Space Specifications: Using Causal Analysis to Extract Rule Parameterization from Demonstrations

Authors

Daniel Angelov, Yordan Hristov, Subramanian Ramamoorthy

Abstract

Learning models of user behaviour is an important problem that is broadly applicable across many application domains requiring human-robot interaction. In this work, we show that it is possible to learn generative models for distinct user behavioural types, extracted from human demonstrations, by enforcing clustering of preferred task solutions within the latent space. We use these models to differentiate between user types and to find cases with overlapping solutions. Moreover, we can alter an initially guessed solution to satisfy the preferences that constitute a particular user type by backpropagating through the learned differentiable models. An advantage of structuring generative models in this way is that we can extract causal relationships between symbols that might form part of the user's specification of the task, as manifested in the demonstrations. We further parameterize these specifications through constraint optimization in order to find a safety envelope under which motion planning can be performed. We show that the proposed method is capable of correctly distinguishing between three user types, who differ in degrees of cautiousness in their motion, while performing the task of moving objects with a kinesthetically driven robot in a tabletop environment. Our method successfully identifies the correct type, within the specified time, in 99% [97.8 - 99.8] of the cases, which outperforms an IRL baseline. We also show that our proposed method correctly changes a default trajectory to one satisfying a particular user specification even with unseen objects. The resulting trajectory is shown to be directly implementable on a PR2 humanoid robot completing the same task.
