Learning rewards for robotic ultrasound scanning using probabilistic temporal ranking

Paper title

Learning rewards for robotic ultrasound scanning using probabilistic temporal ranking

Authors

Michael Burke, Katie Lu, Daniel Angelov, Artūras Straižys, Craig Innes, Kartic Subr, Subramanian Ramamoorthy

Abstract

Informative path planning is a well-established approach to visual servoing and active viewpoint selection in robotics, but it typically assumes that a suitable cost function or goal state is known. This work considers the inverse problem, where the goal of the task is unknown and a reward function needs to be inferred from exploratory example demonstrations provided by a demonstrator, for use in a downstream informative path-planning policy. Unfortunately, many existing reward inference strategies are unsuited to this class of problems because of the exploratory nature of the demonstrations. In this paper, we propose an alternative approach to cope with the class of problems where these sub-optimal, exploratory demonstrations occur. We hypothesise that, in tasks which require discovery, successive states of any demonstration are progressively more likely to be associated with a higher reward. We use this hypothesis to generate time-based binary comparison outcomes and to infer reward functions that support these ranks under a probabilistic generative model. We formalise this probabilistic temporal ranking approach and show that it improves upon existing approaches to reward inference for autonomous ultrasound scanning, a novel application of learning from demonstration in medical imaging, while also being of value across a broad range of goal-oriented learning from demonstration tasks.

Keywords: visual servoing, reward inference, probabilistic temporal ranking
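The core idea in the abstract (later states in a demonstration are more likely to carry higher reward, so time order yields binary comparisons) can be illustrated with a minimal sketch. This is not the authors' model: it assumes a linear reward over hand-made two-dimensional features and a simple logistic (Bradley-Terry style) pairwise likelihood fitted by gradient ascent. The function names `temporal_pairs` and `fit_linear_reward`, the synthetic features, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def temporal_pairs(n_steps, n_pairs, rng):
    """Sample (earlier, later) index pairs from one demonstration."""
    a = rng.integers(0, n_steps, size=n_pairs)
    b = rng.integers(0, n_steps, size=n_pairs)
    keep = a != b  # drop pairs that compare a state with itself
    a, b = a[keep], b[keep]
    return np.minimum(a, b), np.maximum(a, b)

def fit_linear_reward(features, earlier, later, lr=0.1, n_iters=500):
    """Fit weights w so that P(later beats earlier) = sigmoid(r_later - r_earlier),
    with reward r = features @ w (an assumed, simplified reward model)."""
    w = np.zeros(features.shape[1])
    diff = features[later] - features[earlier]
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-(diff @ w)))
        # Gradient of the mean log-likelihood of "later is ranked higher".
        grad = ((1.0 - p)[:, None] * diff).mean(axis=0)
        w += lr * grad
    return w

# Synthetic demonstration: feature 0 drifts upward over time, feature 1 is noise.
rng = np.random.default_rng(0)
T = 50
t = np.linspace(0.0, 1.0, T)
features = np.stack([t + 0.1 * rng.standard_normal(T),
                     rng.standard_normal(T)], axis=1)

earlier, later = temporal_pairs(T, 400, rng)
w = fit_linear_reward(features, earlier, later)
rewards = features @ w  # inferred reward for every state in the demonstration
```

In this toy setting the inferred reward should tend to increase along the demonstration, because the fitted weights favour the feature that grows with time. A real application would replace the synthetic features with image-derived ones and the logistic likelihood with the generative model the paper formalises.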
