Paper Title
Autonomous Assessment of Demonstration Sufficiency via Bayesian Inverse Reinforcement Learning
Paper Authors
Paper Abstract
We examine the problem of determining demonstration sufficiency: how can a robot self-assess whether it has received enough demonstrations from an expert to ensure a desired level of performance? To address this problem, we propose a novel self-assessment approach based on Bayesian inverse reinforcement learning and value-at-risk, enabling learning-from-demonstration ("LfD") robots to compute high-confidence bounds on their performance and to use these bounds to determine when they have received a sufficient number of demonstrations. We propose and evaluate two definitions of sufficiency: (1) normalized expected value difference, which measures regret with respect to the human's unobserved reward function, and (2) percent improvement over a baseline policy. We show how to formulate high-confidence bounds on both metrics. We evaluate our approach in simulation on both discrete and continuous state-space domains and illustrate the feasibility of developing a robotic system that can accurately evaluate demonstration sufficiency. We also show that the robot can use active learning to request demonstrations from specific states, which reduces the number of demonstrations it needs while still maintaining high confidence in its policy. Finally, via a user study, we show that our approach enables robots to perform at users' desired performance levels without requiring either a large number of demonstrations or perfectly optimal demonstrations.
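The abstract states the idea but not the formulas. As a minimal sketch, assume the robot has already drawn N reward samples from its Bayesian IRL posterior (for example by MCMC). Assume also that, under each sampled reward, it has evaluated the expected value of the optimal policy, of its own learned policy, of a uniformly random policy, and of a baseline policy. Under these assumptions, both high-confidence bounds can be estimated empirically as order statistics (value-at-risk) of the per-sample metric. Several details are hypothetical illustrative choices, not the paper's definitions or code: normalizing nEVD by the optimal-minus-random value gap, the thresholds `epsilon` and `target`, and every function name below (`upper_var`, `lower_var`, `normalized_evd`, `percent_improvement`, `check_nevd`, `check_improvement`).

```python
import math
from typing import Sequence, Tuple


def upper_var(losses: Sequence[float], alpha: float) -> float:
    """Empirical alpha-VaR of a loss: at least an alpha fraction of the
    posterior samples are <= the returned value (high-confidence upper bound)."""
    if not losses:
        raise ValueError("need at least one posterior sample")
    ordered = sorted(losses)
    k = math.ceil(alpha * len(ordered)) - 1
    return ordered[min(len(ordered) - 1, max(0, k))]


def lower_var(gains: Sequence[float], alpha: float) -> float:
    """Empirical lower bound on a gain: at least an alpha fraction of the
    posterior samples are >= the returned value."""
    if not gains:
        raise ValueError("need at least one posterior sample")
    ordered = sorted(gains)
    k = math.floor((1.0 - alpha) * len(ordered))
    return ordered[min(len(ordered) - 1, max(0, k))]


def normalized_evd(v_opt: float, v_robot: float, v_rand: float) -> float:
    """Regret of the robot's policy under one sampled reward.
    ASSUMPTION (hypothetical): normalized by the optimal-vs-random value gap."""
    gap = v_opt - v_rand
    return (v_opt - v_robot) / gap if gap > 0 else 0.0


def percent_improvement(v_robot: float, v_base: float) -> float:
    """Percent improvement of the robot's policy over a baseline policy
    under one sampled reward."""
    if v_base == 0:
        raise ValueError("baseline value must be nonzero")
    return 100.0 * (v_robot - v_base) / abs(v_base)


def check_nevd(v_opt: Sequence[float], v_robot: Sequence[float],
               v_rand: Sequence[float], alpha: float = 0.95,
               epsilon: float = 0.1) -> Tuple[float, bool]:
    """Hypothetical rule: sufficient if the alpha-VaR upper bound on nEVD
    is at most epsilon."""
    if not (len(v_opt) == len(v_robot) == len(v_rand)):
        raise ValueError("one value per posterior sample is required")
    losses = [normalized_evd(o, r, b) for o, r, b in zip(v_opt, v_robot, v_rand)]
    bound = upper_var(losses, alpha)
    return bound, bound <= epsilon


def check_improvement(v_robot: Sequence[float], v_base: Sequence[float],
                      alpha: float = 0.95,
                      target: float = 10.0) -> Tuple[float, bool]:
    """Hypothetical rule: sufficient if the alpha-confidence lower bound on
    percent improvement over the baseline is at least `target` percent."""
    if len(v_robot) != len(v_base):
        raise ValueError("one value per posterior sample is required")
    gains = [percent_improvement(r, b) for r, b in zip(v_robot, v_base)]
    bound = lower_var(gains, alpha)
    return bound, bound >= target


if __name__ == "__main__":
    # Made-up values of each policy under 10 posterior reward samples.
    v_robot = [1.0, 0.98, 0.97, 0.99, 0.95, 1.0, 0.96, 0.99, 0.93, 0.98]
    print(check_nevd([1.0] * 10, v_robot, [0.0] * 10, alpha=0.9, epsilon=0.1))
    print(check_improvement(v_robot, [0.5] * 10, alpha=0.9, target=80.0))
```

In a demonstration loop, the robot would rerun Bayesian IRL after each new demonstration, recompute one of these bounds, and stop requesting demonstrations once the chosen check passes. Here `alpha` plays the role of the confidence level of the value-at-risk bound.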