When Shall I Be Empathetic? The Utility of Empathetic Parameter Estimation in Multi-Agent Interactions

Paper Title

When Shall I Be Empathetic? The Utility of Empathetic Parameter Estimation in Multi-Agent Interactions

Authors

Chen, Yi; Zhang, Lei; Merry, Tanner; Amatya, Sunny; Zhang, Wenlong; Ren, Yi

Abstract

Human-robot interactions (HRI) can be modeled as dynamic or differential games with incomplete information, where each agent holds private reward parameters. Due to the open challenge in finding perfect Bayesian equilibria of such games, existing studies often consider approximate solutions composed of parameter estimation and motion planning steps, in order to decouple the belief and physical dynamics. In parameter estimation, current approaches often assume that the reward parameters of the robot are known by the humans. We argue that by falsely conditioning on this assumption, the robot performs non-empathetic estimation of the humans' parameters, leading to undesirable values even in the simplest interactions. We test this argument by studying a two-vehicle uncontrolled intersection case with short reaction time. Results show that when both agents are unknowingly aggressive (or non-aggressive), empathy leads to more effective parameter estimation and higher reward values, suggesting that empathy is necessary when the true parameters of agents do not match their common belief. The proposed estimation and planning algorithms are therefore more robust than the existing approaches, by fully acknowledging the nature of information asymmetry in HRI. Lastly, we introduce value approximation techniques for real-time execution of the proposed algorithms.
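
The difference between non-empathetic and empathetic estimation can be illustrated with a toy discrete Bayesian update. This is a minimal sketch under stated assumptions, not the authors' algorithm: the two-value parameter set ("a" = aggressive, "n" = non-aggressive), the nominal acceleration table NOMINAL_ACCEL, the Gaussian action noise SIGMA, and the functions non_empathetic and empathetic are all hypothetical. The non-empathetic estimator conditions on the human knowing the robot's true parameter; the empathetic one keeps a joint belief over the human's parameter and the robot parameter the human believes in, then marginalizes.

```python
import math

# Hypothetical discrete parameter set: "a" = aggressive, "n" = non-aggressive.
THETAS = ("a", "n")

# Hypothetical nominal human acceleration (m/s^2), indexed by
# (human's own parameter, robot parameter the human *believes* in).
NOMINAL_ACCEL = {
    ("a", "a"): -0.5,
    ("a", "n"): 1.5,
    ("n", "a"): -2.0,
    ("n", "n"): 0.0,
}

SIGMA = 0.5  # assumed std. dev. of Gaussian action noise


def likelihood(u, theta_h, theta_r_believed):
    """Unnormalized Gaussian likelihood of an observed human action u."""
    mu = NOMINAL_ACCEL[(theta_h, theta_r_believed)]
    return math.exp(-((u - mu) ** 2) / (2 * SIGMA ** 2))


def normalize(weights):
    """Rescale a dict of nonnegative weights so they sum to 1."""
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def non_empathetic(observations, theta_r_true):
    """Belief over theta_H, assuming the human knows the robot's true parameter."""
    belief = {th: 1.0 / len(THETAS) for th in THETAS}
    for u in observations:
        belief = normalize(
            {th: p * likelihood(u, th, theta_r_true) for th, p in belief.items()}
        )
    return belief


def empathetic(observations):
    """Joint belief over (theta_H, theta_R as believed by the human),
    returned as the marginal belief over theta_H."""
    joint = {(th, tr): 1.0 / len(THETAS) ** 2 for th in THETAS for tr in THETAS}
    for u in observations:
        joint = normalize({k: p * likelihood(u, *k) for k, p in joint.items()})
    return {th: sum(p for (h, _), p in joint.items() if h == th) for th in THETAS}


if __name__ == "__main__":
    # Scenario: the robot is truly aggressive, but a non-aggressive human
    # believes it is not, so the human drives near NOMINAL_ACCEL[("n", "n")].
    obs = [0.1, -0.1, 0.0, 0.05, -0.05]
    print("non-empathetic:", non_empathetic(obs, theta_r_true="a"))
    print("empathetic:    ", empathetic(obs))
```

In this toy setting the non-empathetic estimator becomes almost certain the human is aggressive, because near-zero acceleration is best explained, under its assumption, by mild braking from an aggressive human facing a known aggressive robot. The empathetic estimator instead assigns roughly 0.92 posterior mass to a non-aggressive human who misjudges the robot, mirroring the abstract's point that empathy matters when true parameters differ from the common belief.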
