Sample-Efficient Training of Robotic Guide Using Human Path Prediction Network

Paper Title

Sample-Efficient Training of Robotic Guide Using Human Path Prediction Network

Authors

Moon, Hee-Seung, Seo, Jiwon

Abstract

Training a robot that engages with people is challenging: involving people directly in the training process is expensive, because training requires numerous data samples. This paper presents an alternative approach to this problem. We propose a human path prediction network (HPPN) that uses a recurrent-neural-network structure to generate a user's future trajectory from sequential robot actions and human responses. We then present an evolution-strategy-based method that trains the robot using only virtual human movements generated by the HPPN. We demonstrate that the proposed method enables sample-efficient training of a robotic guide for visually impaired people. By collecting only 1.5K episodes from real users, we were able to train the HPPN and generate the more than 100K virtual episodes required to train the robot. The trained robot precisely guided blindfolded participants along a target path. Furthermore, using virtual episodes, we investigated a new reward design that prioritizes human comfort during the robot's guidance without incurring additional costs. This sample-efficient training method is expected to be widely applicable to future robots that interact physically with humans.
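
The abstract describes the training loop only at a high level. The following is a minimal sketch of the general idea: optimizing a robot policy with an evolution strategy purely on episodes simulated by a learned human-response model. Everything specific in it is an assumption for illustration. `virtual_human_step` is a hypothetical hand-coded 1-D placeholder standing where a trained HPPN (a recurrent network in the paper) would be queried. The linear policy, the reward with its assumed `comfort_weight` penalty, and all hyperparameters are illustrative choices. The update is a generic Gaussian-perturbation ES with standardized returns, not necessarily the variant the authors use.

```python
import math
import random

# HYPOTHETICAL placeholder for the HPPN: a hand-coded 1-D response model.
# The human's lateral velocity v follows the robot's lateral pull `action`
# with lag, and the lateral offset y from the target path integrates v.
# In the paper's setting a trained recurrent network would be queried here.
def virtual_human_step(y, v, action):
    v_next = 0.7 * v + 0.3 * action
    return y + v_next, v_next


def episode_return(theta, y0, horizon=30, comfort_weight=0.01):
    """Roll out one virtual episode with a linear policy and return its reward."""
    y, v = y0, 0.0
    total = 0.0
    for _ in range(horizon):
        action = -(theta[0] * y + theta[1] * v)  # pull the user back toward y = 0
        y, v = virtual_human_step(y, v, action)
        # Tracking error plus an assumed comfort penalty on large pulls.
        total -= y * y + comfort_weight * action * action
    return total


INITIAL_OFFSETS = (-1.0, -0.5, 0.5, 1.0)  # assumed starting deviations from the path


def fitness(theta):
    """Total return over a fixed set of virtual episodes."""
    return sum(episode_return(theta, y0) for y0 in INITIAL_OFFSETS)


def train_es(iterations=100, population=40, sigma=0.1, lr=0.01, seed=0):
    """Basic Gaussian-perturbation evolution strategy with standardized returns."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(iterations):
        noises = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(population)]
        scores = [fitness([t + sigma * e for t, e in zip(theta, eps)]) for eps in noises]
        mean = sum(scores) / population
        std = math.sqrt(sum((s - mean) ** 2 for s in scores) / population) or 1.0
        normalized = [(s - mean) / std for s in scores]
        # Gradient estimate: noise vectors weighted by standardized return.
        for j in range(len(theta)):
            grad_j = sum(n * eps[j] for n, eps in zip(normalized, noises)) / (population * sigma)
            theta[j] += lr * grad_j
    return theta


if __name__ == "__main__":
    trained = train_es()
    print("trained gains:", trained)
    print("return before/after:", fitness([0.0, 0.0]), fitness(trained))
```

Because every episode comes from the model rather than from a person, the population size and iteration count cost only compute. With the defaults above, one call to `train_es` rolls out 100 × 40 × 4 = 16,000 virtual episodes.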
