Paper Title

Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers

Authors

Arthur Bucker, Luis Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, Rogerio Bonatti

Abstract

Natural language is the most intuitive medium for us to interact with other people when expressing commands and instructions. However, using language is seldom an easy task when humans need to express their intent towards robots, since most of the current language interfaces require rigid templates with a static set of action targets and commands. In this work, we provide a flexible language-based interface for human-robot collaboration, which allows a user to reshape existing trajectories for an autonomous agent. We take advantage of recent advancements in the field of large language models (BERT and CLIP) to encode the user command, and then combine these features with trajectory information using multi-modal attention transformers. We train the model using imitation learning over a dataset containing robot trajectories modified by language commands, and treat the trajectory generation process as a sequence prediction problem, analogously to how language generation architectures operate. We evaluate the system in multiple simulated trajectory scenarios, and show a significant performance increase of our model over baseline approaches. In addition, our real-world experiments with a robot arm show that users significantly prefer our natural language interface over traditional methods such as kinesthetic teaching or cost-function programming. Our study shows how the field of robotics can take advantage of large pre-trained language models towards creating more intuitive interfaces between robots and machines. Project webpage: https://arthurfenderbucker.github.io/NL_trajectory_reshaper/
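
The abstract describes trajectory generation as sequence prediction conditioned on language features. The toy sketch below illustrates only that idea and is not the authors' implementation: it uses no BERT or CLIP encoder and no learned weights, and the names `cross_attention`, `reshape_trajectory`, and `gain` are hypothetical. The command is assumed to arrive as a list of already-encoded token vectors that, purely for illustration, share the 2-D space of the waypoints. Real embeddings would be far higher dimensional and would pass through trained projection layers.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query: Vector,
                    keys: Sequence[Vector],
                    values: Sequence[Vector]) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def reshape_trajectory(original: Sequence[Vector],
                       command_tokens: Sequence[Vector],
                       gain: float = 0.1) -> List[Vector]:
    """Emit the modified trajectory one waypoint at a time (autoregressively).

    Assumption: a fixed `gain` stands in for the trained decoder that would
    normally map the attended language context to a waypoint.
    """
    generated: List[Vector] = []
    for waypoint in original:
        # The previously generated waypoint is the query; at the first step,
        # fall back to the input waypoint because nothing has been generated.
        query = generated[-1] if generated else waypoint
        context = cross_attention(query, command_tokens, command_tokens)
        generated.append([w + gain * c for w, c in zip(waypoint, context)])
    return generated


# Straight-line trajectory along x, plus two identical hypothetical command
# vectors that point along +y.
original = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
command_tokens = [[0.0, 1.0], [0.0, 1.0]]
modified = reshape_trajectory(original, command_tokens)
print(modified)
```

Because each output waypoint depends on the one generated before it, the loop mirrors how a language model emits tokens one at a time. In a trained system, the attention weights and the mapping from context to waypoint would be learned from trajectories paired with language commands.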
