Paper title
Unsupervised Reward Shaping for a Robotic Sequential Picking Task from Visual Observations in a Logistics Scenario
Paper authors
Paper abstract
We focus on an unloading problem, typical of the logistics sector, modeled as a sequential pick-and-place task. In this type of task, modern machine learning techniques have been shown to work better than classic systems, since they adapt more readily to stochasticity and cope better with large uncertainties. More specifically, supervised and imitation learning have achieved outstanding results in this regard, with the shortcoming of requiring some form of supervision, which is not always obtainable in every setting. Reinforcement learning (RL), on the other hand, requires a much milder form of supervision but remains impractical due to its inefficiency. In this paper, we propose and theoretically motivate a novel Unsupervised Reward Shaping algorithm that learns from expert observations, relaxing the level of supervision required by the agent and aiming to improve RL performance in our task.