Paper title
Cross apprenticeship learning framework: Properties and solution approaches
Paper authors
Paper abstract
Apprenticeship learning is a framework in which an agent learns a policy for performing a given task in an environment from example trajectories provided by an expert. In the real world, one may have access to expert trajectories from several environments whose system dynamics differ while the learning task stays the same. For such scenarios, two types of learning objectives can be defined: one in which the learned policy performs very well in one specific environment, and another in which it performs well across all environments. To balance these two objectives in a principled way, our work presents the cross apprenticeship learning (CAL) framework. It consists of an optimization problem in which an optimal policy is sought for each environment while ensuring that all policies remain close to each other. A single tuning parameter in the optimization problem controls this nearness. We derive properties of the optimizers of the problem as the tuning parameter varies. Since the problem is nonconvex, we provide a convex outer approximation. Finally, we demonstrate the attributes of our framework on a navigation task in a windy gridworld environment.
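The trade-off governed by the tuning parameter can be illustrated with a deliberately simplified, hypothetical stand-in. This is NOT the CAL formulation from the paper: each environment's apprenticeship loss is replaced by an assumed squared distance between a parameter vector x_k and a per-environment target a_k, and closeness is enforced by a pairwise squared-distance penalty weighted by lam. The names cal_toy_objective, cal_toy_solution, anchors and lam are all hypothetical. Unlike the paper's problem, this toy is convex and has a closed-form minimizer, so it only shows the qualitative behaviour: lam = 0 yields independent per-environment optima, and as lam grows large all solutions are pulled toward a common point.

```python
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise average of a non-empty list of equal-length vectors."""
    k = len(vectors)
    return [sum(v[d] for v in vectors) / k for d in range(len(vectors[0]))]


def cal_toy_objective(xs: Sequence[Vector], anchors: Sequence[Vector], lam: float) -> float:
    """Hypothetical toy objective (not the paper's CAL problem).

    fit:  sum_k ||x_k - a_k||^2         (stand-in for per-environment losses)
    prox: sum_{i<j} ||x_i - x_j||^2     (keeps per-environment solutions close)
    """
    fit = sum(sum((x - a) ** 2 for x, a in zip(xk, ak)) for xk, ak in zip(xs, anchors))
    prox = 0.0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            prox += sum((p - q) ** 2 for p, q in zip(xs[i], xs[j]))
    return fit + lam * prox


def cal_toy_solution(anchors: Sequence[Vector], lam: float) -> List[Vector]:
    """Closed-form minimizer of cal_toy_objective for lam >= 0.

    Setting the gradient in x_k to zero gives
        (x_k - a_k) + lam * sum_{j != k} (x_k - x_j) = 0,
    whose solution is x_k = (a_k + lam * K * a_bar) / (1 + lam * K),
    where K is the number of environments and a_bar the mean anchor.
    """
    k = len(anchors)
    a_bar = mean(anchors)
    denom = 1.0 + lam * k
    return [[(a + lam * k * m) / denom for a, m in zip(ak, a_bar)] for ak in anchors]
```

For anchors [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]] and lam = 1.0, the sketch returns [[0.75, 0.75], [1.25, 0.75], [1.0, 1.5]]: each solution moves three quarters of the way from its own target toward the mean target [1.0, 1.0]. Intermediate values of lam interpolate between the two learning objectives, which is the kind of dependence on the tuning parameter the abstract refers to.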