Paper Title

Goal-conditioned Batch Reinforcement Learning for Rotation Invariant Locomotion

Paper Author

Mavalankar, Aditi

Paper Abstract

We propose a novel approach to learn goal-conditioned policies for locomotion in a batch RL setting. The batch data is collected by a policy that is not goal-conditioned. For the locomotion task, this translates to data collection using a policy learnt by the agent for walking straight in one direction, and using that data to learn a goal-conditioned policy that enables the agent to walk in any direction. The data collection policy used should be invariant to the direction the agent is facing i.e. regardless of its initial orientation, the agent should take the same actions to walk forward. We exploit this property to learn a goal-conditioned policy using two key ideas: (1) augmenting data by generating trajectories with the same actions in different directions, and (2) learning an encoder that enforces invariance between these rotated trajectories with a Siamese framework. We show that our approach outperforms existing RL algorithms on 3-D locomotion agents like Ant, Humanoid and Minitaur.
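The two ideas summarized in the abstract, rotation-based trajectory augmentation and a Siamese invariance objective on the encoder, can be illustrated with a short sketch. The snippet below is a hypothetical, simplified illustration rather than the paper's implementation: the state layout (planar x, y coordinates first, heading angle last), the small MLP encoder, and the plain MSE invariance loss are all assumptions made for readability.

```python
# Minimal sketch (not the paper's code) of the two ideas in the abstract:
# (1) augment a trajectory by rotating it about the vertical axis while keeping
#     the actions fixed, and (2) a Siamese-style loss that pulls the encodings
#     of a state and its rotated copy together.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


def rotate_trajectory(states, actions, theta):
    """Return a rotated copy of (states, actions).

    Assumes states[:, :2] are planar (x, y) coordinates and states[:, -1] is the
    agent's heading; the actions are left untouched, mirroring the idea that a
    direction-invariant policy takes the same actions regardless of orientation.
    """
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    rotated = states.copy()
    rotated[:, :2] = states[:, :2] @ rot.T
    rotated[:, -1] = states[:, -1] + theta
    return rotated, actions.copy()


class StateEncoder(nn.Module):
    """Small MLP encoder; the loss below encourages it to map rotated versions
    of the same state to nearby embeddings."""

    def __init__(self, state_dim, embed_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, x):
        return self.net(x)


def siamese_invariance_loss(encoder, states, rotated_states):
    """Simple stand-in for the Siamese objective: penalize the distance between
    embeddings of a state and its rotated counterpart."""
    return F.mse_loss(encoder(states), encoder(rotated_states))


if __name__ == "__main__":
    state_dim = 8
    states = np.random.randn(50, state_dim).astype(np.float32)
    actions = np.random.randn(50, 4).astype(np.float32)

    # Idea (1): same actions, trajectory re-expressed in a rotated frame.
    rot_states, rot_actions = rotate_trajectory(states, actions, theta=np.pi / 3)

    # Idea (2): train the encoder to be invariant to that rotation.
    encoder = StateEncoder(state_dim)
    loss = siamese_invariance_loss(
        encoder, torch.from_numpy(states), torch.from_numpy(rot_states)
    )
    print("invariance loss:", loss.item())
```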
