Paper Title

DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning

Paper Authors

Jinxin Liu, Hongyin Zhang, Donglin Wang

Paper Abstract

Offline reinforcement learning algorithms promise to be applicable in settings where a fixed dataset is available and no new experience can be acquired. However, such a formulation is inevitably offline-data-hungry, and, in practice, collecting a large offline dataset for one specific task in one specific environment is costly and laborious. In this paper, we therefore 1) formulate offline dynamics adaptation, which uses (source) offline data collected under different dynamics to relax the requirement for extensive (target) offline data, 2) characterize the dynamics shift problem, under which prior offline methods do not scale well, and 3) derive a simple dynamics-aware reward augmentation (DARA) framework from both model-free and model-based offline settings. Specifically, DARA emphasizes learning from those source transition pairs that are adaptive for the target environment, and mitigates the offline dynamics shift by characterizing state-action-next-state pairs instead of the typical state-action distributions sketched by prior offline RL methods. The experimental evaluation demonstrates that DARA, by augmenting rewards in the source offline dataset, can acquire an adaptive policy for the target environment while significantly reducing the required amount of target offline data. With only modest amounts of target offline data, our method consistently outperforms prior offline RL methods in both simulated and real-world tasks.
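To make the reward-augmentation idea concrete, below is a minimal sketch, not the authors' implementation, of a classifier-based dynamics-gap correction in the spirit the abstract describes: source rewards are adjusted by an estimate of how much the transition (s, a, s') agrees with the target dynamics. The function and classifier names (`dara_augmented_reward`, `clf_sas`, `clf_sa`) and the trade-off coefficient `eta` are our own illustrative choices, assuming two pretrained binary domain classifiers are available.

```python
import numpy as np

def dara_augmented_reward(r, s, a, s_next, clf_sas, clf_sa, eta=1.0):
    """Hypothetical sketch: augment a source-domain reward with a
    dynamics-gap term.

    clf_sas(s, a, s_next): estimated P(target | s, a, s'), from a binary
        classifier trained to distinguish target transitions from source ones.
    clf_sa(s, a): estimated P(target | s, a), conditioned on (s, a) only.

    The difference of the two log-odds approximates
        log p_target(s' | s, a) - log p_source(s' | s, a),
    so source transitions whose dynamics disagree with the target have
    their reward reduced, while agreeing ones are left nearly untouched.
    """
    p_sas = clf_sas(s, a, s_next)   # P(target | s, a, s')
    p_sa = clf_sa(s, a)             # P(target | s, a)
    delta_r = (np.log(p_sas) - np.log(1.0 - p_sas)) \
              - (np.log(p_sa) - np.log(1.0 - p_sa))
    return r + eta * delta_r

# Toy usage with stand-in classifiers (real ones would be learned networks):
clf_sas = lambda s, a, s_next: 0.4   # this (s, a, s') looks source-like
clf_sa = lambda s, a: 0.5            # (s, a) itself is domain-neutral
print(dara_augmented_reward(1.0, None, None, None, clf_sas, clf_sa, eta=0.1))
```

Any off-the-shelf offline RL algorithm can then be trained on the source dataset with these augmented rewards, pooled with the modest target dataset, which is what lets the approach plug into both model-free and model-based pipelines.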
