Paper Title

Prototypical context-aware dynamics generalization for high-dimensional model-based reinforcement learning

Authors

Junjie Wang, Yao Mu, Dong Li, Qichao Zhang, Dongbin Zhao, Yuzheng Zhuang, Ping Luo, Bin Wang, Jianye Hao

Abstract

Latent world models offer a promising way to learn policies in a compact latent space for tasks with high-dimensional observations. However, their generalization across diverse environments with unseen dynamics remains challenging. Although the recurrent structure used in recent work helps capture local dynamics, modeling only state transitions without an explicit understanding of environmental context limits the generalization ability of the dynamics model. To address this issue, we propose a Prototypical Context-Aware Dynamics (ProtoCAD) model, which captures local dynamics through a temporally consistent latent context and enables dynamics generalization in high-dimensional control tasks. ProtoCAD extracts useful contextual information with the help of prototypes clustered over the batch and benefits model-based RL in two ways: 1) it uses a temporally consistent prototypical regularizer that encourages the prototype assignments produced for different temporal segments of the same latent trajectory to be consistent over time, rather than comparing the features directly; 2) it designs a context representation that combines the projection embedding of latent states with the aggregated prototypes, which can significantly improve dynamics generalization. Extensive experiments show that ProtoCAD surpasses existing methods in terms of dynamics generalization. Compared with the recurrent-based model RSSM, ProtoCAD delivers 13.2% and 26.7% better mean and median performance, respectively, across all dynamics generalization tasks.
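The two ingredients named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the assignment rule, the loss, or how the pieces are combined. The softmax assignment over cosine similarities, the cross-entropy between adjacent time steps, the concatenation, and every name here (`prototype_assignments`, `temporal_consistency_loss`, `context_representation`, `W`, `temperature`) are assumptions chosen for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prototype_assignments(z, W, prototypes, temperature=0.1):
    """Soft-assign projected latent states to prototypes (assumed rule).

    z: (B, T, D) latent states, W: (D, E) hypothetical linear projection,
    prototypes: (K, E). Returns the embeddings (B, T, E) and
    assignment probabilities (B, T, K).
    """
    emb = l2_normalize(z @ W)
    logits = emb @ l2_normalize(prototypes).T / temperature
    return emb, softmax(logits)

def temporal_consistency_loss(probs, eps=1e-8):
    """Cross-entropy between assignments at step t (target) and step t+1.

    Compares prototype assignments rather than raw features; the exact
    pairing of time segments is an assumption of this sketch.
    """
    target, pred = probs[:, :-1], probs[:, 1:]
    return float(-(target * np.log(pred + eps)).sum(axis=-1).mean())

def context_representation(emb, probs, prototypes):
    """Concatenate the projection embedding with assignment-weighted prototypes."""
    aggregated = probs @ l2_normalize(prototypes)        # (B, T, E)
    return np.concatenate([emb, aggregated], axis=-1)    # (B, T, 2E)

# Toy usage with random data (shapes are arbitrary).
rng = np.random.default_rng(0)
B, T, D, E, K = 4, 6, 8, 5, 3
z = rng.normal(size=(B, T, D))
W = rng.normal(size=(D, E))
prototypes = rng.normal(size=(K, E))

emb, probs = prototype_assignments(z, W, prototypes)
loss = temporal_consistency_loss(probs)
context = context_representation(emb, probs, prototypes)
```

In a learned model the projection and prototypes would be trained jointly with the dynamics model, and the resulting context would presumably condition the transition prediction; the abstract does not describe that wiring, so it is left out here.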
