Paper Title

Model-based Reinforcement Learning with a Hamiltonian Canonical ODE Network

Paper Authors

Yao Feng, Yuhong Jiang, Hang Su, Dong Yan, Jun Zhu

Paper Abstract

Model-based reinforcement learning usually suffers from a high sample complexity in training the world model, especially for environments with complex dynamics. To make the training for general physical environments more efficient, we introduce Hamiltonian canonical ordinary differential equations into the learning process, which inspires a novel model of neural ordinary differential auto-encoder (NODA). NODA can model the physical world by nature and can flexibly impose Hamiltonian mechanics (e.g., the dimension of the physical equations), which can further accelerate training of the environment models. It can consequently empower an RL agent with robust extrapolation from a small number of samples, as well as a guarantee of physical plausibility. Theoretically, we prove that NODA has uniform bounds for multi-step transition errors and value errors under certain conditions. Extensive experiments show that NODA can learn the environment dynamics effectively with high sample efficiency, making it possible to facilitate reinforcement learning agents at the early stage.
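
To make the "Hamiltonian canonical ordinary differential equations" mentioned in the abstract concrete, below is a minimal, illustrative PyTorch sketch of a network that parameterizes a scalar Hamiltonian H(q, p) and recovers the canonical dynamics dq/dt = ∂H/∂p, dp/dt = -∂H/∂q via automatic differentiation. This is not the authors' NODA implementation (which additionally wraps such dynamics in an auto-encoder over latent states); the class name, hidden size, and Euler integration step are hypothetical choices for illustration.

```python
# Illustrative sketch only (assumed design, not the paper's code):
# an MLP parameterizes H(q, p); Hamilton's canonical equations
#   dq/dt = dH/dp,   dp/dt = -dH/dq
# are obtained with torch.autograd and integrated with one Euler step.
import torch
import torch.nn as nn


class HamiltonianODE(nn.Module):
    def __init__(self, state_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Scalar Hamiltonian H(q, p) parameterized by a small MLP.
        self.hamiltonian = nn.Sequential(
            nn.Linear(2 * state_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def time_derivatives(self, q: torch.Tensor, p: torch.Tensor):
        """Return (dq/dt, dp/dt) from the canonical equations."""
        # Detach so we can take gradients w.r.t. q and p themselves;
        # create_graph keeps the graph so H's parameters remain trainable.
        q = q.detach().requires_grad_(True)
        p = p.detach().requires_grad_(True)
        H = self.hamiltonian(torch.cat([q, p], dim=-1)).sum()
        dH_dq, dH_dp = torch.autograd.grad(H, (q, p), create_graph=True)
        return dH_dp, -dH_dq

    def step(self, q: torch.Tensor, p: torch.Tensor, dt: float = 0.01):
        """One explicit Euler step of the canonical dynamics."""
        dq, dp = self.time_derivatives(q, p)
        return q + dt * dq, p + dt * dp


if __name__ == "__main__":
    model = HamiltonianODE(state_dim=2)
    q0, p0 = torch.randn(1, 2), torch.randn(1, 2)
    q1, p1 = model.step(q0, p0)
    print(q1.shape, p1.shape)  # torch.Size([1, 2]) torch.Size([1, 2])
```

Because the dynamics come from a single learned Hamiltonian rather than an unconstrained transition network, rollouts of this kind of model tend to respect the structure of physical systems, which is the kind of inductive bias the abstract credits for the improved sample efficiency.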
