Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models

Paper Title

Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models

Authors

Pan, Minting; Zhu, Xiangming; Wang, Yunbo; Yang, Xiaokang

Abstract

World models learn the consequences of actions in vision-based interactive systems. However, in practical scenarios such as autonomous driving, there commonly exist noncontrollable dynamics independent of the action signals, making it difficult to learn effective world models. To tackle this problem, we present a novel reinforcement learning approach named Iso-Dream, which improves the Dream-to-Control framework in two respects. First, by optimizing the inverse dynamics, we encourage the world model to learn controllable and noncontrollable sources of spatiotemporal changes on isolated state transition branches. Second, we optimize the behavior of the agent on the decoupled latent imaginations of the world model. Specifically, to estimate state values, we roll out the noncontrollable states into the future and associate them with the current controllable state. In this way, the isolation of dynamics sources can greatly benefit the long-horizon decision-making of the agent; for example, a self-driving car can avoid potential risks by anticipating the movement of other vehicles. Experiments show that Iso-Dream is effective in decoupling the mixed dynamics and remarkably outperforms existing approaches in a wide range of visual control and prediction domains.
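The abstract names two mechanisms: a world model with separate action-conditioned and action-free state transition branches, shaped by an inverse-dynamics objective, and a value estimate that rolls the noncontrollable branch into the future and associates it with the current controllable state. The plain-Python sketch below only illustrates that data flow under stated assumptions. It is not the authors' implementation: the linear transitions, the closed-form inverse model, the dot-product attention used for the association, the linear value head, equal latent sizes for both branches, and every function name (`step_controllable`, `inverse_dynamics_loss`, `associate`, and so on) are hypothetical stand-ins for learned networks.

```python
import math
from typing import List, Sequence

Vec = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical linear transitions standing in for the two learned branches.
def step_controllable(s_c: Vec, action: float, decay: float = 0.9, gain: float = 0.5) -> Vec:
    # Action-conditioned branch: s_c' = decay * s_c + gain * action.
    return [decay * x + gain * action for x in s_c]


def step_noncontrollable(s_n: Vec, decay: float = 0.8) -> Vec:
    # Action-free branch: the agent's action is never an input here.
    return [decay * x for x in s_n]


def inverse_dynamics_loss(s_c: Vec, s_c_next: Vec, action: float,
                          decay: float = 0.9, gain: float = 0.5) -> float:
    # Hypothetical closed-form inverse model: infer a_t from (s_c_t, s_c_{t+1}),
    # average the per-dimension estimates, and score with squared error.
    estimates = [(y - decay * x) / gain for x, y in zip(s_c, s_c_next)]
    a_hat = sum(estimates) / len(estimates)
    return (a_hat - action) ** 2


def rollout_noncontrollable(s_n: Vec, horizon: int) -> List[Vec]:
    # Imagine the action-free branch `horizon` steps ahead; no policy is queried.
    future = []
    for _ in range(horizon):
        s_n = step_noncontrollable(s_n)
        future.append(s_n)
    return future


def associate(s_c: Vec, future_n: List[Vec]) -> Vec:
    # Assumed dot-product attention: the current controllable state is the
    # query; the imagined noncontrollable states are both keys and values.
    # Assumes both branches use the same latent size.
    weights = softmax([dot(s_c, k) for k in future_n])
    context = [sum(w * v[i] for w, v in zip(weights, future_n))
               for i in range(len(future_n[0]))]
    return s_c + context  # list concatenation: [controllable | attended future]


def value_estimate(feature: Vec, head: Vec) -> float:
    # Hypothetical linear value head applied to the associated feature.
    return dot(feature, head)


if __name__ == "__main__":
    s_c, s_n = [1.0, -0.5], [2.0, 0.4]
    s_c_next = step_controllable(s_c, action=0.3)
    print(inverse_dynamics_loss(s_c, s_c_next, action=0.3))  # close to 0
    feature = associate(s_c, rollout_noncontrollable(s_n, horizon=5))
    print(value_estimate(feature, head=[0.1] * len(feature)))
```

In a trained model, an inverse-dynamics loss of this kind would act as a training signal that pushes action-dependent changes into the controllable branch; in this toy the branch already matches the action, so the loss is near zero. Because the noncontrollable rollout takes no actions, the sketch can imagine it several steps ahead without querying any policy.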
