Paper Title
CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer
Paper Authors
Paper Abstract
Transformer has achieved great success in learning vision and language representations that generalize across various downstream tasks. In visual control, learning state representations that transfer across different control tasks is important for reducing the training sample size. However, porting Transformer to sample-efficient visual control remains a challenging and unsolved problem. To this end, we propose a novel Control Transformer (CtrlFormer), possessing many appealing benefits that prior art lacks. Firstly, CtrlFormer jointly learns self-attention between visual tokens and policy tokens across different control tasks, so that multitask representations can be learned and transferred without catastrophic forgetting. Secondly, we carefully design a contrastive reinforcement learning paradigm to train CtrlFormer, enabling it to achieve the high sample efficiency that is important in control problems. For example, on the DMControl benchmark, unlike recent advanced methods that fail by producing a zero score on the "Cartpole" task after transfer learning with 100k samples, CtrlFormer achieves a state-of-the-art score with only 100k samples while maintaining the performance of previous tasks. The code and models are released on our project homepage.
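To make the token design concrete, below is a minimal sketch of the architecture as the abstract describes it: image patches become visual tokens, each control task owns a learnable policy token, and all tokens pass through one shared Transformer encoder. This is an illustrative reconstruction, not the authors' implementation; the class name, hyperparameters (patch size, depth, width), and the ViT-style patch embedding are assumptions.

    import torch
    import torch.nn as nn

    class CtrlFormerSketch(nn.Module):
        """Illustrative sketch (not the released code): visual tokens from
        image patches plus one learnable policy token per task, all sharing
        a single Transformer encoder so representations can transfer."""

        def __init__(self, num_tasks=3, img_size=84, patch_size=12,
                     dim=128, depth=4, heads=8):
            super().__init__()
            num_patches = (img_size // patch_size) ** 2
            # Patch embedding: split the pixel observation into visual tokens.
            self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size,
                                         stride=patch_size)
            self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
            # One learnable policy token per task (hypothetical sizing); only
            # the current task's token feeds its downstream policy head,
            # which limits interference between tasks.
            self.policy_tokens = nn.Parameter(torch.zeros(num_tasks, 1, dim))
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

        def forward(self, obs, task_id):
            # obs: (B, 3, H, W) pixel observation from the control task.
            x = self.patch_embed(obs).flatten(2).transpose(1, 2)  # (B, N, dim)
            x = x + self.pos_embed
            tok = self.policy_tokens[task_id].expand(x.size(0), -1, -1)
            x = torch.cat([tok, x], dim=1)  # prepend this task's policy token
            x = self.encoder(x)             # joint self-attention over all tokens
            # The output policy token is the state representation handed to
            # the RL policy head for this task.
            return x[:, 0]

Under this reading, transfer to a new task amounts to adding a fresh policy token while reusing the shared encoder, which is one plausible way the abstract's claim of "transfer without catastrophic forgetting" could be realized; the contrastive reinforcement learning objective used to train the encoder is a separate component not shown here.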