Paper Title
Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning
Paper Authors
Paper Abstract
Adapting to changes in transition dynamics is essential in robotic applications. By learning a policy conditioned on a compact context, context-aware meta-reinforcement learning provides a flexible way to adjust behavior according to dynamics changes. However, in real-world applications the agent may encounter complex dynamics changes: multiple confounders can influence the transition dynamics, making it challenging to infer an accurate context for decision-making. This paper addresses this challenge with Decomposed Mutual INformation Optimization (DOMINO) for context learning, which explicitly learns a disentangled context that maximizes the mutual information between the context and historical trajectories while minimizing the state-transition prediction error. Our theoretical analysis shows that, by learning a disentangled context, DOMINO overcomes the underestimation of mutual information caused by multiple confounders and reduces the number of samples that must be collected across environments. Extensive experiments show that the context learned by DOMINO benefits both model-based and model-free reinforcement learning algorithms for dynamics generalization, improving sample efficiency and performance in unseen environments.
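To make the abstract's objective concrete, below is a minimal PyTorch-style sketch of a decomposed mutual-information objective of the kind described: a context encoder outputs K separate context components, each component is trained with an InfoNCE-style lower bound on the mutual information between that component and a trajectory summary, and a transition model conditioned on all components is trained to minimize next-state prediction error. All module names, dimensions, and the choice of InfoNCE as the MI estimator are illustrative assumptions, not DOMINO's actual implementation.

```python
# Sketch of a decomposed MI objective (assumptions): K disentangled context heads,
# an InfoNCE-style MI lower bound per head, plus a transition prediction loss.
# ContextEncoder, TransitionModel, K, and all sizes below are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

K, CTX_DIM, OBS_DIM, ACT_DIM, HIDDEN = 4, 8, 17, 6, 128  # hypothetical sizes

class ContextEncoder(nn.Module):
    """Encodes a trajectory segment into K disentangled context vectors."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(2 * OBS_DIM + ACT_DIM, HIDDEN, batch_first=True)
        self.heads = nn.ModuleList([nn.Linear(HIDDEN, CTX_DIM) for _ in range(K)])

    def forward(self, segment):                 # segment: (B, T, obs + act + next_obs)
        _, h = self.gru(segment)                # h: (1, B, HIDDEN)
        return [head(h.squeeze(0)) for head in self.heads]  # K tensors of (B, CTX_DIM)

class TransitionModel(nn.Module):
    """Predicts the next state conditioned on the concatenated context components."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM + K * CTX_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, OBS_DIM))

    def forward(self, obs, act, contexts):
        return self.net(torch.cat([obs, act] + contexts, dim=-1))

def info_nce(ctx, traj_feat, temperature=0.1):
    """InfoNCE bound: matching (context, trajectory) pairs in the batch are positives,
    all other trajectories serve as negatives; minimizing this maximizes the MI bound."""
    logits = F.normalize(ctx, dim=-1) @ F.normalize(traj_feat, dim=-1).T / temperature
    labels = torch.arange(ctx.size(0))
    return F.cross_entropy(logits, labels)

def decomposed_mi_loss(encoder, trans_model, proj, segment, obs, act, next_obs):
    contexts = encoder(segment)
    traj_feat = proj(segment.mean(dim=1))       # simple trajectory summary (assumed)
    mi_loss = sum(info_nce(c, traj_feat) for c in contexts) / K
    pred_loss = F.mse_loss(trans_model(obs, act, contexts), next_obs)
    return mi_loss + pred_loss

if __name__ == "__main__":
    B, T = 32, 20
    encoder, trans_model = ContextEncoder(), TransitionModel()
    proj = nn.Linear(2 * OBS_DIM + ACT_DIM, CTX_DIM)    # trajectory summary head (assumed)
    segment = torch.randn(B, T, 2 * OBS_DIM + ACT_DIM)
    obs, act, next_obs = torch.randn(B, OBS_DIM), torch.randn(B, ACT_DIM), torch.randn(B, OBS_DIM)
    loss = decomposed_mi_loss(encoder, trans_model, proj, segment, obs, act, next_obs)
    loss.backward()
```

The per-component MI terms here stand in for the "decomposed" part of the objective: estimating a bound for each context component separately, rather than for one monolithic context, is what the abstract credits with avoiding the underestimation of mutual information under multiple confounders.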