Paper Title
Continuous Monte Carlo Graph Search
Paper Authors
Paper Abstract
Online planning is crucial for high performance in many complex sequential decision-making tasks. Monte Carlo Tree Search (MCTS) employs a principled mechanism for trading off exploration and exploitation for efficient online planning, and it outperforms comparison methods in many discrete decision-making domains such as Go, Chess, and Shogi. Subsequently, extensions of MCTS to continuous domains have been developed. However, the inherent high branching factor and the resulting explosion of the search tree size limit the existing methods. To address this problem, we propose Continuous Monte Carlo Graph Search (CMCGS), an extension of MCTS to online planning in environments with continuous state and action spaces. CMCGS takes advantage of the insight that, during planning, sharing the same action policy between several states can yield high performance. To implement this idea, at each time step, CMCGS clusters similar states into a limited number of stochastic action bandit nodes, which produce a layered directed graph instead of an MCTS search tree. Experimental evaluation shows that CMCGS outperforms comparable planning methods in several complex continuous DeepMind Control Suite benchmarks and 2D navigation and exploration tasks with limited sample budgets. Furthermore, CMCGS can be scaled up through parallelization, and it outperforms the Cross-Entropy Method (CEM) in continuous control with learned dynamics models.
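To make the layered-graph idea concrete, below is a minimal Python sketch of the core mechanism the abstract describes: at each search depth, states are clustered into a small number of nodes, each node holds a stochastic action policy (a Gaussian "bandit") shared by all states assigned to it, and rollouts traverse this layered directed graph instead of a tree. This is an illustrative simplification under stated assumptions, not the authors' implementation: the names `BanditNode`, `cmcgs_like_plan`, and `nearest_node` are hypothetical, the fixed random centroids stand in for the paper's online clustering of states encountered during search, and the elite-refit update is a placeholder for the paper's actual bandit update rule.

```python
import numpy as np

rng = np.random.default_rng(0)

class BanditNode:
    """One node of the layered graph: a diagonal-Gaussian action policy
    shared by all states clustered into this node at a given depth."""
    def __init__(self, action_dim):
        self.mean = np.zeros(action_dim)
        self.std = np.ones(action_dim)
        self.actions, self.returns = [], []

    def sample_action(self):
        return rng.normal(self.mean, self.std)

    def record(self, action, ret):
        self.actions.append(action)
        self.returns.append(ret)

    def refit(self, elite_frac=0.25):
        # Placeholder update: refit the Gaussian toward the highest-return
        # actions seen so far (the paper's bandit update may differ).
        if len(self.returns) < 4:
            return
        k = max(1, int(elite_frac * len(self.returns)))
        elite = np.argsort(self.returns)[-k:]
        elite_actions = np.array(self.actions)[elite]
        self.mean = elite_actions.mean(axis=0)
        self.std = elite_actions.std(axis=0) + 1e-3

def nearest_node(state, centroids):
    """Clustering step (simplified): assign a state to the closest centroid."""
    return int(np.argmin(np.linalg.norm(centroids - state, axis=1)))

def cmcgs_like_plan(env_step, s0, action_dim, depth=10,
                    nodes_per_layer=3, n_rollouts=200):
    """Run rollouts through a layered graph of bandit nodes and return the
    root node's mean action. env_step(s, a) -> (s_next, reward) is assumed
    to be a simulator or learned dynamics model."""
    layers = [[BanditNode(action_dim) for _ in range(nodes_per_layer)]
              for _ in range(depth)]
    # Fixed random centroids per layer; CMCGS instead clusters the states
    # actually visited during search.
    centroids = [rng.normal(size=(nodes_per_layer, len(s0)))
                 for _ in range(depth)]
    for _ in range(n_rollouts):
        s, path, total = s0, [], 0.0
        for d in range(depth):
            # Many states can map to the same node, so edges between layers
            # form a directed graph rather than an ever-branching tree.
            node = layers[d][nearest_node(s, centroids[d])]
            a = node.sample_action()
            s, r = env_step(s, a)
            total += r
            path.append((node, a))
        for node, a in path:
            node.record(a, total)
            node.refit()
    return layers[0][nearest_node(s0, centroids[0])].mean

# Toy usage with hypothetical linear dynamics and a quadratic cost:
step = lambda s, a: (s + 0.1 * a, -float(np.sum((s + 0.1 * a) ** 2)))
a0 = cmcgs_like_plan(step, np.zeros(2), action_dim=2)
```

The key design point this sketch illustrates is why the branching-factor explosion is avoided: the number of nodes per layer is fixed, so memory and statistics-sharing scale with depth times the cluster count rather than with the exponential number of distinct action sequences a tree would create.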