Static and Dynamic Values of Computation in MCTS

Paper title

Static and Dynamic Values of Computation in MCTS

Authors

Eren Sezener, Peter Dayan

Abstract

Monte-Carlo Tree Search (MCTS) is one of the most widely used methods for planning, and has powered many recent advances in artificial intelligence. In MCTS, one typically performs computations (i.e., simulations) to collect statistics about the possible future consequences of actions, and then chooses accordingly. Many popular MCTS methods such as UCT and its variants decide which computations to perform by trading off exploration and exploitation. In this work, we take a more direct approach, and explicitly quantify the value of a computation based on its expected impact on the quality of the action eventually chosen. Our approach goes beyond the "myopic" limitations of existing computation-value-based methods in two senses: (I) we are able to account for the impact of non-immediate (i.e., future) computations (II) on non-immediate actions. We show that policies that greedily optimize computation values are optimal under certain assumptions and obtain results that are competitive with the state-of-the-art.
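
To make the phrase "value of a computation" concrete, the sketch below computes the myopic (one-step) value for a toy flat problem. This is the baseline notion that the abstract describes as limited; it is not the static or dynamic values proposed in the paper. The Beta-Bernoulli belief model, the function name `myopic_voc`, and the `(alpha, beta)` representation are all assumptions made only for illustration.

```python
from fractions import Fraction

def myopic_voc(beliefs, i):
    """Myopic value of one more simulation of action i (illustrative only).

    beliefs: list of (alpha, beta) integer pairs, a hypothetical Beta belief
    over each action's success probability. The value is the expected
    increase in the best posterior mean after a single simulated outcome.
    """
    means = [Fraction(a, a + b) for a, b in beliefs]
    current_best = max(means)

    a, b = beliefs[i]
    p_success = means[i]  # predictive probability that the simulation succeeds

    # Best posterior mean among the actions that are not being simulated.
    others_best = max(
        (m for j, m in enumerate(means) if j != i), default=Fraction(0)
    )

    # Posterior mean of action i after each possible simulation outcome.
    mean_if_success = Fraction(a + 1, a + b + 1)
    mean_if_failure = Fraction(a, a + b + 1)

    expected_best = (
        p_success * max(others_best, mean_if_success)
        + (1 - p_success) * max(others_best, mean_if_failure)
    )
    return expected_best - current_best

beliefs = [(1, 1), (3, 2)]  # two actions with posterior means 1/2 and 3/5
values = [myopic_voc(beliefs, i) for i in range(len(beliefs))]
best_computation = max(range(len(beliefs)), key=lambda i: values[i])
```

In this toy case one more simulation of the uncertain first action has positive value because it could change which action looks best, while one more simulation of the second action has zero myopic value. A one-step quantity like this ignores the effect of later computations, which is the limitation the abstract refers to.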
