论文标题
关于蒙特卡洛树搜索的有效并行化
On Effective Parallelization of Monte Carlo Tree Search
论文作者
论文摘要
尽管在GO和计算机游戏中取得了开创性的成功,但蒙特卡洛树搜索(MCT)在计算上还是昂贵的,因为它需要大量的推出来构建搜索树,这需要有效的并行化。但是,如何设计有效的平行MCT算法尚未系统地研究,并且仍然了解不足。在本文中,我们试图通过研究达到所需的加速时并行化引起的潜在性能损失来奠定其第一个理论基础。特别是,我们发现了实现理想的并行性能的必要条件,并突出了他们的两个实际好处。首先,通过检查现有的平行MCT算法是否满足这些条件,我们确定了应将未来算法继承的关键设计原理,例如跟踪未观察到的样品(在Wu-uct中使用(Liu等,2020))。从理论上讲,我们建立了这种必需的设计,有助于$ \ mathcal {o}(\ ln n n + m / \ sqrt {\ ln n})$累积的遗憾,当最大树深度为2时,其中$ n $是推出的数量,$ m $是工人的数量。与$ \ Mathcal {O}(\ ln n)$遗憾的是,对此形式的遗憾是非常可取的。其次,更重要的是,我们证明了如何采用提出的必要条件来设计更有效的平行MCT算法。为了说明这一点,我们通过遵循我们的理论准则提出了一种称为BU-uct的新的平行MCT算法。新提出的算法(尽管是初步),在15场Atari游戏中的11个中超过了4个竞争性基线。我们希望我们的理论结果能够激发更有效的平行MCT的未来工作。
Despite its groundbreaking success in Go and computer games, Monte Carlo Tree Search (MCTS) is computationally expensive as it requires a substantial number of rollouts to construct the search tree, which calls for effective parallelization. However, how to design effective parallel MCTS algorithms has not been systematically studied and remains poorly understood. In this paper, we seek to lay its first theoretical foundation, by examining the potential performance loss caused by parallelization when achieving a desired speedup. In particular, we discover the necessary conditions of achieving a desirable parallelization performance, and highlight two of their practical benefits. First, by examining whether existing parallel MCTS algorithms satisfy these conditions, we identify key design principles that should be inherited by future algorithms, for example tracking the unobserved samples (used in WU-UCT (Liu et al., 2020)). We theoretically establish this essential design facilitates $\mathcal{O} ( \ln n + M / \sqrt{\ln n} )$ cumulative regret when the maximum tree depth is 2, where $n$ is the number of rollouts and $M$ is the number of workers. A regret of this form is highly desirable, as compared to $\mathcal{O} ( \ln n )$ regret incurred by a sequential counterpart, its excess part approaches zero as $n$ increases. Second, and more importantly, we demonstrate how the proposed necessary conditions can be adopted to design more effective parallel MCTS algorithms. To illustrate this, we propose a new parallel MCTS algorithm, called BU-UCT, by following our theoretical guidelines. The newly proposed algorithm, albeit preliminary, out-performs four competitive baselines on 11 out of 15 Atari games. We hope our theoretical results could inspire future work of more effective parallel MCTS.