论文标题
蒙特卡洛树搜索的最佳计算预算分配树策略
An Optimal Computing Budget Allocation Tree Policy for Monte Carlo Tree Search
论文作者
论文摘要
我们通过基本的马尔可夫决策过程分析了树木搜索问题,在该过程中,目标是确定获得最高累积奖励的根部最佳动作。我们提出了一种新的树策略,该策略可以最佳地分配有限的计算预算,以最大程度地提高较低的限制,即正确选择每个节点上最佳动作的概率。与广泛使用的上置信度约束(UCB)树木政策相比,新的树政策提出了一种更加平衡的方法,可以在采样预算有限时管理勘探和剥削权衡取舍。此外,UCB假设已知奖励分布的支持,而我们的算法则放宽了这一假设。数值实验证明了我们算法在选择根部最佳作用方面的效率。
We analyze a tree search problem with an underlying Markov decision process, in which the goal is to identify the best action at the root that achieves the highest cumulative reward. We present a new tree policy that optimally allocates a limited computing budget to maximize a lower bound on the probability of correctly selecting the best action at each node. Compared to widely used Upper Confidence Bound (UCB) tree policies, the new tree policy presents a more balanced approach to manage the exploration and exploitation trade-off when the sampling budget is limited. Furthermore, UCB assumes that the support of reward distribution is known, whereas our algorithm relaxes this assumption. Numerical experiments demonstrate the efficiency of our algorithm in selecting the best action at the root.