关于知识梯度算法的有限时间性能

论文标题

关于知识梯度算法的有限时间性能

On the Finite-Time Performance of the Knowledge Gradient Algorithm

论文作者

Li, Yanwen, Gao, Siyang

论文摘要

知识梯度（KG）算法是最佳手臂识别（BAI）问题的流行且有效的算法。由于KG的复杂计算，该算法的理论分析很困难，现有结果主要是关于IT的渐近性能，例如一致性，渐近样本分配等。在这项研究中，我们介绍了有关KG算法的有限时间性能的新理论结果。在独立和正常分布的奖励下，我们得出了算法分配的界限。通过这些界限，现有的渐近结果变成了简单的推论。此外，我们为算法的错误概率和简单的遗憾而得出上限和下限，并显示了多臂匪徒（MAB）问题的算法的性能。这些发展不仅扩展了KG算法的现有分析，而且还可以用于分析其他基于改进的算法。最后，我们使用数值实验比较我们得出的边界和KG算法的性能。

The knowledge gradient (KG) algorithm is a popular and effective algorithm for the best arm identification (BAI) problem. Due to the complex calculation of KG, theoretical analysis of this algorithm is difficult, and existing results are mostly about the asymptotic performance of it, e.g., consistency, asymptotic sample allocation, etc. In this research, we present new theoretical results about the finite-time performance of the KG algorithm. Under independent and normally distributed rewards, we derive bounds for the sample allocation of the algorithm. With these bounds, existing asymptotic results become simple corollaries. Furthermore, we derive upper and lower bounds for the probability of error and simple regret of the algorithm, and show the performance of the algorithm for the multi-armed bandit (MAB) problem. These developments not only extend the existing analysis of the KG algorithm, but can also be used to analyze other improvement-based algorithms. Last, we use numerical experiments to compare the bounds we derive and the performance of the KG algorithm.

下载PDF全文

下载文献需遵守相关版权规定

论文标题