论文标题
不安的土匪的索引和O(k^3)算法计算晶状索引的条件
Conditions for indexability of restless bandits and an O(K^3) algorithm to compute Whittle index
论文作者
论文摘要
焦躁不安的土匪是一类顺序资源分配问题,涉及在该过程的演变取决于分配给它们的资源的几个替代过程中分配一个或多个资源。这样的模型捕获了探索和剥削之间的基本权衡。 1988年,惠特尔(Whittle)为不安的匪徒问题开发了一种索引启发式方法,由于其简单性和强大的经验表现,该方法已成为一种流行的解决方案方法。如果该模型满足称为索引性的技术条件,则适用Whittle Index启发式。在本文中,我们提出了两种一般足够的条件,可索引,并确定更简单的条件以验证这些条件的改进。然后,我们重新访问一种先前提出的称为自适应贪婪算法的算法,该算法已知可以计算不安的土匪子类的晶状索引。我们表明,自适应贪婪算法的概括计算所有可索引的不安匪徒的鞭子索引。我们提出了该算法的有效实现,该算法可以计算O(k^3)计算中的$ k $状态的不安强盗的晶状索引。最后,我们提出了一项详细的数值研究,该研究肯定了Whittle Index启发式的强劲表现。
Restless bandits are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative processes where the evolution of the process depends on the resource allocated to them. Such models capture the fundamental trade-offs between exploration and exploitation. In 1988, Whittle developed an index heuristic for restless bandit problems which has emerged as a popular solution approach due to its simplicity and strong empirical performance. The Whittle index heuristic is applicable if the model satisfies a technical condition known as indexability. In this paper, we present two general sufficient conditions for indexability and identify simpler to verify refinements of these conditions. We then revisit a previously proposed algorithm called adaptive greedy algorithm which is known to compute the Whittle index for a subclass of restless bandits. We show that a generalization of the adaptive greedy algorithm computes the Whittle index for all indexable restless bandits. We present an efficient implementation of this algorithm which can compute the Whittle index of a restless bandit with $K$ states in O(K^3) computations. Finally, we present a detailed numerical study which affirms the strong performance of the Whittle index heuristic.