论文标题
极端匪徒的有效算法
Efficient Algorithms for Extreme Bandits
论文作者
论文摘要
在本文中,我们为极端的强盗问题做出了贡献,这是一个多军匪徒的变体,其中学习者试图收集最大的奖励。我们首先研究了在奖励分布的尾部轻度假设下的I.I.D随机变量的最大浓度。该分析激发了Maxima(QOMAX)分位数的引入。 QOMAX的特性足以构建探索(ETC)策略,即QOMAX-ETC,尽管它很简单,但仍能实现强大的渐近保证。然后,我们提出并分析了一种更适合自适应的算法,即QOMAX-SDA,该算法将QOMAX与Baudry等人最近引入的二亚采样方法结合在一起。 (2021)。两种算法在两个方面都比现有方法更有效(1)它们可以提高经验表现更好(2)它们大大降低了记忆和时间复杂性。
In this paper, we contribute to the Extreme Bandit problem, a variant of Multi-Armed Bandits in which the learner seeks to collect the largest possible reward. We first study the concentration of the maximum of i.i.d random variables under mild assumptions on the tail of the rewards distributions. This analysis motivates the introduction of Quantile of Maxima (QoMax). The properties of QoMax are sufficient to build an Explore-Then-Commit (ETC) strategy, QoMax-ETC, achieving strong asymptotic guarantees despite its simplicity. We then propose and analyze a more adaptive, anytime algorithm, QoMax-SDA, which combines QoMax with a subsampling method recently introduced by Baudry et al. (2021). Both algorithms are more efficient than existing approaches in two aspects (1) they lead to better empirical performance (2) they enjoy a significant reduction of the memory and time complexities.