实例的最小值算法的逻辑匪徒

论文标题

实例的最小值算法的逻辑匪徒

Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits

论文作者

Abeille, Marc, Faury, Louis, Calauzènes, Clément

论文摘要

Logistic Bandits最近通过提供一个整洁而又具有挑战性的框架来理解非线性在参数化匪徒中的影响，引起了大量关注。 Faury等人表明。（2020）logistic匪徒的学习理论困难可以由大的（有时是极其依赖）问题依赖性常数$κ$体现，表征了奖励的非线性的幅度。在本文中，我们介绍了一种新型算法，我们为其提供了精致的分析。这可以更好地表征非线性的影响，并产生改善的问题依赖性保证。在大多数有利的情况下，这会导致遗憾的上限缩放为$ \ tilde {\ Mathcal {o}}}（d \ sqrt {t/κ}）$，在$ \ tilde {\ tillcal {o}}}}（d \ sqrt sqrt paranive {\ mathcal {o）上均大大改善了。我们证明，通过得出$ω（d \ sqrt {t/κ}）$问题依赖性下限来证明该速率是最小的。我们的分析确定了遗憾的两个制度（永久性和临时），最终重新征服了Faury等人。（2020）与Dong等人的贝叶斯方法。（2019）。与以前的作品相反，我们发现在永久性政权中，非线性可以极大地减轻探索探索的权衡。尽管它也以问题依赖性的方式影响了暂时性阶段的长度，但我们表明，在最合理的配置中，这种影响是温和的。

Logistic Bandits have recently attracted substantial attention, by providing an uncluttered yet challenging framework for understanding the impact of non-linearity in parametrized bandits. It was shown by Faury et al. (2020) that the learning-theoretic difficulties of Logistic Bandits can be embodied by a large (sometimes prohibitively) problem-dependent constant $κ$, characterizing the magnitude of the reward's non-linearity. In this paper we introduce a novel algorithm for which we provide a refined analysis. This allows for a better characterization of the effect of non-linearity and yields improved problem-dependent guarantees. In most favorable cases this leads to a regret upper-bound scaling as $\tilde{\mathcal{O}}(d\sqrt{T/κ})$, which dramatically improves over the $\tilde{\mathcal{O}}(d\sqrt{T}+κ)$ state-of-the-art guarantees. We prove that this rate is minimax-optimal by deriving a $Ω(d\sqrt{T/κ})$ problem-dependent lower-bound. Our analysis identifies two regimes (permanent and transitory) of the regret, which ultimately re-conciliates Faury et al. (2020) with the Bayesian approach of Dong et al. (2019). In contrast to previous works, we find that in the permanent regime non-linearity can dramatically ease the exploration-exploitation trade-off. While it also impacts the length of the transitory phase in a problem-dependent fashion, we show that this impact is mild in most reasonable configurations.

下载PDF全文

下载文献需遵守相关版权规定

论文标题