超越UCB：带回归甲骨文的最佳有效上下文匪徒

论文标题

超越UCB：带回归甲骨文的最佳有效上下文匪徒

Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles

论文作者

Foster, Dylan J., Rakhlin, Alexander

论文摘要

上下文匪徒的一个基本挑战是开发具有计算要求的灵活的通用算法，没有比经典监督的学习任务（例如分类和回归）更糟糕的。基于回归的算法已经显示出有希望的经验成功，但是在特殊情况下，理论保证仍然难以捉摸。我们提供了从上下文土匪到在线回归的第一个通用和最佳减少。我们展示了如何将带有给定值函数类别的在线回归的任何Oracle转换为具有诱导策略类别的上下文匪徒的算法，而在运行时或内存需求中没有开销。我们表征具有一般，潜在的非参数函数类别的上下文匪徒的最小值率，并表明每当Oracle获得回归的最佳速率时，我们的算法是最小值的最佳。与以前的结果相比，我们的算法不需要以外的分布假设，即使选择上下文是对手的。

A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks such as classification and regression. Algorithms based on regression have shown promising empirical success, but theoretical guarantees have remained elusive except in special cases. We provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any oracle for online regression with a given value function class into an algorithm for contextual bandits with the induced policy class, with no overhead in runtime or memory requirements. We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the oracle obtains the optimal rate for regression. Compared to previous results, our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.

下载PDF全文

下载文献需遵守相关版权规定

论文标题