Efficient and Robust Algorithms for Adversarial Linear Contextual Bandits

Paper title

Efficient and Robust Algorithms for Adversarial Linear Contextual Bandits

Authors

Gergely Neu, Julia Olkhovskaya

Abstract

We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem where the sequence of loss functions associated with each arm is allowed to change without restriction over time. Under the assumption that the $d$-dimensional contexts are generated i.i.d. at random from a known distribution, we develop computationally efficient algorithms based on the classic Exp3 algorithm. Our first algorithm, RealLinExp3, is shown to achieve a regret guarantee of $\widetilde{O}(\sqrt{KdT})$ over $T$ rounds, which matches the best available bound for this problem. Our second algorithm, RobustLinExp3, is shown to be robust to misspecification, in that it achieves a regret bound of $\widetilde{O}((Kd)^{1/3}T^{2/3}) + \varepsilon \sqrt{d} T$ if the true reward function is linear up to an additive nonlinear error uniformly bounded in absolute value by $\varepsilon$. To our knowledge, our performance guarantees constitute the very first results on this problem setting.
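For readers unfamiliar with the classic Exp3 algorithm that the abstract names as its starting point, the following is a minimal sketch of the standard, non-contextual, loss-based Exp3. It is not RealLinExp3 or RobustLinExp3, whose details are not given on this page. The function name `exp3`, the callback `loss_fn`, and the learning-rate choice are assumptions made for illustration only.

```python
import math
import random


def exp3(loss_fn, n_arms, n_rounds, seed=0):
    """Classic loss-based Exp3 for a non-contextual K-armed adversarial bandit.

    loss_fn(t, arm) is a hypothetical callback returning a loss in [0, 1].
    Returns the list of arms played and the total loss incurred.
    """
    rng = random.Random(seed)
    # A common textbook learning rate (an assumption, not taken from the paper).
    eta = math.sqrt(2.0 * math.log(n_arms) / (n_arms * n_rounds))
    cum_est = [0.0] * n_arms  # cumulative importance-weighted loss estimates
    arms, total = [], 0.0
    for t in range(n_rounds):
        # Exponential weights; subtracting the minimum keeps exp() stable.
        base = min(cum_est)
        weights = [math.exp(-eta * (c - base)) for c in cum_est]
        z = sum(weights)
        probs = [w / z for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)  # only the played arm's loss is observed
        # Unbiased estimate: observed loss divided by its sampling probability.
        cum_est[arm] += loss / probs[arm]
        arms.append(arm)
        total += loss
    return arms, total


if __name__ == "__main__":
    # Toy environment: arm 0 always has loss 0, every other arm has loss 1.
    played, total_loss = exp3(lambda t, a: 0.0 if a == 0 else 1.0, 3, 2000)
    print("total loss:", total_loss)
    print("fraction of rounds on arm 0:", played.count(0) / len(played))
```

In this sketch only the played arm's loss estimate is updated each round, which is the bandit-feedback constraint. According to the abstract, the paper's algorithms build on this scheme for the setting with $d$-dimensional contexts and linear losses.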
