Paper Title
A Regret bound for Non-stationary Multi-Armed Bandits with Fairness Constraints
Paper Authors
Paper Abstract
The multi-armed bandit framework is the most common platform for studying strategies for sequential decision-making problems. Recently, the notion of fairness has attracted a lot of attention in the machine learning community. One can impose the fairness condition that at any given point of time, even during the learning phase, a poorly performing candidate should not be preferred over a better candidate. This fairness constraint is known to be one of the most stringent and has been studied in the stochastic multi-armed bandit framework in a stationary setting, for which regret bounds have been established. The main aim of this paper is to study this problem in a non-stationary setting. We present a new algorithm called Fair Upper Confidence Bound with Exploration (Fair-UCBe) for solving a slowly varying stochastic $k$-armed bandit problem. With this, we present two results: (i) Fair-UCBe indeed satisfies the above-mentioned fairness condition, and (ii) it achieves a regret bound of $O\left(k^{\frac{3}{2}} T^{1 - \frac{\alpha}{2}} \sqrt{\log T}\right)$, for some suitable $\alpha \in (0, 1)$, where $T$ is the time horizon. To the best of our knowledge, this is the first fair algorithm with a sublinear regret bound applicable to non-stationary bandits. We show that the performance of our algorithm in the non-stationary case approaches that of its stationary counterpart as the variation in the environment tends to zero.
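Note: the abstract does not spell out the Fair-UCBe procedure itself, so the sketch below is a minimal, assumption-laden illustration of the kind of algorithm it describes, not the paper's actual method. It combines two standard ingredients consistent with the abstract: interval chaining in the spirit of Joseph et al.'s fair UCB (randomize uniformly among arms whose confidence intervals overlap, transitively, with the best arm's, so a provably worse arm is never preferred), and a sliding window over recent observations to track a slowly varying environment. The function names, window size, drift model, and exploration constant `c` are all illustrative choices. As a side note on the stated bound: since $\alpha \in (0, 1)$, the exponent $1 - \frac{\alpha}{2}$ lies in $(\frac{1}{2}, 1)$, which is why the regret is sublinear in $T$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def chained_arms(lower, upper):
    """Arms transitively linked to the top-UCB arm via overlapping intervals.

    Randomizing uniformly within this set enforces the fairness condition:
    an arm whose whole interval lies below another arm's interval is
    excluded, so a provably worse arm is never preferred.
    """
    chained = {int(np.argmax(upper))}
    changed = True
    while changed:
        changed = False
        for a in range(len(upper)):
            if a not in chained and any(
                upper[a] >= lower[b] and upper[b] >= lower[a] for b in chained
            ):
                chained.add(a)
                changed = True
    return sorted(chained)


def fair_ucb_sliding_window(T=10_000, k=5, window=1_000, c=2.0):
    """Toy fair-UCB run on slowly drifting Bernoulli arms (hypothetical setup)."""
    history = []  # (arm, reward) pairs; only the last `window` are used
    total_reward = 0.0
    for t in range(1, T + 1):
        # Slowly varying environment: a small sinusoidal drift over the horizon.
        means_t = np.clip(0.5 + 0.3 * np.sin(2 * np.pi * t / T + np.arange(k)), 0, 1)

        # Sliding-window estimates so stale observations are forgotten.
        pulls = np.ones(k)  # one pseudo-pull per arm avoids division by zero
        sums = np.zeros(k)
        for arm, r in history[-window:]:
            pulls[arm] += 1
            sums[arm] += r
        est = sums / pulls
        width = np.sqrt(c * np.log(t + 1) / pulls)

        # Play uniformly at random among the statistically indistinguishable arms.
        arm = int(rng.choice(chained_arms(est - width, est + width)))
        reward = float(rng.random() < means_t[arm])
        history.append((arm, reward))
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    print(f"Total reward over horizon: {fair_ucb_sliding_window():.0f}")
```

The uniform choice within the chained set is the fairness mechanism: two arms are treated differently only after their confidence intervals separate, i.e., only once one arm is statistically shown to be better than the other.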