Paper Title

The price of unfairness in linear bandits with biased feedback

Authors

Solenne Gaucher, Alexandra Carpentier, Christophe Giraud

Abstract

In this paper, we study the problem of fair sequential decision making with biased linear bandit feedback. At each round, a player selects an action described by a covariate and by a sensitive attribute. The perceived reward is a linear combination of the covariates of the chosen action, but the player only observes a biased evaluation of this reward, depending on the sensitive attribute. To characterize the difficulty of this problem, we design a phased elimination algorithm that corrects the unfair evaluations, and establish upper bounds on its regret. We show that the worst-case regret is smaller than $\mathcal{O}(\kappa_*^{1/3}\log(T)^{1/3}T^{2/3})$, where $\kappa_*$ is an explicit geometrical constant characterizing the difficulty of bias estimation. We prove lower bounds on the worst-case regret for some sets of actions, showing that this rate is tight up to a possible sub-logarithmic factor. We also derive gap-dependent upper bounds on the regret, and matching lower bounds for some problem instances. Interestingly, these results reveal a transition between a regime where the problem is as difficult as its unbiased counterpart, and a regime where it can be much harder.
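
The feedback model described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' phased elimination algorithm. It assumes an additive bias, so that the observation for an action with covariate x and sensitive attribute z in {-1, +1} is <x, theta> + z * omega + noise; the abstract only says that the bias depends on the sensitive attribute. The action set, the values of THETA and OMEGA, the noise level, and the uniform random exploration are all hypothetical choices made for illustration. The sketch compares a naive least-squares fit that ignores z with a joint fit on the augmented features (x, z).

```python
import random

# Hypothetical toy instance: each action is (covariate x, sensitive attribute z).
ACTIONS = [
    ((1.0, 0.0), 1.0),
    ((0.0, 1.0), -1.0),
    ((1.0, 1.0), 1.0),
    ((0.5, 0.5), -1.0),
]
THETA = (1.0, 0.5)  # assumed true reward parameter
OMEGA = 0.8         # assumed additive bias (an assumption, not from the abstract)


def observe(x, z, rng, noise_sd=0.3):
    """Biased evaluation: true reward <x, theta>, plus z * omega, plus noise."""
    true_reward = sum(xi * ti for xi, ti in zip(x, THETA))
    return true_reward + z * OMEGA + rng.gauss(0.0, noise_sd)


def least_squares(features, targets):
    """Solve the normal equations (F^T F) w = F^T y by Gauss-Jordan elimination."""
    p = len(features[0])
    # Augmented matrix [F^T F | F^T y].
    aug = [
        [sum(f[i] * f[j] for f in features) for j in range(p)]
        + [sum(f[i] * y for f, y in zip(features, targets))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(p):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * w for v, w in zip(aug[row], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def demo(n_rounds=5000, seed=0):
    """Explore uniformly at random, then fit with and without the bias term."""
    rng = random.Random(seed)
    xs, zs, ys = [], [], []
    for _ in range(n_rounds):
        x, z = rng.choice(ACTIONS)
        xs.append(list(x))
        zs.append(z)
        ys.append(observe(x, z, rng))
    # Naive fit: regress the biased observations on the covariates only.
    naive = least_squares(xs, ys)
    # Joint fit: append z as an extra feature so omega is estimated as well.
    joint = least_squares([x + [z] for x, z in zip(xs, zs)], ys)
    return naive, joint[:2], joint[2]


if __name__ == "__main__":
    naive, theta_hat, omega_hat = demo()
    print("naive theta estimate:", naive)
    print("joint theta estimate:", theta_hat, "omega estimate:", omega_hat)
```

In this toy instance the sensitive attribute is correlated with the covariates, so the naive fit absorbs part of the bias into its estimate of theta, while the joint fit separates the two. How hard that separation is depends on the geometry of the action set, which is the role the abstract assigns to the constant $\kappa_*$.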
