Paper Title

Regret Minimization with Performative Feedback

Paper Authors

Meena Jagadeesan, Tijana Zrnic, Celestine Mendler-Dünner

Paper Abstract

In performative prediction, the deployment of a predictive model triggers a shift in the data distribution. As these shifts are typically unknown ahead of time, the learner needs to deploy a model to get feedback about the distribution it induces. We study the problem of finding near-optimal models under performativity while maintaining low regret. On the surface, this problem might seem equivalent to a bandit problem. However, it exhibits a fundamentally richer feedback structure that we refer to as performative feedback: after every deployment, the learner receives samples from the shifted distribution rather than only bandit feedback about the reward. Our main contribution is an algorithm that achieves regret bounds scaling only with the complexity of the distribution shifts and not that of the reward function. The algorithm only relies on smoothness of the shifts and does not assume convexity. Moreover, its final iterate is guaranteed to be near-optimal. The key algorithmic idea is careful exploration of the distribution shifts that informs a novel construction of confidence bounds on the risk of unexplored models. More broadly, our work establishes a conceptual approach for leveraging tools from the bandits literature for the purpose of regret minimization with performative feedback.
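
To make the feedback structure concrete, here is a minimal illustrative sketch of a performative feedback loop. This is not the paper's algorithm: the Gaussian shift family, the squared loss, and all identifiers (`induced_distribution`, `EPS`, `perf_risk`) are assumptions invented for this example. The only point it demonstrates is the abstract's claim that each deployment returns samples from the induced distribution, from which the risk of any model on that distribution can be estimated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sensitivity constant: how strongly a deployment shifts the data.
EPS = 0.5


def induced_distribution(theta, n=500):
    """Toy shift family (an assumption for this sketch): deploying model
    theta moves the data mean to EPS * theta. Stands in for the unknown,
    smooth map theta -> D(theta) from the performative prediction setup."""
    return rng.normal(loc=EPS * theta, scale=1.0, size=n)


def loss(theta, z):
    """Squared loss of a scalar point predictor theta on data z."""
    return (theta - z) ** 2


# Performative feedback loop: after each deployment the learner receives
# samples from the induced distribution D(theta), not just a scalar reward.
candidates = np.linspace(-2.0, 2.0, 9)
history = []
for theta in candidates:
    samples = induced_distribution(theta)      # sample feedback from D(theta)
    perf_risk = loss(theta, samples).mean()    # performative risk estimate
    history.append((theta, perf_risk))

    # The sample feedback is strictly richer than bandit feedback: the same
    # samples let us evaluate any other model theta' on D(theta), which is
    # the raw material for confidence bounds on unexplored models.
    risk_of_other_model = loss(0.0, samples).mean()

best_theta, best_risk = min(history, key=lambda pair: pair[1])
print(f"best deployed model: theta = {best_theta:.2f}, risk ~ {best_risk:.3f}")
```

Because the feedback exposes the shifted distribution itself rather than only a reward value, exploration can target the complexity of the shift map theta -> D(theta) instead of the reward function, which is the intuition behind the regret bounds described in the abstract.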
