Paper Title

On the Complexity of Adversarial Decision Making

Authors

Foster, Dylan J., Rakhlin, Alexander, Sekhari, Ayush, Sridharan, Karthik

Abstract

A central problem in online learning and decision making -- from bandits to reinforcement learning -- is to understand what modeling assumptions lead to sample-efficient learning guarantees. We consider a general adversarial decision making framework that encompasses (structured) bandit problems with adversarial rewards and reinforcement learning problems with adversarial dynamics. Our main result is to show -- via new upper and lower bounds -- that the Decision-Estimation Coefficient, a complexity measure introduced by Foster et al. in the stochastic counterpart to our setting, is necessary and sufficient to obtain low regret for adversarial decision making. However, compared to the stochastic setting, one must apply the Decision-Estimation Coefficient to the convex hull of the class of models (or, hypotheses) under consideration. This establishes that the price of accommodating adversarial rewards or dynamics is governed by the behavior of the model class under convexification, and recovers a number of existing results -- both positive and negative. En route to obtaining these guarantees, we provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures, including the Information Ratio of Russo and Van Roy and the Exploration-by-Optimization objective of Lattimore and György.
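
For reference, the Decision-Estimation Coefficient discussed above is commonly defined as sketched below. This follows the standard formulation of Foster et al. (2021); the notation ($\Pi$ for the decision space, $f^{M}(\pi)$ for the mean reward of decision $\pi$ under model $M$, $\pi_M$ for the optimal decision under $M$, and $D^{2}_{\mathrm{H}}$ for the squared Hellinger distance) is assumed here rather than copied from this paper, and variants of the definition differ in minor details.

\[
\mathrm{dec}_{\gamma}(\mathcal{M}, \bar{M})
  \;=\; \inf_{p \in \Delta(\Pi)} \; \sup_{M \in \mathcal{M}} \;
    \mathbb{E}_{\pi \sim p}\Big[\, f^{M}(\pi_M) - f^{M}(\pi)
      \;-\; \gamma \, D^{2}_{\mathrm{H}}\big(M(\pi), \bar{M}(\pi)\big) \Big],
\qquad
\mathrm{dec}_{\gamma}(\mathcal{M}) \;=\; \sup_{\bar{M}} \; \mathrm{dec}_{\gamma}(\mathcal{M}, \bar{M}),
\]

where the reference model $\bar{M}$ typically ranges over the convex hull $\mathrm{co}(\mathcal{M})$. Read in this light, the abstract's main message is that in the adversarial setting, achievable regret is governed by the coefficient applied to $\mathrm{co}(\mathcal{M})$, the convex hull of the model class, rather than to $\mathcal{M}$ itself.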
