Self-Concordant Analysis of Generalized Linear Bandits with Forgetting

Paper title

Self-Concordant Analysis of Generalized Linear Bandits with Forgetting

Authors

Yoan Russac, Louis Faury, Olivier Cappé, Aurélien Garivier

Abstract

Contextual sequential decision problems with categorical or numerical observations are ubiquitous, and Generalized Linear Bandits (GLB) offer a solid theoretical framework to address them. In contrast to the case of linear bandits, existing algorithms for GLB have two drawbacks that undermine their applicability. First, they rely on excessively pessimistic concentration bounds due to the non-linear nature of the model. Second, they require either non-convex projection steps or burn-in phases to enforce boundedness of the estimators. Both of these issues are worsened when considering non-stationary models, in which the GLB parameter may vary with time. In this work, we focus on self-concordant GLB (which include logistic and Poisson regression) with forgetting achieved either by the use of a sliding window or exponential weights. We propose a novel confidence-based algorithm for the maximum-likelihood estimator with forgetting and analyze its performance in abruptly changing environments. These results, as well as the accompanying numerical simulations, highlight the potential of the proposed approach to address non-stationarity in GLB.
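
To make the two forgetting schemes named in the abstract concrete, the following is a minimal Python sketch of a weighted maximum-likelihood estimate for the logistic case. It is an illustration under stated assumptions, not the authors' algorithm: the function names, the L2 penalty `lam`, and the plain gradient-ascent solver are all hypothetical choices made here, and the paper's confidence sets and action selection are not reproduced.

```python
import numpy as np


def exponential_weights(n, gamma):
    """Weights gamma**(n-1-s) for s = 0..n-1; the most recent sample gets weight 1."""
    return gamma ** np.arange(n - 1, -1, -1, dtype=float)


def sliding_window_weights(n, window):
    """Weight 1 for the last `window` samples, 0 for everything older."""
    w = np.zeros(n)
    w[max(0, n - window):] = 1.0
    return w


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def weighted_logistic_gradient(theta, X, y, w, lam):
    """Gradient of the weighted, L2-penalized logistic log-likelihood at theta."""
    return X.T @ (w * (y - sigmoid(X @ theta))) - lam * theta


def weighted_logistic_mle(X, y, w, lam=1.0, n_iter=2000):
    """Maximize the penalized weighted log-likelihood by gradient ascent.

    The penalty lam (an assumption of this sketch) makes the objective
    strongly concave, so a fixed step 1/L converges, where L is an upper
    bound on the curvature of the objective.
    """
    L = lam + 0.25 * np.sum(w * np.sum(X ** 2, axis=1))
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        theta = theta + weighted_logistic_gradient(theta, X, y, w, lam) / L
    return theta
```

Either weight vector can be passed as `w`: `exponential_weights(len(y), gamma)` discounts old observations smoothly, while `sliding_window_weights(len(y), window)` discards them outright. In both cases observations gathered before an abrupt change contribute little or nothing to the estimate.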
