Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

Paper title

Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

Authors

Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

Abstract

Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandit problems as well as generalized linear bandit problems. We develop novel index policies that we prove achieve order-optimality, and show that they achieve empirical performance competitive with the state-of-the-art benchmark methods in extensive experiments. The new policies achieve this with low computation time per pull for linear bandits, thereby offering both favorable regret and computational efficiency.
