Paper Title
Meta-learning with Stochastic Linear Bandits
Paper Authors
Paper Abstract
We investigate meta-learning procedures in the setting of stochastic linear bandit tasks. The goal is to select a learning algorithm that works well on average over a class of bandit tasks sampled from a task distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is the squared Euclidean distance to a bias vector. We first study the benefit of the biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
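To make the abstract concrete, the sketch below shows a bias-regularized OFUL step: the per-task estimate is the ridge solution of min_theta sum_s (x_s^T theta - r_s)^2 + lam * ||theta - b||^2, shrunk toward a bias vector b, and arms are chosen optimistically via an upper confidence bound. This is a minimal, illustrative implementation assuming a finite arm set and a fixed confidence width `beta` (the actual algorithm derives the width from a confidence-ellipsoid bound); the names `BiasedOFUL`, `lam`, and `beta`, and the running-mean bias estimator at the end, are our assumptions for illustration, not necessarily the paper's exact construction.

```python
# Sketch of bias-regularized OFUL; illustrative, not the paper's exact code.
import numpy as np

class BiasedOFUL:
    def __init__(self, dim, bias, lam=1.0, beta=1.0):
        self.bias = bias               # bias vector b the estimate is shrunk toward
        self.lam = lam                 # regularization strength lambda
        self.beta = beta               # confidence width (assumed fixed here)
        self.A = lam * np.eye(dim)     # design matrix: lam * I + sum_s x_s x_s^T
        self.c = np.zeros(dim)         # sum_s x_s * (r_s - x_s^T b)

    def estimate(self):
        # Closed form of min_theta sum (x^T theta - r)^2 + lam * ||theta - b||^2:
        # theta_hat = b + A^{-1} * sum x (r - x^T b)
        return self.bias + np.linalg.solve(self.A, self.c)

    def select(self, arms):
        # Optimistic choice: argmax_x  x^T theta_hat + beta * ||x||_{A^{-1}}
        theta = self.estimate()
        A_inv = np.linalg.inv(self.A)
        widths = np.sqrt(np.einsum('ij,jk,ik->i', arms, A_inv, arms))
        return int(np.argmax(arms @ theta + self.beta * widths))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.c += x * (reward - x @ self.bias)

# Learning-to-learn loop: tasks drawn from a low-variance distribution, with
# the bias set to the running mean of per-task estimates (one plausible bias
# estimator; the paper proposes two strategies that may differ from this).
rng = np.random.default_rng(0)
arms = rng.normal(size=(10, 5))        # fixed finite arm set in R^5
bias = np.zeros(5)
for task in range(20):
    theta_star = rng.normal(loc=1.0, scale=0.1, size=5)
    agent = BiasedOFUL(dim=5, bias=bias)
    for t in range(100):
        i = agent.select(arms)
        agent.update(arms[i], arms[i] @ theta_star + 0.1 * rng.normal())
    bias = (task * bias + agent.estimate()) / (task + 1)
```

With `bias = np.zeros(dim)` the update reduces to plain ridge regression, so this recovers standard OFUL; the benefit claimed in the abstract appears when the task distribution has small variance, so the averaged bias sits close to every task's true parameter and the regularizer pulls each per-task estimate in the right direction.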