Paper Title
Meta-learning with Stochastic Linear Bandits
Paper Authors
Paper Abstract
We investigate meta-learning procedures in the setting of stochastic linear bandit tasks. The goal is to select a learning algorithm that works well on average over a class of bandit tasks sampled from a task distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is the squared Euclidean distance to a bias vector. We first study the benefit of the biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
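To make the abstract concrete, the sketch below shows a bias-regularized OFUL step: the per-task estimate is the ridge solution of min_theta sum_s (x_s^T theta - r_s)^2 + lam * ||theta - b||^2, shrunk toward a bias vector b, and arms are chosen optimistically via an upper confidence bound. This is a minimal, illustrative implementation assuming a finite arm set and a fixed confidence width `beta` (the actual algorithm derives the width from a confidence-ellipsoid bound); the names `BiasedOFUL`, `lam`, and `beta`, and the running-mean bias estimator at the end, are our assumptions for illustration, not necessarily the paper's exact construction.

```python
# Sketch of bias-regularized OFUL; illustrative, not the paper's exact code.
import numpy as np

class BiasedOFUL:
    def __init__(self, dim, bias, lam=1.0, beta=1.0):
        self.bias = bias               # bias vector b the estimate is shrunk toward
        self.lam = lam                 # regularization strength lambda
        self.beta = beta               # confidence width (assumed fixed here)
        self.A = lam * np.eye(dim)     # design matrix: lam * I + sum_s x_s x_s^T
        self.c = np.zeros(dim)         # sum_s x_s * (r_s - x_s^T b)

    def estimate(self):
        # Closed form of min_theta sum (x^T theta - r)^2 + lam * ||theta - b||^2:
        # theta_hat = b + A^{-1} * sum x (r - x^T b)
        return self.bias + np.linalg.solve(self.A, self.c)

    def select(self, arms):
        # Optimistic choice: argmax_x  x^T theta_hat + beta * ||x||_{A^{-1}}
        theta = self.estimate()
        A_inv = np.linalg.inv(self.A)
        widths = np.sqrt(np.einsum('ij,jk,ik->i', arms, A_inv, arms))
        return int(np.argmax(arms @ theta + self.beta * widths))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.c += x * (reward - x @ self.bias)

# Learning-to-learn loop: tasks drawn from a low-variance distribution, with
# the bias set to the running mean of per-task estimates (one plausible bias
# estimator; the paper proposes two strategies that may differ from this).
rng = np.random.default_rng(0)
arms = rng.normal(size=(10, 5))        # fixed finite arm set in R^5
bias = np.zeros(5)
for task in range(20):
    theta_star = rng.normal(loc=1.0, scale=0.1, size=5)
    agent = BiasedOFUL(dim=5, bias=bias)
    for t in range(100):
        i = agent.select(arms)
        agent.update(arms[i], arms[i] @ theta_star + 0.1 * rng.normal())
    bias = (task * bias + agent.estimate()) / (task + 1)
```

With `bias = np.zeros(dim)` the update reduces to plain ridge regression, so this recovers standard OFUL; the benefit claimed in the abstract appears when the task distribution has small variance, so the averaged bias sits close to every task's true parameter and the regularizer pulls each per-task estimate in the right direction.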