Paper Title

Double-Linear Thompson Sampling for Context-Attentive Bandits

Paper Authors

Djallel Bouneffouf, Raphaël Féraud, Sohini Upadhyay, Yasaman Khazaeni, Irina Rish

Paper Abstract

In this paper, we analyze and extend an online learning framework known as the Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset of a potentially large number of context variables can be observed at each iteration; however, the agent is free to choose which variables to observe. We derive a novel algorithm, called Context-Attentive Thompson Sampling (CATS), which builds upon the Linear Thompson Sampling approach, adapting it to the Context-Attentive Bandit setting. We provide a theoretical regret analysis and an extensive empirical evaluation demonstrating the advantages of the proposed approach over several baseline methods on a variety of real-life datasets.
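The abstract does not spell out the algorithm, but the setting it describes, Linear Thompson Sampling over a context of which only a budgeted subset of variables is observed each round, can be sketched. The Python below is a minimal illustration under stated assumptions, not the paper's actual CATS method: the sizes (D, K_OBS, N_ARMS, V), the Beta-sampling rule for choosing which variables to observe, the zero-masking of unobserved entries, and the names step and reward_fn are all hypothetical stand-ins, and rewards are assumed binary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes -- illustrative, not taken from the paper.
D = 20        # total number of context variables
K_OBS = 5     # observation budget: variables the agent may see per round
N_ARMS = 4    # number of arms
V = 0.25      # Thompson Sampling exploration scale

# Per-arm Linear Thompson Sampling statistics over all D dimensions;
# unobserved variables are masked to zero, so they never affect updates.
B = [np.eye(D) for _ in range(N_ARMS)]      # regularized Gram matrices
f = [np.zeros(D) for _ in range(N_ARMS)]    # reward-weighted context sums

# Beta counts per context variable: a stand-in rule for choosing
# which variables to observe (the paper's selection rule may differ).
var_success = np.ones(D)
var_failure = np.ones(D)

def step(full_context, reward_fn):
    """One round: choose variables to observe, choose an arm, update."""
    # 1. Pick which K_OBS of the D variables to observe via Beta sampling.
    scores = rng.beta(var_success, var_failure)
    observed = np.argsort(scores)[-K_OBS:]
    x = np.zeros(D)
    x[observed] = full_context[observed]    # the agent sees only this part

    # 2. Linear Thompson Sampling on the masked context vector.
    estimates = []
    for a in range(N_ARMS):
        B_inv = np.linalg.inv(B[a])
        mu = B_inv @ f[a]                                  # ridge estimate
        theta = rng.multivariate_normal(mu, V**2 * B_inv)  # posterior sample
        estimates.append(float(x @ theta))
    arm = int(np.argmax(estimates))

    # 3. Observe a reward (assumed in {0, 1}) and update statistics.
    r = reward_fn(arm, full_context)
    B[arm] += np.outer(x, x)
    f[arm] += r * x
    var_success[observed] += r
    var_failure[observed] += 1 - r
    return arm, r

# Example with a synthetic reward: arm 0 pays off when variable 3 is high.
for _ in range(200):
    ctx = rng.random(D)
    step(ctx, lambda a, c: int(a == 0 and c[3] > 0.5))
```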
