Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward

Paper Title

Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward

Author

Lin, Baihan

Abstract

We considered a novel practical problem of online learning with episodically revealed rewards, motivated by several real-world applications, where the contexts are nonstationary over different episodes and reward feedback is not always available to the decision-making agent. For this online semi-supervised learning setting, we introduced Background Episodic Reward LinUCB (BerlinUCB), a solution that easily incorporates clustering as a self-supervision module to provide useful side information when rewards are not observed. Our experiments on a variety of datasets, in both stationary and nonstationary environments across six different scenarios, demonstrated clear advantages of the proposed approach over the standard contextual bandit. Lastly, we introduced a relevant real-life example where this problem setting is especially useful.
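
To make the setting concrete, the following is a minimal sketch of a LinUCB-style agent that still learns on rounds where the reward is hidden. It is not the paper's BerlinUCB update rule, which the abstract does not spell out. As an assumption for illustration, it substitutes a simple pseudo-labeling scheme: contexts are grouped by an online k-means step, and a hidden reward is imputed from the mean observed reward of the same cluster and arm. The class name `EpisodicLinUCB` and all parameter names are hypothetical.

```python
import numpy as np


class EpisodicLinUCB:
    """Illustrative LinUCB with missing rewards (NOT the paper's BerlinUCB)."""

    def __init__(self, n_arms, dim, alpha=1.0, n_clusters=3, seed=0):
        self.n_arms = n_arms
        self.alpha = alpha  # exploration weight
        # Per-arm ridge-regression statistics: A = I + sum(x x^T), b = sum(r x).
        self.A = np.stack([np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))
        # Online k-means state (assumed self-supervision module).
        rng = np.random.default_rng(seed)
        self.centroids = rng.normal(size=(n_clusters, dim))
        self.counts = np.zeros(n_clusters)
        # Running mean of observed rewards per (cluster, arm).
        self.cluster_reward = np.zeros((n_clusters, n_arms))
        self.cluster_pulls = np.zeros((n_clusters, n_arms))

    def _cluster(self, x):
        # Assign x to the nearest centroid, then move that centroid toward x.
        k = int(np.argmin(np.linalg.norm(self.centroids - x, axis=1)))
        self.counts[k] += 1
        self.centroids[k] += (x - self.centroids[k]) / self.counts[k]
        return k

    def select(self, x):
        # Standard LinUCB score: estimated reward plus confidence width.
        scores = np.empty(self.n_arms)
        for a in range(self.n_arms):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]
            scores[a] = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return int(np.argmax(scores))

    def update(self, x, arm, reward=None):
        k = self._cluster(x)
        if reward is None:
            # Hidden reward: impute from the cluster, or skip if nothing is known.
            if self.cluster_pulls[k, arm] == 0:
                return
            reward = self.cluster_reward[k, arm]
        else:
            # Observed reward: refresh the running mean for this (cluster, arm).
            self.cluster_pulls[k, arm] += 1
            n = self.cluster_pulls[k, arm]
            self.cluster_reward[k, arm] += (reward - self.cluster_reward[k, arm]) / n
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In use, the caller passes `reward=None` to `update` on rounds where feedback is withheld. The standard contextual bandit baseline corresponds to skipping those rounds entirely.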
