Contextual Information-Directed Sampling

Paper title

Contextual Information-Directed Sampling

Authors

Botao Hao, Tor Lattimore, Chao Qin

Abstract

Information-directed sampling (IDS) has recently demonstrated its potential as a data-efficient reinforcement learning algorithm. However, it is still unclear what the right form of the information ratio to optimize is when contextual information is available. We investigate the IDS design through two contextual bandit problems: contextual bandits with graph feedback and sparse linear contextual bandits. We provably demonstrate the advantage of contextual IDS over conditional IDS and emphasize the importance of considering the context distribution. The main message is that an intelligent agent should invest more in actions that are beneficial for future unseen contexts, whereas conditional IDS can be myopic. We further propose a computationally efficient version of contextual IDS based on Actor-Critic and evaluate it empirically on a neural network contextual bandit.
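For readers unfamiliar with the information ratio mentioned in the abstract, the following is a minimal sketch of the standard single-round quantity that IDS minimizes: squared expected regret divided by expected information gain. It is not the paper's algorithm. The function names, the grid search, and the regret and gain numbers are hypothetical and chosen only for illustration. It relies on the known fact that a minimizer of this ratio can be supported on at most two actions, and it does not model the context distribution that contextual IDS additionally takes into account.

```python
from itertools import combinations

def information_ratio(p, regret, gain):
    """Squared expected regret over expected information gain under distribution p."""
    exp_regret = sum(pi * r for pi, r in zip(p, regret))
    exp_gain = sum(pi * g for pi, g in zip(p, gain))
    # A distribution that gains no information has an unbounded ratio.
    return float("inf") if exp_gain <= 0 else exp_regret ** 2 / exp_gain

def ids_distribution(regret, gain, grid=1001):
    """Approximate the ratio-minimizing action distribution.

    Searches mixtures of at most two actions on a uniform grid of mixing
    weights (an illustrative shortcut, not an exact solver).
    """
    n = len(regret)
    best_p, best_val = None, float("inf")
    pairs = [(i, i) for i in range(n)] + list(combinations(range(n), 2))
    for i, j in pairs:
        for k in range(grid):
            q = k / (grid - 1)
            p = [0.0] * n
            p[i] += q
            p[j] += 1.0 - q
            val = information_ratio(p, regret, gain)
            if val < best_val:
                best_p, best_val = p, val
    return best_p, best_val

# Hypothetical inputs: action 0 is cheap but uninformative,
# action 1 is costly but informative.
regret = [0.1, 1.0]
gain = [0.0, 1.0]
p, value = ids_distribution(regret, gain)
print(p, value)
```

With these made-up numbers, mixing the two actions achieves a lower ratio than always playing the informative action, which is the exploration and exploitation trade-off that the information ratio is designed to balance.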
