Paper Title

Neural Contextual Bandits with Deep Representation and Shallow Exploration

Paper Authors

Pan Xu, Zheng Wen, Handong Zhao, Quanquan Gu

Paper Abstract

We study a general class of contextual bandits, where each context-action pair is associated with a raw feature vector, but the reward generating function is unknown. We propose a novel learning algorithm that transforms the raw feature vector using the last hidden layer of a deep ReLU neural network (deep representation learning), and uses an upper confidence bound (UCB) approach to explore in the last linear layer (shallow exploration). We prove that under standard assumptions, our proposed algorithm achieves $\tilde{O}(\sqrt{T})$ finite-time regret, where $T$ is the learning time horizon. Compared with existing neural contextual bandit algorithms, our approach is computationally much more efficient since it only needs to explore in the last layer of the deep neural network.
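The "shallow exploration" idea can be illustrated with a minimal sketch: treat the last hidden layer of a network as a fixed feature map phi(x), and run LinUCB only over those features. The frozen random ReLU network below stands in for the learned deep representation (in the paper the representation is trained); all names, dimensions, and the `alpha` exploration width are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def relu_features(x, W1, W2):
    # Stand-in for the deep representation: a frozen two-layer ReLU
    # network whose last hidden layer output phi(x) serves as the
    # context-action feature vector. Random and fixed for this sketch.
    return np.maximum(W2 @ np.maximum(W1 @ x, 0.0), 0.0)

class ShallowLinUCB:
    """LinUCB run only on last-layer features phi(x) (shallow exploration)."""
    def __init__(self, dim, lam=1.0, alpha=1.0):
        self.A = lam * np.eye(dim)  # regularized design matrix
        self.b = np.zeros(dim)      # reward-weighted feature sum
        self.alpha = alpha          # exploration width

    def select(self, feats):
        # feats: (n_arms, dim) array of phi(x) for each candidate action.
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b  # ridge estimate of the last linear layer
        # UCB score: estimated reward + exploration bonus per arm.
        bonus = np.sqrt(np.einsum('ad,dk,ak->a', feats, A_inv, feats))
        return int(np.argmax(feats @ theta + self.alpha * bonus))

    def update(self, feat, reward):
        self.A += np.outer(feat, feat)
        self.b += reward * feat

# Toy usage: linear reward in the (unknown) feature space plus noise.
rng = np.random.default_rng(0)
d_raw, d_hid, d_feat, n_arms = 5, 8, 6, 4
W1 = rng.standard_normal((d_hid, d_raw))
W2 = rng.standard_normal((d_feat, d_hid))
theta_star = rng.standard_normal(d_feat)  # hidden reward parameter

bandit = ShallowLinUCB(d_feat, alpha=0.5)
for t in range(50):
    X = rng.standard_normal((n_arms, d_raw))            # raw contexts
    feats = np.stack([relu_features(x, W1, W2) for x in X])
    a = bandit.select(feats)
    r = feats[a] @ theta_star + 0.1 * rng.standard_normal()
    bandit.update(feats[a], r)
```

Because only the d-dimensional last layer is updated in closed form, each round costs a single d x d inverse rather than exploration over all network parameters, which is the source of the computational savings the abstract claims.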
