Paper Title
Online Learning with Corrupted Context: Corrupted Contextual Bandits
Paper Authors
Paper Abstract
We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side information, or context, available to the decision-maker) where the context used at each decision may be corrupted ("useless context"). This new problem is motivated by certain online settings, including clinical trial and ad recommendation applications. To address the corrupted-context setting, we propose to combine the standard contextual bandit approach with a classical multi-armed bandit mechanism. Unlike standard contextual bandit methods, we are able to learn from all iterations, even those with corrupted context, by improving the computation of the expected reward for each arm. Promising empirical results are obtained on several real-life datasets.
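The combination described above could be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes a per-arm linear contextual scorer, a UCB1 fallback for corrupted rounds, and a flag telling the learner whether the current context is corrupted. The class name `HybridBandit` and all hyperparameters are hypothetical.

```python
import math

class HybridBandit:
    """Hypothetical sketch: a contextual policy with a classical UCB1
    fallback, so every round (even one with corrupted context) still
    updates the per-arm expected-reward estimates."""

    def __init__(self, n_arms, dim):
        self.n_arms = n_arms
        self.counts = [0] * n_arms           # pulls per arm, all rounds
        self.means = [0.0] * n_arms          # context-free mean reward
        self.w = [[0.0] * dim for _ in range(n_arms)]  # linear weights
        self.t = 0                           # total rounds seen

    def select(self, context, corrupted):
        self.t += 1
        # Pull each arm once before any scoring.
        for a in range(self.n_arms):
            if self.counts[a] == 0:
                return a
        if corrupted:
            # Classical UCB1 on the context-free means.
            return max(
                range(self.n_arms),
                key=lambda a: self.means[a]
                + math.sqrt(2 * math.log(self.t) / self.counts[a]),
            )
        # Clean context: score arms with the linear predictor.
        return max(
            range(self.n_arms),
            key=lambda a: sum(wi * xi for wi, xi in zip(self.w[a], context)),
        )

    def update(self, arm, context, reward, corrupted):
        # Context-free statistics learn from every iteration.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
        if not corrupted:
            # Simple SGD step on the linear model, clean contexts only.
            pred = sum(wi * xi for wi, xi in zip(self.w[arm], context))
            for i, xi in enumerate(context):
                self.w[arm][i] += 0.1 * (reward - pred) * xi
```

The key design point matching the abstract: `update` always refreshes the context-free means, so corrupted-context rounds are not discarded; only the contextual model is restricted to clean contexts.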