Adaptive Experimentation in the Presence of Exogenous Nonstationary Variation

Paper Title

Adaptive Experimentation in the Presence of Exogenous Nonstationary Variation

Authors

Chao Qin, Daniel Russo

Abstract

We investigate experiments that are designed to select a treatment arm for population deployment. Multi-armed bandit algorithms can enhance efficiency by dynamically allocating measurement effort towards higher performing arms based on observed feedback. However, such dynamics can result in brittle behavior in the face of nonstationary exogenous factors influencing arms' performance during the experiment. To counter this, we propose deconfounded Thompson sampling (DTS), a more robust variant of the prominent Thompson sampling algorithm. As observations accumulate, DTS projects the population-level performance of an arm while controlling for the context within which observed treatment decisions were made. Contexts here might capture a comprehensible source of variation, such as the country of a treated individual, or simply record the time of treatment. We provide bounds on both within-experiment and post-experiment regret of DTS, illustrating its resilience to exogenous variation and the delicate balance it strikes between exploration and exploitation. Our proofs leverage inverse propensity weights to analyze the evolution of the posterior distribution, a departure from established methods in the literature. Hinting that new understanding is indeed necessary, we show that a deconfounded variant of the popular upper confidence bound algorithm can fail completely.
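
The abstract describes DTS only at a high level, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It keeps a separate posterior for each (arm, context) pair, samples from those posteriors, and ranks arms by the sampled reward averaged over a population context distribution. The Gaussian prior, the known noise variance, the independent per-cell model, the "early"/"late" time contexts, and every function name below are illustrative assumptions not taken from the paper.

```python
import random

PRIOR_VAR = 1.0   # assumption: N(0, 1) prior on each (arm, context) mean
NOISE_VAR = 1.0   # assumption: known reward noise variance


def posterior(n, total):
    """Gaussian posterior (mean, variance) for one (arm, context) cell."""
    precision = 1.0 / PRIOR_VAR + n / NOISE_VAR
    return (total / NOISE_VAR) / precision, 1.0 / precision


def population_mean(cells, weights):
    """Posterior mean of an arm's reward, averaged over context weights."""
    return sum(w * posterior(*cells.get(c, (0, 0.0)))[0]
               for c, w in weights.items())


def choose_arm(stats, weights, rng):
    """Sample each cell's mean, reweight by context weights, take the argmax."""
    def sampled_value(cells):
        value = 0.0
        for c, w in weights.items():
            mean, var = posterior(*cells.get(c, (0, 0.0)))
            value += w * rng.gauss(mean, var ** 0.5)
        return value
    return max(stats, key=lambda arm: sampled_value(stats[arm]))


def run_experiment(horizon=400, seed=0):
    """Hypothetical two-arm experiment with a time effect shared by both arms."""
    rng = random.Random(seed)
    arm_effect = {"A": 0.0, "B": 0.5}               # hypothetical arm effects
    context_effect = {"early": -1.0, "late": 1.0}   # exogenous shift over time
    weights = {"early": 0.5, "late": 0.5}           # assumed population mix
    stats = {arm: {} for arm in arm_effect}         # (count, reward sum) per cell
    for t in range(horizon):
        ctx = "early" if t < horizon // 2 else "late"
        arm = choose_arm(stats, weights, rng)
        noise = rng.gauss(0.0, NOISE_VAR ** 0.5)
        reward = arm_effect[arm] + context_effect[ctx] + noise
        n, total = stats[arm].get(ctx, (0, 0.0))
        stats[arm][ctx] = (n + 1, total + reward)
    # Post-experiment recommendation: best arm by population-level posterior mean.
    recommended = max(stats, key=lambda arm: population_mean(stats[arm], weights))
    return recommended, stats


if __name__ == "__main__":
    arm, stats = run_experiment()
    print("recommended arm:", arm)
```

Because rewards are compared within a context before being averaged, a shift that moves both arms together between the early and late periods does not by itself favor whichever arm happened to be sampled more often in the better period.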
