（组合）纯探索的汤普森采样

论文标题

（组合）纯探索的汤普森采样

Thompson Sampling for (Combinatorial) Pure Exploration

论文作者

Wang, Siwei, Zhu, Jun

论文摘要

现有组合纯探索的方法主要集中在UCB方法上。为了提高算法，他们通常使用ARM集合$ S $内的上限置信度范围的总和来表示$ S $的上限限制，这可能比$ S $的紧密上限限制大得多，并且会导致比必要的更高的复杂性，因为$ S $中不同手臂的经验手段是独立的。为了应对这一挑战，我们探索了使用独立的随机样品而不是上置信度边界的汤普森采样（TS）的概念，并为（组合）纯探索设计了第一个基于TS的算法TS-TS-explore。在TS-explore中，ARM集合$ S $中的独立随机样品的总和不会超过具有高概率的$ S $的紧密上限限制。因此，它解决了上述挑战，并且比一般组合纯探索中现有的有效UCB算法的复杂性更高。至于对经典多臂强盗的纯粹探索，我们表明TS探索实现了渐近最佳的复杂性上限。

Existing methods of combinatorial pure exploration mainly focus on the UCB approach. To make the algorithm efficient, they usually use the sum of upper confidence bounds within arm set $S$ to represent the upper confidence bound of $S$, which can be much larger than the tight upper confidence bound of $S$ and leads to a much higher complexity than necessary, since the empirical means of different arms in $S$ are independent. To deal with this challenge, we explore the idea of Thompson Sampling (TS) that uses independent random samples instead of the upper confidence bounds, and design the first TS-based algorithm TS-Explore for (combinatorial) pure exploration. In TS-Explore, the sum of independent random samples within arm set $S$ will not exceed the tight upper confidence bound of $S$ with high probability. Hence it solves the above challenge, and achieves a lower complexity upper bound than existing efficient UCB-based algorithms in general combinatorial pure exploration. As for pure exploration of classic multi-armed bandit, we show that TS-Explore achieves an asymptotically optimal complexity upper bound.

下载PDF全文

下载文献需遵守相关版权规定

论文标题