最佳聚类与强盗反馈

论文标题

最佳聚类与强盗反馈

Optimal Clustering with Bandit Feedback

论文作者

Yang, Junwen, Zhong, Zixin, Tan, Vincent Y. F.

论文摘要

本文考虑了用匪徒反馈在线聚类的问题。一组武器（或项目）可以分为未知的各个组。在每个组中，与每个臂相关的观测值遵循相同的平均向量的相同分布。在每个时间步骤中，代理查询或拉动手臂并从与之相关的分布中获得独立的观察。随后的拉值取决于先前的拉力以及先前获得的样品。代理商的任务是揭示手臂拉动数量最少的武器的基本分区，并且误差的可能性不超过规定的常数$δ$。提出的问题发现了许多应用程序，从病毒的变体聚类到在线市场细分。我们在此任务的预期样本复杂性上提出了一个与实例有关的信息理论下限，并设计了一种计算上有效且渐近的最佳算法，即强盗在线聚类（BOC）。该算法包括一种用于自适应顺序测试的新型停止规则，该规则规避了需要精确解决任何NP加权聚类问题作为其子例程的必要性。我们通过对合成和现实世界数据集的广泛模拟显示，BOC的性能与渐近下限匹配，并且显着优于非自适应基线算法。

This paper considers the problem of online clustering with bandit feedback. A set of arms (or items) can be partitioned into various groups that are unknown. Within each group, the observations associated to each of the arms follow the same distribution with the same mean vector. At each time step, the agent queries or pulls an arm and obtains an independent observation from the distribution it is associated to. Subsequent pulls depend on previous ones as well as the previously obtained samples. The agent's task is to uncover the underlying partition of the arms with the least number of arm pulls and with a probability of error not exceeding a prescribed constant $δ$. The problem proposed finds numerous applications from clustering of variants of viruses to online market segmentation. We present an instance-dependent information-theoretic lower bound on the expected sample complexity for this task, and design a computationally efficient and asymptotically optimal algorithm, namely Bandit Online Clustering (BOC). The algorithm includes a novel stopping rule for adaptive sequential testing that circumvents the need to exactly solve any NP-hard weighted clustering problem as its subroutines. We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.

下载PDF全文

下载文献需遵守相关版权规定

论文标题