Paper Title

Online Low Rank Matrix Completion

Authors

Prateek Jain, Soumyabrata Pal

Abstract

We study the problem of {\em online} low-rank matrix completion with $\mathsf{M}$ users, $\mathsf{N}$ items and $\mathsf{T}$ rounds. In each round, the algorithm recommends one item per user, for which it gets a (noisy) reward sampled from a low-rank user-item preference matrix. The goal is to design a method with sub-linear regret (in $\mathsf{T}$) and nearly optimal dependence on $\mathsf{M}$ and $\mathsf{N}$. The problem can be easily mapped to the standard multi-armed bandit problem where each item is an {\em independent} arm, but that leads to poor regret because the correlation between arms and users is not exploited. On the other hand, exploiting the low-rank structure of the reward matrix is challenging due to the non-convexity of the low-rank manifold. We first demonstrate that the low-rank structure can be exploited using a simple explore-then-commit (ETC) approach that ensures a regret of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{2/3})$. That is, roughly only $\mathsf{polylog} (\mathsf{M}+\mathsf{N})$ item recommendations are required per user to get a non-trivial solution. We then improve our result for the rank-$1$ setting, which is itself quite challenging and encapsulates some of the key issues. Here, we propose \textsc{OCTAL} (Online Collaborative filTering using iterAtive user cLustering), which guarantees a nearly optimal regret of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{1/2})$. OCTAL is based on a novel technique of clustering users that allows iterative elimination of items and leads to a nearly optimal minimax rate.
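To make the explore-then-commit idea concrete, the following is a minimal sketch and not the authors' algorithm. It assumes uniformly random exploration, Gaussian noise, and a plain rescaled truncated SVD as a stand-in for the low-rank estimator; the function name `etc_low_rank` and all of its parameters are hypothetical.

```python
import numpy as np


def etc_low_rank(P, T, explore_rounds, rank, noise_std=0.1, seed=0):
    """Illustrative explore-then-commit on an M x N reward matrix P.

    Assumptions (not from the paper): uniform random exploration,
    Gaussian noise, and a rescaled truncated SVD as the estimator.
    Returns (cumulative regret over T rounds, committed item per user).
    """
    rng = np.random.default_rng(seed)
    M, N = P.shape
    users = np.arange(M)
    best = P.max(axis=1)          # best achievable reward per user
    sums = np.zeros((M, N))       # running sum of observed rewards
    counts = np.zeros((M, N))     # number of observations per entry
    regret = 0.0

    # Exploration phase: every user receives one random item per round.
    for _ in range(explore_rounds):
        items = rng.integers(0, N, size=M)
        rewards = P[users, items] + noise_std * rng.standard_normal(M)
        sums[users, items] += rewards
        counts[users, items] += 1
        regret += float(np.sum(best - P[users, items]))

    # Estimation: average observed entries, zero-fill the rest, rescale by
    # the observed fraction, then keep the top `rank` singular components.
    observed = counts > 0
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=observed)
    frac = max(observed.mean(), 1e-12)
    U, s, Vt = np.linalg.svd(means / frac, full_matrices=False)
    P_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]

    # Commit phase: play the estimated best item for the remaining rounds.
    choice = P_hat.argmax(axis=1)
    per_round = float(np.sum(best - P[users, choice]))
    regret += per_round * (T - explore_rounds)
    return regret, choice
```

In this sketch the exploration length is a free parameter. The $\mathsf{T}^{2/3}$ rate quoted above is the usual shape of an explore-then-commit trade-off, in which a longer exploration phase costs more regret up front but lowers the per-round regret after committing.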
