Online Low Rank Matrix Completion

Paper Title

Online Low Rank Matrix Completion

Authors

Prateek Jain, Soumyabrata Pal

Abstract

We study the problem of {\em online} low-rank matrix completion with $\mathsf{M}$ users, $\mathsf{N}$ items and $\mathsf{T}$ rounds. In each round, the algorithm recommends one item per user, for which it gets a (noisy) reward sampled from a low-rank user-item preference matrix. The goal is to design a method with sub-linear regret (in $\mathsf{T}$) and nearly optimal dependence on $\mathsf{M}$ and $\mathsf{N}$. The problem can be easily mapped to the standard multi-armed bandit problem where each item is an {\em independent} arm, but that leads to poor regret as the correlation between arms and users is not exploited. On the other hand, exploiting the low-rank structure of the reward matrix is challenging due to the non-convexity of the low-rank manifold. We first demonstrate that the low-rank structure can be exploited using a simple explore-then-commit (ETC) approach that ensures a regret of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{2/3})$. That is, roughly only $\mathsf{polylog} (\mathsf{M}+\mathsf{N})$ item recommendations are required per user to get a non-trivial solution. We then improve our result for the rank-$1$ setting, which in itself is quite challenging and encapsulates some of the key issues. Here, we propose \textsc{OCTAL} (Online Collaborative filTering using iterAtive user cLustering) that guarantees nearly optimal regret of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{1/2})$. OCTAL is based on a novel technique of clustering users that allows iterative elimination of items and leads to a nearly optimal minimax rate.
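The explore-then-commit idea mentioned in the abstract can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the authors' algorithm: the estimator used here (per-entry averaging followed by a truncated SVD), the uniform random exploration, the regret averaged over users, and every name and parameter value (`explore_then_commit`, `T0`, `noise_std`, the matrix sizes) are hypothetical choices made for illustration.

```python
import numpy as np


def explore_then_commit(P, T, T0, rank, noise_std, rng):
    """Toy ETC on a user-item reward matrix P (illustrative sketch only).

    Returns the cumulative pseudo-regret averaged over users, and the
    item each user is committed to after exploration.
    """
    M, N = P.shape
    users = np.arange(M)
    best = P.max(axis=1)          # best expected reward per user
    sums = np.zeros((M, N))       # running sum of observed rewards
    counts = np.zeros((M, N))     # number of observations per entry
    regret = 0.0

    # Explore: every user receives a uniformly random item for T0 rounds.
    for _ in range(T0):
        items = rng.integers(N, size=M)
        rewards = P[users, items] + noise_std * rng.standard_normal(M)
        sums[users, items] += rewards
        counts[users, items] += 1
        regret += np.mean(best - P[users, items])

    # Estimate: average the observations, then project onto rank-`rank`
    # matrices with a truncated SVD (an assumed, simple estimator).
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    U, s, Vt = np.linalg.svd(means, full_matrices=False)
    P_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]

    # Commit: each user plays the estimated best item for the remaining rounds.
    choice = P_hat.argmax(axis=1)
    regret += (T - T0) * np.mean(best - P[users, choice])
    return regret, choice


rng = np.random.default_rng(0)
M, N, T, T0 = 200, 30, 2000, 300   # hypothetical sizes, T0 picked by hand
P = np.outer(rng.standard_normal(M), rng.standard_normal(N))  # rank-1 rewards

etc_regret, choice = explore_then_commit(P, T, T0, rank=1, noise_std=0.5, rng=rng)

# Baseline: expected regret of recommending a uniformly random item every round.
random_regret = T * np.mean(P.max(axis=1) - P.mean(axis=1))

print(f"ETC regret: {etc_regret:.1f}, random-policy regret: {random_regret:.1f}")
```

The usual heuristic behind the $\mathsf{T}^{2/3}$ rate is to balance the cost of exploring for `T0` rounds against the commit-phase cost of an estimate whose error shrinks like `1 / sqrt(T0)`, which suggests `T0` on the order of $\mathsf{T}^{2/3}$; the value used above is not tuned that way.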
