Riemannian块坐标下降方法，用于计算投影稳健的Wasserstein距离

论文标题

Riemannian块坐标下降方法，用于计算投影稳健的Wasserstein距离

A Riemannian Block Coordinate Descent Method for Computing the Projection Robust Wasserstein Distance

论文作者

Huang, Minhui, Ma, Shiqian, Lai, Lifeng

论文摘要

Wasserstein距离在机器学习和深度学习中变得越来越重要。尽管它很受欢迎，但由于维度的诅咒，Wasserstein距离很难近似。一种减轻维度诅咒的最近提出的方法是将来自高维概率分布的采样数据投射到较低维的子空间上，然后计算投影数据之间的Wasserstein距离。但是，这种方法需要在Stiefel歧管上解决最大的问题，这在实践中非常具有挑战性。直接解决此问题的唯一现有工作是RGA（带有Sink Horn迭代的Riemannian梯度上升）算法，该算法需要在每次迭代中解决熵调节的最佳运输问题，因此对于大规模问题而言可能是昂贵的。在本文中，我们提出了一种riemannian块坐标下降（RBCD）方法来解决此问题，该方法基于对Stiefel歧管上正则化最大问题问题的新型重新印象。我们表明，RBCD获得$ε$ - 定位点的算术操作的复杂性为$ O（ε^{ - 3}）$。这显着提高了RGA的相应复杂性，即$ O（ε^{ - 12}）$。此外，我们的RBCD具有非常低的均值复杂性，因此适用于大规模问题。合成数据集和真实数据集的数值结果表明，我们的方法比现有方法更有效，尤其是当采样数据的数量很大时。

The Wasserstein distance has become increasingly important in machine learning and deep learning. Despite its popularity, the Wasserstein distance is hard to approximate because of the curse of dimensionality. A recently proposed approach to alleviate the curse of dimensionality is to project the sampled data from the high dimensional probability distribution onto a lower-dimensional subspace, and then compute the Wasserstein distance between the projected data. However, this approach requires to solve a max-min problem over the Stiefel manifold, which is very challenging in practice. The only existing work that solves this problem directly is the RGAS (Riemannian Gradient Ascent with Sinkhorn Iteration) algorithm, which requires to solve an entropy-regularized optimal transport problem in each iteration, and thus can be costly for large-scale problems. In this paper, we propose a Riemannian block coordinate descent (RBCD) method to solve this problem, which is based on a novel reformulation of the regularized max-min problem over the Stiefel manifold. We show that the complexity of arithmetic operations for RBCD to obtain an $ε$-stationary point is $O(ε^{-3})$. This significantly improves the corresponding complexity of RGAS, which is $O(ε^{-12})$. Moreover, our RBCD has very low per-iteration complexity, and hence is suitable for large-scale problems. Numerical results on both synthetic and real datasets demonstrate that our method is more efficient than existing methods, especially when the number of sampled data is very large.

下载PDF全文

下载文献需遵守相关版权规定

论文标题