Unbalanced Optimal Transport, from Theory to Numerics

Paper title

Unbalanced Optimal Transport, from Theory to Numerics

Authors

Séjourné, Thibault; Peyré, Gabriel; Vialard, François-Xavier

Abstract

Optimal Transport (OT) has recently emerged as a central tool in data sciences for comparing point clouds, and more generally probability distributions, in a geometrically faithful way. However, the wide adoption of OT in existing data analysis and machine learning pipelines is hampered by several shortcomings. These include its lack of robustness to outliers, its high computational cost, the large number of samples it requires in high dimension, and the difficulty of handling data that live in distinct spaces. In this review, we detail several recently proposed approaches that mitigate these issues. We focus in particular on unbalanced OT, which compares arbitrary positive measures rather than only probability distributions (i.e. their total masses may differ). This generalization makes OT robust to outliers and missing data. The second workhorse of modern computational OT is entropic regularization, which leads to scalable algorithms while lowering the sample complexity in high dimension. The last topic presented in this review is the Gromov-Wasserstein (GW) distance, which extends OT to distributions belonging to different metric spaces. The main motivation for this review is to explain how unbalanced OT, entropic regularization and GW can work hand-in-hand to turn OT into efficient geometric loss functions for data sciences.
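
To make the combination of unbalanced OT and entropic regularization concrete, here is a minimal sketch, not the authors' implementation, of a generalized Sinkhorn scaling loop in NumPy. It assumes the marginal constraints are replaced by KL penalties of strength rho on both sides, and that the entropic term with strength eps is taken relative to the all-ones matrix. Other reference measures (for example the product of the two input measures) change the kernel but not the structure of the iteration. The function name unbalanced_sinkhorn and its default parameters are hypothetical. The only change from balanced Sinkhorn is the exponent rho / (rho + eps) applied to each scaling update, which turns the exact marginal projections into soft ones.

```python
import numpy as np


def unbalanced_sinkhorn(a, b, C, eps=0.05, rho=1.0, n_iter=1000):
    """Hypothetical helper: entropic unbalanced OT via generalized Sinkhorn scaling.

    Approximately minimizes over nonnegative matrices P:
        <C, P> + eps * KL(P | 1) + rho * KL(P @ 1 | a) + rho * KL(P.T @ 1 | b)
    Assumption: the entropy is taken relative to the all-ones matrix and both
    marginal penalties share the same strength rho.
    """
    K = np.exp(-C / eps)        # Gibbs kernel; may underflow if C / eps is large
    u = np.ones_like(a)
    v = np.ones_like(b)
    fi = rho / (rho + eps)      # exponent < 1 relaxes the marginal constraints
    for _ in range(n_iter):
        u = (a / (K @ v)) ** fi     # soft update toward the first marginal a
        v = (b / (K.T @ u)) ** fi   # soft update toward the second marginal b
    return u[:, None] * K * v[None, :]   # P = diag(u) K diag(v)


# Toy example: two 1-D point clouds whose total masses differ (1 vs 2).
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 6)
C = (x[:, None] - y[None, :]) ** 2   # squared Euclidean cost
a = np.full(5, 1.0 / 5)
b = np.full(6, 2.0 / 6)
P = unbalanced_sinkhorn(a, b, C, eps=0.05, rho=1.0)
print("transported mass:", P.sum())
print("row marginal:", P.sum(axis=1))
```

As rho grows, the exponent tends to 1 and the loop reduces to the standard balanced Sinkhorn iteration. For very small eps, a log-domain version is usually needed to avoid underflow in K.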
