Paper Title

From $t$-SNE to UMAP with contrastive learning

Paper Authors

Sebastian Damrich, Jan Niklas Böhm, Fred A. Hamprecht, Dmitry Kobak

Abstract

Neighbor embedding methods $t$-SNE and UMAP are the de facto standard for visualizing high-dimensional datasets. Motivated from entirely different viewpoints, their loss functions appear to be unrelated. In practice, they yield strongly differing embeddings and can suggest conflicting interpretations of the same data. The fundamental reasons for this and, more generally, the exact relationship between $t$-SNE and UMAP have remained unclear. In this work, we uncover their conceptual connection via a new insight into contrastive learning methods. Noise-contrastive estimation can be used to optimize $t$-SNE, while UMAP relies on negative sampling, another contrastive method. We find the precise relationship between these two contrastive methods and provide a mathematical characterization of the distortion introduced by negative sampling. Visually, this distortion results in UMAP generating more compact embeddings with tighter clusters compared to $t$-SNE. We exploit this new conceptual connection to propose and implement a generalization of negative sampling, allowing us to interpolate between (and even extrapolate beyond) $t$-SNE and UMAP and their respective embeddings. Moving along this spectrum of embeddings leads to a trade-off between discrete / local and continuous / global structures, mitigating the risk of over-interpreting ostensible features of any single embedding. We provide a PyTorch implementation.
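To make the abstract's central contrast concrete, here is a minimal sketch of the two contrastive losses it refers to: UMAP-style negative sampling (binary cross-entropy on raw similarities) and an NCE-style loss in which similarities are rescaled by a normalization constant. This is an illustrative sketch, not the paper's implementation: it assumes the Cauchy kernel $1/(1+d^2)$ shared by $t$-SNE and UMAP, a uniform noise distribution, and hypothetical function names.

```python
import math

def cauchy(d2):
    """Low-dimensional similarity kernel shared by t-SNE and UMAP."""
    return 1.0 / (1.0 + d2)

def neg_sampling_loss(d2_pos, d2_neg):
    """UMAP-style negative sampling: binary cross-entropy applied directly
    to the raw similarities of positive and negative pairs.
    d2_pos / d2_neg: squared embedding distances of positive / negative pairs."""
    attract = -sum(math.log(cauchy(d)) for d in d2_pos)       # pull neighbors together
    repel = -sum(math.log(1.0 - cauchy(d)) for d in d2_neg)   # push non-neighbors apart
    return attract + repel

def nce_loss(d2_pos, d2_neg, m, Z):
    """NCE-style loss (illustrative): the similarity is divided by a
    normalization constant Z, and the (uniform) noise distribution enters
    through m, the number of negative samples per positive pair."""
    attract = -sum(math.log((cauchy(d) / Z) / (cauchy(d) / Z + m)) for d in d2_pos)
    repel = -sum(math.log(m / (cauchy(d) / Z + m)) for d in d2_neg)
    return attract + repel
```

The structural difference between the two functions mirrors the abstract's claim: negative sampling treats the unnormalized similarity as a probability, while NCE keeps an explicit normalization term, and the paper characterizes the distortion this discrepancy introduces.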
