Parametric UMAP embeddings for representation and semi-supervised learning

Paper title

Parametric UMAP embeddings for representation and semi-supervised learning

Authors

Tim Sainburg, Leland McInnes, Timothy Q. Gentner

Abstract

UMAP is a non-parametric graph-based dimensionality reduction algorithm using applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) Compute a graphical representation of a dataset (fuzzy simplicial complex), and (2) Through stochastic gradient descent, optimize a low-dimensional embedding of the graph. Here, we extend the second step of UMAP to a parametric optimization over neural network weights, learning a parametric relationship between data and embedding. We first demonstrate that Parametric UMAP performs comparably to its non-parametric counterpart while conferring the benefit of a learned parametric mapping (e.g. fast online embeddings for new data). We then explore UMAP as a regularization, constraining the latent distribution of autoencoders, parametrically varying global structure preservation, and improving classifier accuracy for semi-supervised learning by capturing structure in unlabeled data. Google Colab walkthrough: https://colab.research.google.com/drive/1WkXVZ5pnMrm17m0YgmtoNjM_XHdnE5Vp?usp=sharing
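The two steps described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It makes several simplifying assumptions: the graph uses Gaussian weights on k nearest neighbors instead of UMAP's full fuzzy simplicial set construction, the parametric map is a single linear layer `W` rather than a neural network, the low-dimensional similarity is fixed to q = 1 / (1 + d^2), and the loss is summed over all pairs instead of being estimated with negative sampling. The function names (`knn_graph`, `umap_loss_and_grad`) and the toy data are hypothetical.

```python
import numpy as np

def knn_graph(X, k=5):
    """Step 1 (simplified): symmetric kNN affinity matrix.
    Gaussian weights are an assumption standing in for UMAP's fuzzy simplicial set."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                  # exclude self-neighbors
    idx = np.argsort(d2, axis=1)[:, :k]           # indices of the k nearest neighbors
    nn_d2 = np.take_along_axis(d2, idx, axis=1)
    A = np.zeros((n, n))
    np.put_along_axis(A, idx, np.exp(-nn_d2 / nn_d2.mean()), axis=1)
    return A + A.T - A * A.T                      # fuzzy union symmetrization

def umap_loss_and_grad(W, X, P, eps=1e-4):
    """Step 2 (parametric): cross-entropy between graph P and embedding similarities Q,
    with the embedding produced by the map Z = X @ W."""
    Z = X @ W
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    Q = 1.0 / (1.0 + d2)
    mask = ~np.eye(len(X), dtype=bool)            # ignore i == j pairs
    Qc = np.clip(Q, eps, 1 - eps)                 # keep the logs finite
    loss = -(P * np.log(Qc) + (1 - P) * np.log(1 - Qc))[mask].sum()
    # Derivative of the (unclipped) loss w.r.t. squared distance; eps avoids division by zero.
    G = (P * Q - (1 - P) * Q / (d2 + eps)) * mask
    grad_Z = 4.0 * (G.sum(1, keepdims=True) * Z - G @ Z)
    return loss, X.T @ grad_Z                     # chain rule through Z = X @ W

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 10)), rng.normal(4, 1, (30, 10))])  # toy data
P = knn_graph(X, k=5)
W = rng.normal(0, 0.1, (10, 2))                   # weights of the linear "encoder"

losses = []
for step in range(200):
    loss, g = umap_loss_and_grad(W, X, P)
    losses.append(loss)
    W -= 0.01 * g / max(np.linalg.norm(g), 1.0)   # gradient step with norm clipping

# The learned map embeds unseen points directly, without re-running the optimization.
X_new = rng.normal(0, 1, (5, 10))
Z_new = X_new @ W
```

Because the optimization is over `W` rather than over the embedding coordinates themselves, new data is embedded with a single matrix product, which is the "fast online embedding" benefit the abstract mentions. A practical version would replace the linear map with a neural network trained by a deep learning framework.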
