Supervised Dimensionality Reduction and Visualization using Centroid-encoder

Paper Title

Supervised Dimensionality Reduction and Visualization using Centroid-encoder

Authors

Tomojit Ghosh, Michael Kirby

Abstract

Visualizing high-dimensional data is an essential task in Data Science and Machine Learning. The Centroid-Encoder (CE) method is similar to the autoencoder but incorporates label information to keep objects of a class close together in the reduced visualization space. CE exploits nonlinearity and labels to encode high variance in low dimensions while capturing the global structure of the data. We present a detailed analysis of the method using a wide variety of data sets and compare it with other supervised dimension reduction techniques, including NCA, nonlinear NCA, t-distributed NCA, t-distributed MCML, supervised UMAP, supervised PCA, Colored Maximum Variance Unfolding, supervised Isomap, Parametric Embedding, supervised Neighbor Retrieval Visualizer, and Multiple Relational Embedding. We empirically show that centroid-encoder outperforms most of these techniques. We also show that when the data variance is spread across multiple modalities, centroid-encoder extracts a significant amount of information from the data in low-dimensional space. This key feature establishes its value as a tool for data visualization.
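The abstract describes CE as an autoencoder that uses labels to keep objects of a class close together. One common way to read this is that each sample is reconstructed toward the centroid of its class rather than toward itself. The NumPy sketch below illustrates that reading only; the function names (`centroid_targets`, `centroid_encoder_loss`), the squared-error form of the cost, and the toy `encode`/`decode` callables are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def centroid_targets(X, y):
    """For every sample, return the mean (centroid) of the class it belongs to."""
    X = np.asarray(X, dtype=float)
    # inverse[i] is the index of sample i's class in the sorted unique labels
    classes, inverse = np.unique(np.asarray(y).ravel(), return_inverse=True)
    centroids = np.stack(
        [X[inverse == k].mean(axis=0) for k in range(len(classes))]
    )
    return centroids[inverse]

def centroid_encoder_loss(X, y, encode, decode):
    """Mean squared distance between decode(encode(x)) and x's class centroid.

    Assumed form of the cost: a plain autoencoder would compare the
    reconstruction with x itself; here the target is the class centroid.
    """
    X = np.asarray(X, dtype=float)
    targets = centroid_targets(X, y)
    reconstruction = decode(encode(X))
    return float(np.mean(np.sum((reconstruction - targets) ** 2, axis=1)))

# Toy usage: a hypothetical linear encoder to 2-D and a linear decoder back.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 4))
y_demo = np.array([0, 0, 0, 1, 1, 1])
W = rng.normal(size=(4, 2))
demo_loss = centroid_encoder_loss(
    X_demo, y_demo, encode=lambda A: A @ W, decode=lambda Z: Z @ W.T
)
```

In practice the encoder and decoder would be nonlinear networks trained by gradient descent on this kind of cost, with the 2-D bottleneck output used as the visualization; the linear maps here only show how the pieces fit together.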
