Multi-Objective Genetic Programming for Manifold Learning: Balancing Quality and Dimensionality

Paper title

Multi-Objective Genetic Programming for Manifold Learning: Balancing Quality and Dimensionality

Authors

Andrew Lensen, Mengjie Zhang, Bing Xue

Abstract

Manifold learning techniques have become increasingly valuable as data continues to grow in size. By discovering a lower-dimensional representation (embedding) of the structure of a dataset, manifold learning algorithms can substantially reduce the dimensionality of a dataset while preserving as much information as possible. However, state-of-the-art manifold learning algorithms are opaque in how they perform this transformation. Understanding the way in which the embedding relates to the original high-dimensional space is critical in exploratory data analysis. We previously proposed a Genetic Programming method that performed manifold learning by evolving mappings that are transparent and interpretable. This method required the dimensionality of the embedding to be known a priori, which makes it hard to use when little is known about a dataset. In this paper, we substantially extend our previous work, by introducing a multi-objective approach that automatically balances the competing objectives of manifold quality and dimensionality. Our proposed approach is competitive with a range of baseline and state-of-the-art manifold learning methods, while also providing a range (front) of solutions that give different trade-offs between quality and dimensionality. Furthermore, the learned models are shown to often be simple and efficient, utilising only a small number of features in an interpretable manner.
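
To make the idea of a "front" of solutions concrete, here is a minimal sketch of how non-dominated trade-offs between embedding quality and dimensionality could be selected. It is not the authors' algorithm: the candidate values are invented for illustration, the names `dominates` and `pareto_front` are hypothetical, and quality is assumed to be expressed as a cost to minimise alongside the number of dimensions.

```python
from typing import List, Tuple

# Each candidate is (quality_cost, n_dims); both objectives are minimised.
# Assumption: quality is expressed as a cost, so lower is better.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, sorted by dimensionality."""
    front = [c for c in candidates if not any(dominates(o, c) for o in candidates)]
    return sorted(set(front), key=lambda c: c[1])


# Made-up values for illustration only; they are not results from the paper.
candidates = [(0.40, 1), (0.25, 2), (0.30, 2), (0.10, 5), (0.12, 6), (0.10, 8)]
front = pareto_front(candidates)
print(front)
```

Each point that survives offers a different compromise: fewer dimensions at a higher quality cost, or a lower cost at the price of more dimensions. A user can then pick the point that suits the analysis rather than fixing the dimensionality in advance.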
