Paper title
Multi-Objective Genetic Programming for Manifold Learning: Balancing Quality and Dimensionality
Paper authors
Paper abstract
Manifold learning techniques have become increasingly valuable as data continues to grow in size. By discovering a lower-dimensional representation (embedding) of the structure of a dataset, manifold learning algorithms can substantially reduce the dimensionality of a dataset while preserving as much information as possible. However, state-of-the-art manifold learning algorithms are opaque in how they perform this transformation. Understanding the way in which the embedding relates to the original high-dimensional space is critical in exploratory data analysis. We previously proposed a Genetic Programming method that performed manifold learning by evolving mappings that are transparent and interpretable. This method required the dimensionality of the embedding to be known a priori, which makes it hard to use when little is known about a dataset. In this paper, we substantially extend our previous work, by introducing a multi-objective approach that automatically balances the competing objectives of manifold quality and dimensionality. Our proposed approach is competitive with a range of baseline and state-of-the-art manifold learning methods, while also providing a range (front) of solutions that give different trade-offs between quality and dimensionality. Furthermore, the learned models are shown to often be simple and efficient, utilising only a small number of features in an interpretable manner.
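To make the idea of a "front" of trade-off solutions concrete, the following is a minimal sketch of non-dominated filtering over two minimised objectives. It is not the authors' implementation: the names `Candidate`, `dominates`, and `pareto_front` are hypothetical, the `cost` value is an assumed stand-in for a manifold-quality measure in which lower is better, and the example numbers are made up purely for illustration.

```python
from typing import List, Tuple

# Each candidate embedding is summarised as (cost, dimensionality).
# Both objectives are minimised. "cost" is an assumed placeholder for a
# manifold-quality measure where lower means better structure preservation.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, sorted by dimensionality."""
    front = [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]
    # set() drops exact duplicates before sorting.
    return sorted(set(front), key=lambda c: c[1])


# Illustrative (made-up) candidates: (cost, dimensionality).
candidates = [(0.40, 1), (0.25, 2), (0.30, 2), (0.24, 3), (0.26, 4), (0.10, 5)]
front = pareto_front(candidates)
print(front)
```

In this toy example, `(0.30, 2)` and `(0.26, 4)` are discarded because `(0.25, 2)` is at least as good on both objectives. The remaining candidates each represent a different trade-off between quality and dimensionality, which is the kind of choice a front leaves to the analyst.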