Title

Revisiting data augmentation for subspace clustering

Authors

Maryam Abdolali, Nicolas Gillis

Abstract

Subspace clustering is the classical problem of clustering a collection of data samples that approximately lie around several low-dimensional subspaces. The current state-of-the-art approaches for this problem are based on the self-expressive model, which represents each sample as a linear combination of the other samples. However, these approaches require sufficiently well-spread samples for accurate representation, which may not be available in many applications. In this paper, we shed light on this commonly neglected issue and argue that the data distribution within each subspace plays a critical role in the success of self-expressive models. Our proposed solution is motivated by the central role of data augmentation in the generalization power of deep neural networks. We propose two subspace clustering frameworks, for the unsupervised and semi-supervised settings, that use augmented samples as an enlarged dictionary to improve the quality of the self-expressive representation. For the semi-supervised problem, we present an automatic augmentation strategy that uses a few labeled samples and relies on the fact that the data samples lie in a union of multiple linear subspaces. Experimental results confirm the effectiveness of data augmentation, as it significantly improves the performance of general self-expressive models.
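
The core idea is easy to prototype: in a self-expressive model each sample is coded as a sparse combination of the other samples, and augmentation simply enlarges the dictionary those combinations are drawn from. The following is a minimal sketch of that idea, not the authors' exact algorithm: it uses an l1-penalized coding step (scikit-learn's Lasso), a toy jitter-based augment helper, and a simple rule that folds the weight placed on augmented atoms back onto their source samples; the helper names and parameter values (augment, lam, n_aug, the noise level) are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.cluster import SpectralClustering

    def augment(X, n_aug=2, noise=0.05, rng=None):
        # Toy augmentation (assumption): jittered copies of each column of X (d x n).
        # Real augmentations would be domain-specific, e.g. image transforms.
        rng = np.random.default_rng(rng)
        return np.hstack([X + noise * rng.standard_normal(X.shape) for _ in range(n_aug)])

    def self_expressive_affinity(X, X_aug=None, lam=0.05):
        # Code every column of X over the enlarged dictionary D = [X, X_aug],
        # excluding the sample itself and its own augmented copies, then fold
        # the weight on each augmented atom back onto the sample it came from.
        d, n = X.shape
        D = X if X_aug is None else np.hstack([X, X_aug])
        m = D.shape[1]
        C = np.zeros((m, n))
        for i in range(n):
            mask = np.ones(m, dtype=bool)
            mask[i::n] = False          # forbid trivial self / own-copy representations
            lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
            lasso.fit(D[:, mask], X[:, i])
            C[mask, i] = lasso.coef_
        W = np.abs(C[:n, :])
        for k in range((m - n) // n):   # attribute augmented atoms to their sources
            W += np.abs(C[n + k * n : n + (k + 1) * n, :])
        return W + W.T                  # symmetric affinity for spectral clustering

    # Usage: two noisy one-dimensional subspaces (lines) in R^3.
    rng = np.random.default_rng(0)
    X = np.hstack([rng.standard_normal((3, 1)) @ rng.standard_normal((1, 20))
                   for _ in range(2)])
    X += 0.01 * rng.standard_normal(X.shape)
    A = self_expressive_affinity(X, X_aug=augment(X), lam=0.05)
    labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                                random_state=0).fit_predict(A)
    print(labels)

In a toy setting like this, appending augmented copies to the dictionary tends to densify the within-subspace connections that spectral clustering relies on, which illustrates the intuition of the abstract for the case where the original samples are not well spread within each subspace.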
