Paper Title
Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach
Authors
Abstract
We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations, where the datasets are assumed to be sampled from an intrinsically low-dimensional manifold and corrupted by high-dimensional noise. The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction. Our method is theoretically justified and practically interpretable. Specifically, we establish the convergence of the final embeddings to their noiseless counterparts when the dimension and size of the samples are comparably large, and characterize the effect of the signal-to-noise ratio on the rate of convergence and phase transition. We also prove convergence of the embeddings to the eigenfunctions of an integral operator defined by the kernel map of some reproducing kernel Hilbert space capturing the underlying nonlinear structures. Numerical simulations and analysis of three real datasets show the superior empirical performance of the proposed method, compared to many existing methods, on learning various manifolds in diverse applications.
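To make the pipeline described in the abstract concrete, the following is a minimal sketch of a kernel-spectral embedding with a data-driven bandwidth. It is an illustration under stated assumptions, not the paper's procedure: the function name `kernel_spectral_embedding`, the Gaussian kernel, the choice of the bandwidth as a percentile of the pairwise squared distances, and the scaling of the eigenvectors are all hypothetical choices made here for the example.

```python
import numpy as np


def kernel_spectral_embedding(X, n_components=2, percentile=50.0):
    """Embed the rows of X with the leading eigenvectors of a Gaussian kernel matrix.

    Assumption: the bandwidth h is a percentile of the pairwise squared
    distances. This is a data-driven stand-in chosen for illustration and is
    not claimed to be the adaptive rule used in the paper.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances, clipped at zero to absorb round-off.
    sq_norms = np.sum(X ** 2, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    D2 = np.maximum(D2, 0.0)

    # Bandwidth from the off-diagonal distances only; needs no manifold knowledge.
    upper = np.triu_indices(n, k=1)
    h = np.percentile(D2[upper], percentile)

    # Gaussian kernel matrix, normalized by the sample size.
    K = np.exp(-D2 / h) / n

    # eigh returns eigenvalues in ascending order, so reorder to descending.
    evals, evecs = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1][:n_components]
    evals, evecs = evals[order], evecs[:, order]

    # Assumed scaling: each embedding coordinate has unit mean square.
    embedding = np.sqrt(n) * evecs
    return embedding, evals, h


# Toy data: a circle (a one-dimensional manifold) placed in 50 dimensions
# and corrupted by isotropic Gaussian noise in every coordinate.
rng = np.random.default_rng(0)
n, p = 200, 50
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
signal = np.zeros((n, p))
signal[:, 0] = 5.0 * np.cos(theta)
signal[:, 1] = 5.0 * np.sin(theta)
X = signal + rng.normal(size=(n, p))

emb, evals, h = kernel_spectral_embedding(X, n_components=3)
print(emb.shape, float(h))
```

The columns of `emb` could then be passed to a plotting, clustering, or regression routine, which corresponds to the downstream uses mentioned in the abstract. Lowering the signal amplitude (here 5.0) relative to the noise level is one simple way to probe how the embedding degrades with the signal-to-noise ratio.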