Paper Title

Robust and Fast Measure of Information via Low-rank Representation

Authors

Yuxin Dong, Tieliang Gong, Shujian Yu, Hong Chen, Chen Li

Abstract

The matrix-based Rényi's entropy allows us to quantify information measures directly from given data, without explicitly estimating the underlying probability distribution. This property has led to its wide use in statistical inference and machine learning tasks. However, this information-theoretic quantity is not robust to noise in the data and is computationally prohibitive in large-scale applications. To address these issues, we propose a novel measure of information, termed low-rank matrix-based Rényi's entropy, based on low-rank representations of infinitely divisible kernel matrices. The proposed entropy functional inherits the ability of the original definition to quantify information directly from data, while offering additional advantages, including robustness and efficient computation. Specifically, our low-rank variant is more sensitive to informative perturbations induced by changes in the underlying distributions, while being insensitive to uninformative ones caused by noise. Moreover, low-rank Rényi's entropy can be efficiently approximated by random projection and Lanczos iteration techniques, reducing the overall complexity from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^2 s)$ or even $\mathcal{O}(ns^2)$, where $n$ is the number of data samples and $s \ll n$. We conduct large-scale experiments to evaluate the effectiveness of this new information measure, demonstrating superior results compared to matrix-based Rényi's entropy in terms of both performance and computational efficiency.
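To make the quantities in the abstract concrete, here is a minimal Python sketch. It assumes the standard matrix-based definition $S_\alpha(A) = \frac{1}{1-\alpha} \log_2 \operatorname{tr}(A^\alpha)$ on a trace-normalized Gram matrix $A$. The low-rank variant shown is only one plausible reading, since the abstract does not state the definition: it keeps the $k$ largest eigenvalues and spreads the remaining trace mass evenly over the other $n-k$. The Gaussian kernel, the bandwidth `sigma`, and all function names are illustrative assumptions. The sketch uses a full eigendecomposition, so it costs $\mathcal{O}(n^3)$ and does not implement the random projection or Lanczos approximations mentioned above.

```python
import numpy as np


def gram_matrix(x, sigma=1.0):
    """Trace-normalized Gaussian Gram matrix (kernel choice is an assumption)."""
    sq = np.sum(x**2, axis=1)
    # Pairwise squared distances, clipped to guard against round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    k = np.exp(-d2 / (2.0 * sigma**2))
    return k / np.trace(k)


def renyi_entropy(a, alpha=2.0):
    """Matrix-based Renyi entropy of order alpha (alpha != 1), in bits."""
    lam = np.clip(np.linalg.eigvalsh(a), 0.0, None)
    return np.log2(np.sum(lam**alpha)) / (1.0 - alpha)


def low_rank_renyi_entropy(a, k, alpha=2.0):
    """Hypothetical low-rank variant: top-k eigenvalues plus a flattened tail.

    Assumes 0 < k < n and alpha != 1. This form is an assumption made for
    illustration, not a definition taken from the abstract.
    """
    n = a.shape[0]
    lam = np.clip(np.linalg.eigvalsh(a), 0.0, None)[::-1]  # descending order
    top = lam[:k]
    tail = max(1.0 - top.sum(), 0.0) / (n - k)  # residual mass per eigenvalue
    return np.log2(np.sum(top**alpha) + (n - k) * tail**alpha) / (1.0 - alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = gram_matrix(rng.normal(size=(200, 5)))
    print("full entropy:    ", renyi_entropy(a))
    print("low-rank (k=10): ", low_rank_renyi_entropy(a, k=10))
```

Under this reading, small eigenvalues (which are the ones most affected by noise in the data) are replaced by their average, so perturbing them individually does not change the result as long as the top-$k$ eigenvalues stay fixed.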
