Self-Supervised Learning with an Information Maximization Criterion

Paper title

Self-Supervised Learning with an Information Maximization Criterion

Authors

Serdar Ozsoy, Shadi Hamdan, Sercan Ö. Arik, Deniz Yuret, Alper T. Erdogan

Abstract

Self-supervised learning allows AI systems to learn effective representations from large amounts of data using tasks that do not require costly labeling. Mode collapse, i.e., the model producing identical representations for all inputs, is a central problem to many self-supervised learning approaches, making self-supervised tasks, such as matching distorted variants of the inputs, ineffective. In this article, we argue that a straightforward application of information maximization among alternative latent representations of the same input naturally solves the collapse problem and achieves competitive empirical results. We propose a self-supervised learning method, CorInfoMax, that uses a second-order statistics-based mutual information measure that reflects the level of correlation among its arguments. Maximizing this correlative information measure between alternative representations of the same input serves two purposes: (1) it avoids the collapse problem by generating feature vectors with non-degenerate covariances; (2) it establishes relevance among alternative representations by increasing the linear dependence among them. An approximation of the proposed information maximization objective simplifies to a Euclidean distance-based objective function regularized by the log-determinant of the feature covariance matrix. The regularization term acts as a natural barrier against feature space degeneracy. Consequently, beyond avoiding complete output collapse to a single point, the proposed approach also prevents dimensional collapse by encouraging the spread of information across the whole feature space. Numerical experiments demonstrate that CorInfoMax achieves better or competitive performance results relative to the state-of-the-art SSL approaches.
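The abstract describes the approximate objective as a Euclidean distance between the two representations of each input, regularized by the log-determinant of the feature covariance matrix. The following is a minimal NumPy sketch of a loss with that shape, not the authors' implementation. The function name `logdet_regularized_distance`, the diagonal offset `eps`, the weight `alpha`, and the use of plain batch covariances are all assumptions made for illustration; the paper's exact weighting and covariance estimates may differ.

```python
import numpy as np


def logdet_regularized_distance(z1, z2, eps=1e-3, alpha=1.0):
    """Illustrative loss: Euclidean distance minus log-det covariance terms.

    z1, z2: arrays of shape (batch, dim) holding two representations of the
    same batch of inputs. eps and alpha are hypothetical hyperparameters.
    """
    n, d = z1.shape

    def logdet_cov(z):
        # Batch covariance of the features, offset by eps * I so the
        # log-determinant stays finite even for degenerate features.
        zc = z - z.mean(axis=0, keepdims=True)
        cov = zc.T @ zc / n
        _, logdet = np.linalg.slogdet(cov + eps * np.eye(d))
        return logdet

    # Mean squared Euclidean distance between paired representations.
    distance = np.mean(np.sum((z1 - z2) ** 2, axis=1))

    # Minimizing the loss pulls paired representations together while
    # rewarding covariances with a large log-determinant.
    return alpha * distance - (logdet_cov(z1) + logdet_cov(z2))


rng = np.random.default_rng(0)
z = rng.normal(size=(256, 4))

# Well-spread features with closely matching views.
spread = logdet_regularized_distance(z, z + 0.01 * rng.normal(size=z.shape))

# Complete collapse: every input maps to the same point.
point = np.ones((256, 4))
collapsed = logdet_regularized_distance(point, point)

# Dimensional collapse: all features lie along a single direction.
line = np.outer(rng.normal(size=256), np.ones(4))
dim_collapsed = logdet_regularized_distance(line, line)

print(spread, dim_collapsed, collapsed)
```

In this sketch both collapsed cases have zero distance between views, yet they are penalized heavily because the log-determinant of a rank-deficient covariance is strongly negative. This is the barrier effect the abstract attributes to the regularization term.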
