Paper Title

Understanding the Role of Nonlinearity in Training Dynamics of Contrastive Learning

Paper Author

Tian, Yuandong

Paper Abstract

While the empirical success of self-supervised learning (SSL) heavily relies on the usage of deep nonlinear models, existing theoretical works on SSL understanding still focus on linear ones. In this paper, we study the role of nonlinearity in the training dynamics of contrastive learning (CL) on one- and two-layer nonlinear networks with homogeneous activation $h(x) = h'(x)x$. We have two major theoretical discoveries. First, the presence of nonlinearity can lead to many local optima even in the 1-layer setting, each corresponding to certain patterns from the data distribution, while with linear activation, only one major pattern can be learned. This suggests that models with many parameters can be regarded as a \emph{brute-force} way to find these local optima induced by nonlinearity. Second, in the 2-layer case, linear activation is proven incapable of learning weights specialized into diverse patterns, demonstrating the importance of nonlinearity. In addition, for the 2-layer setting, we also discover \emph{global modulation}: local patterns that are discriminative from the perspective of global-level patterns are prioritized in learning, further characterizing the learning process. Simulations verify our theoretical findings.
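As a rough, self-contained illustration of the setup described in the abstract, the sketch below trains a single-neuron 1-layer network $f(x) = h(w^\top x)$ with a ReLU activation, which satisfies the homogeneous property $h(x) = h'(x)x$, under a simple InfoNCE-style contrastive objective on synthetic data containing two orthogonal patterns. The data-generation procedure, loss, and all hyperparameters here are illustrative assumptions, not the paper's exact construction; it is only meant to suggest how different random initializations of the nonlinear model may settle on different patterns (multiple local optima), whereas the linear variant has no such pattern selectivity.

```python
# Minimal simulation sketch (illustrative assumptions: synthetic two-pattern data,
# a single ReLU neuron, an InfoNCE-style loss; not the paper's exact setup).
import torch

torch.manual_seed(0)
d, n, tau = 20, 256, 0.5
patterns = torch.eye(d)[:2]                 # two orthogonal latent "patterns" e1, e2

def sample_views(n):
    """Draw a batch; the two augmented views of each sample share its pattern."""
    idx = torch.randint(0, 2, (n,))
    base = patterns[idx]
    return base + 0.1 * torch.randn(n, d), base + 0.1 * torch.randn(n, d)

def train(nonlinear=True, steps=1000, lr=0.05):
    w = torch.nn.Parameter(0.1 * torch.randn(d))
    opt = torch.optim.SGD([w], lr=lr)
    act = torch.relu if nonlinear else (lambda z: z)   # ReLU satisfies h(x) = h'(x) x
    for _ in range(steps):
        x1, x2 = sample_views(n)
        z1, z2 = act(x1 @ w), act(x2 @ w)              # one scalar feature per sample
        sim = torch.outer(z1, z2) / tau                # n x n similarity matrix
        loss = torch.nn.functional.cross_entropy(sim, torch.arange(n))
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            w /= w.norm() + 1e-12                      # keep the weight on the unit sphere
    return w.detach()

def align(w, p):
    """Absolute alignment between the learned weight and a pattern direction."""
    return abs((w @ p).item())

for run in range(3):
    w = train(nonlinear=True)
    print(f"ReLU run {run}: |<w,e1>|={align(w, patterns[0]):.2f}  |<w,e2>|={align(w, patterns[1]):.2f}")
w_lin = train(nonlinear=False)
print(f"linear:      |<w,e1>|={align(w_lin, patterns[0]):.2f}  |<w,e2>|={align(w_lin, patterns[1]):.2f}")
```

Comparing the printed alignments across nonlinear runs against the linear run gives a toy analogue of the abstract's claim; the paper's actual analysis and simulations are, of course, more extensive.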
