Paper Title
Emergence of hierarchical modes from deep learning
Paper Authors
Abstract
Large-scale deep neural networks incur expensive training costs, and the training produces weight matrices that are difficult to interpret. Here, we propose mode decomposition learning, which interprets the weight matrices as a hierarchy of latent modes. These modes are akin to the patterns studied in the physics of memory networks, but the minimal number of modes grows only logarithmically with the network width and even becomes constant as the width grows further. Mode decomposition learning not only saves a significant amount of training cost, but also explains network performance in terms of the leading modes, which display a striking piecewise power-law behavior. The modes specify a progressively more compact latent space across the network hierarchy, yielding more disentangled subspaces than standard training. We also study mode decomposition learning in an analytic on-line learning setting, which reveals multiple stages of learning dynamics with a continuous specialization of hidden nodes. Therefore, the proposed mode decomposition learning points to a cheap and interpretable route toward magical deep learning.
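As a rough illustration of the core idea, and not the paper's exact algorithm, the sketch below parameterizes a single layer's weight matrix as a sum of P rank-one latent modes, W = sum_k u_k v_k^T, and trains the mode vectors directly on a toy regression task. All sizes, the learning rate, and the random-teacher setup are hypothetical choices for the example.

```python
# Minimal sketch (an assumption about the setup, not the authors' implementation):
# a layer's weights are built from P latent modes and only the mode vectors are trained,
# so the number of trainable parameters scales with P rather than with the full matrix.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, P = 50, 100, 4                      # layer widths and number of modes (hypothetical)
X = rng.normal(size=(500, n_in))                  # toy inputs
W_teacher = rng.normal(size=(n_in, n_hid)) / np.sqrt(n_in)
Y = np.tanh(X @ W_teacher)                        # toy targets from a random teacher network

# Mode parameters: left vectors U (n_in x P) and right vectors V (n_hid x P)
U = rng.normal(size=(n_in, P)) * 0.3
V = rng.normal(size=(n_hid, P)) * 0.3

lr = 0.05
for step in range(2001):
    W = U @ V.T                                   # reconstruct the weight matrix from its modes
    H = np.tanh(X @ W)                            # forward pass of the student layer
    err = H - Y
    dH = err * (1.0 - H**2)                       # gradient through tanh
    dW = X.T @ dH / len(X)                        # gradient w.r.t. the full weight matrix
    dU = dW @ V                                   # chain rule into the left mode vectors
    dV = dW.T @ U                                 # chain rule into the right mode vectors
    U -= lr * dU
    V -= lr * dV
    if step % 500 == 0:
        print(f"step {step:4d}  mse {np.mean(err**2):.4f}")
```

In this toy setting, the layer has n_in * n_hid = 5000 weights, but only (n_in + n_hid) * P = 600 mode parameters are updated, which is the sense in which learning in mode space can be cheaper than standard training.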