Paper Title
Entangled Residual Mappings
Paper Authors
Paper Abstract
Residual mappings have been shown to perform representation learning in the first layers and iterative feature refinement in higher layers. This interplay, combined with their stabilizing effect on gradient norms, enables them to train very deep networks. In this paper, we take a step further and introduce entangled residual mappings to generalize the structure of residual connections and evaluate their role in iteratively learning representations. An entangled residual mapping replaces the identity skip connection with a specialized entangled mapping, such as an orthogonal, sparse, or structural correlation matrix, that shares key attributes (eigenvalues, structure, and Jacobian norm) with the identity mapping. We show that while entangled mappings can preserve the iterative refinement of features across various deep models, they influence the representation learning process in convolutional networks differently than in attention-based models and recurrent neural networks. In general, we find that for CNNs and Vision Transformers, entangled sparse mappings can help generalization, while orthogonal mappings hurt performance. For recurrent networks, orthogonal residual mappings form an inductive bias for time-variant sequences, which degrades accuracy on time-invariant tasks.
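To make the core construction concrete, below is a minimal PyTorch sketch of an entangled residual block: the usual skip path y = x + F(x) is replaced by y = Mx + F(x), where M is a fixed orthogonal or sparse matrix. The class name, the MLP residual branch, and the particular sparsity scheme are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class EntangledResidualBlock(nn.Module):
    """Residual block with an entangled skip path: y = M x + F(x),
    where M is a fixed (non-trainable) matrix instead of the identity."""

    def __init__(self, dim: int, mapping: str = "orthogonal"):
        super().__init__()
        # Residual branch F; a small MLP stands in for the conv,
        # attention, or recurrent layers used in the actual models.
        self.residual_fn = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        if mapping == "identity":
            m = torch.eye(dim)  # ordinary residual connection
        elif mapping == "orthogonal":
            # Random orthogonal matrix: like the identity, its
            # eigenvalues lie on the unit circle and its Jacobian
            # (spectral) norm is 1.
            m = torch.linalg.qr(torch.randn(dim, dim)).Q
        elif mapping == "sparse":
            # Illustrative sparse variant: a random binary mask on the
            # identity's diagonal (this exact sparsity scheme is an
            # assumption, not taken from the paper).
            m = torch.diag((torch.rand(dim) > 0.5).float())
        else:
            raise ValueError(f"unknown mapping: {mapping}")
        # Fixed entangled mapping M; register_buffer keeps it out of
        # the optimizer's parameter list.
        self.register_buffer("entangled_map", m)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.entangled_map.T + self.residual_fn(x)

# Usage: an orthogonal skip applied to a batch of 8 feature vectors.
block = EntangledResidualBlock(dim=64, mapping="orthogonal")
y = block(torch.randn(8, 64))
```

Because M shares the identity's Jacobian norm, gradient magnitudes through the skip path are preserved, which is the property the abstract credits for making very deep stacks trainable in the first place.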