Paper Title

Neural Parameter Allocation Search

Paper Authors

Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate Saenko

Paper Abstract

Training neural networks requires increasing amounts of memory. Parameter sharing can reduce memory and communication costs, but existing methods assume networks have many identical layers and utilize hand-crafted sharing strategies that fail to generalize. We introduce Neural Parameter Allocation Search (NPAS), a novel task where the goal is to train a neural network given an arbitrary, fixed parameter budget. NPAS covers both low-budget regimes, which produce compact networks, as well as a novel high-budget regime, where additional capacity can be added to boost performance without increasing inference FLOPs. To address NPAS, we introduce Shapeshifter Networks (SSNs), which automatically learn where and how to share parameters in a network to support any parameter budget without requiring any changes to the architecture or loss function. NPAS and SSNs provide a complete framework for addressing generalized parameter sharing, and can also be combined with prior work for additional performance gains. We demonstrate the effectiveness of our approach using nine network architectures across four diverse tasks, including ImageNet classification and transformers.
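To make the setting concrete, below is a minimal, illustrative PyTorch sketch of the core idea behind NPAS: layer weights are drawn from a fixed-size parameter bank rather than stored per layer, so the parameter budget is decoupled from the architecture. The ParameterBank and SharedLinear classes and the wrap-around slicing scheme are assumptions made here for illustration only; the paper's Shapeshifter Networks learn where and how to share parameters, which this toy example does not attempt.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ParameterBank(nn.Module):
    """A fixed-size pool of trainable parameters (the parameter budget)."""
    def __init__(self, budget: int):
        super().__init__()
        self.bank = nn.Parameter(torch.randn(budget) * 0.01)

    def draw(self, numel: int, offset: int) -> torch.Tensor:
        # Wrap around the bank, so layers whose combined size exceeds the
        # budget end up reusing (sharing) the same underlying parameters.
        idx = (torch.arange(numel) + offset) % self.bank.numel()
        return self.bank[idx]


class SharedLinear(nn.Module):
    """A linear layer whose weight matrix is a view into a shared bank.

    Illustrative only: this fixed slicing scheme is an assumption, not the
    learned sharing used by Shapeshifter Networks.
    """
    def __init__(self, bank: ParameterBank, in_f: int, out_f: int, offset: int):
        super().__init__()
        self.bank, self.in_f, self.out_f, self.offset = bank, in_f, out_f, offset
        self.bias = nn.Parameter(torch.zeros(out_f))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.bank.draw(self.out_f * self.in_f, self.offset)
        return F.linear(x, w.view(self.out_f, self.in_f), self.bias)


# Two layers of different shapes backed by a single 10k-parameter budget,
# even though together they would normally require ~49k weights.
bank = ParameterBank(budget=10_000)
layer1 = SharedLinear(bank, in_f=128, out_f=256, offset=0)
layer2 = SharedLinear(bank, in_f=256, out_f=64, offset=5_000)
out = layer2(torch.relu(layer1(torch.randn(4, 128))))
print(out.shape)  # torch.Size([4, 64])
```

In this sketch a low budget forces layers to overlap in the bank (compact networks), while a budget larger than the architecture's native parameter count would leave room for extra capacity without changing inference FLOPs, mirroring the low- and high-budget regimes described in the abstract.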
