Paper Title
Exploring Low Rank Training of Deep Neural Networks
Paper Authors
Paper Abstract
Training deep neural networks in low rank, i.e. with factorised layers, is of particular interest to the community: it offers efficiency over unfactorised training in terms of both memory consumption and training time. Prior work has focused on low rank approximations of pre-trained networks and training in low rank space with additional objectives, offering various ad hoc explanations for chosen practice. We analyse techniques that work well in practice, and through extensive ablations on models such as GPT2 we provide evidence falsifying common beliefs in the field, hinting in the process at exciting research opportunities that still need answering.
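To make concrete what "training in low rank, i.e. with factorised layers" generally refers to, the sketch below shows a linear layer whose dense weight is replaced by the product of two skinny factors with inner rank r. This is an illustrative example only, not the paper's exact parameterisation or initialisation scheme; the class name `LowRankLinear` and the chosen dimensions are assumptions for the sake of the example.

```python
# Minimal sketch of a factorised (low rank) linear layer.
# The dense weight W (out_features x in_features) is replaced by U @ V,
# where V is (rank x in_features) and U is (out_features x rank).
# For rank << min(in_features, out_features) this reduces both the
# parameter count and the per-step compute of the layer.
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        # Two skinny factors in place of one dense weight matrix.
        self.v = nn.Linear(in_features, rank, bias=False)   # V: rank x in_features
        self.u = nn.Linear(rank, out_features, bias=True)   # U: out_features x rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equivalent to x @ (U V)^T + b, computed as two cheaper matmuls.
        return self.u(self.v(x))


if __name__ == "__main__":
    layer = LowRankLinear(in_features=768, out_features=768, rank=64)
    x = torch.randn(2, 10, 768)            # (batch, sequence, features)
    print(layer(x).shape)                  # torch.Size([2, 10, 768])

    dense_params = 768 * 768 + 768
    low_rank_params = 768 * 64 + 64 * 768 + 768
    print(dense_params, low_rank_params)   # 590592 vs. 99072 parameters
```

The parameter count comparison at the end illustrates the memory saving the abstract refers to: at rank 64, the factorised layer uses roughly a sixth of the parameters of the dense 768-by-768 layer.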