Paper Title
DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training
Paper Authors
Paper Abstract
A standard hardware bottleneck when training deep neural networks is GPU memory. The bulk of memory is occupied by caching intermediate tensors for gradient computation in the backward pass. We propose a novel method to reduce this footprint: Dropping Intermediate Tensors (DropIT). DropIT drops the min-k elements of the intermediate tensors and approximates gradients from the sparsified tensors in the backward pass. Theoretically, DropIT reduces noise on estimated gradients and therefore has a higher rate of convergence than vanilla SGD. Experiments show that we can drop up to 90% of the intermediate tensor elements in fully-connected and convolutional layers while achieving higher test accuracy for Visual Transformers and Convolutional Neural Networks on various tasks (e.g., classification, object detection, instance segmentation). Our code and models are available at https://github.com/chenjoya/dropit.
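A minimal sketch of the idea described above, for a single fully-connected layer using PyTorch's torch.autograd.Function: the forward pass is computed exactly, but only the k largest-magnitude elements of the input activation are cached for backward, and the weight gradient is approximated from that sparsified activation. This is not the authors' implementation (see the linked repository); the class name _DropITLinearFn, the keep_ratio argument, and the per-layer flattened top-k selection are illustrative assumptions.

import torch
import torch.nn.functional as F

class _DropITLinearFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight, bias, keep_ratio):
        # Exact forward pass; x: (batch, in_features), weight: (out_features, in_features).
        y = F.linear(x, weight, bias)
        # Cache only the k largest-magnitude elements of x (drop the min-k elements).
        flat = x.reshape(-1)
        k = max(1, int(flat.numel() * keep_ratio))
        _, idx = flat.abs().topk(k)
        ctx.save_for_backward(idx, flat[idx], weight)
        ctx.x_shape = x.shape
        ctx.has_bias = bias is not None
        return y

    @staticmethod
    def backward(ctx, grad_y):
        idx, vals, weight = ctx.saved_tensors
        # Rebuild a sparsified copy of x; dropped elements are treated as zero.
        x_sparse = torch.zeros(ctx.x_shape, dtype=vals.dtype, device=vals.device).reshape(-1)
        x_sparse[idx] = vals
        x_sparse = x_sparse.reshape(ctx.x_shape)
        grad_x = grad_y @ weight        # exact: does not depend on the cached activation
        grad_w = grad_y.t() @ x_sparse  # approximated from the sparsified activation
        grad_b = grad_y.sum(0) if ctx.has_bias else None
        return grad_x, grad_w, grad_b, None

Example usage on a toy layer, caching roughly 10% of the input elements (i.e., dropping ~90%):

x = torch.randn(32, 512, requires_grad=True)
w = torch.randn(256, 512, requires_grad=True)
b = torch.zeros(256, requires_grad=True)
y = _DropITLinearFn.apply(x, w, b, 0.1)
y.sum().backward()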