Paper Title
Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
Paper Authors
Paper Abstract
The vast majority of deep models use multiple gradient signals, typically corresponding to a sum of multiple loss terms, to update a shared set of trainable weights. However, these multiple updates can impede optimal training by pulling the model in conflicting directions. We present Gradient Sign Dropout (GradDrop), a probabilistic masking procedure which samples gradients at an activation layer based on their level of consistency. GradDrop is implemented as a simple deep layer that can be used in any deep net and synergizes with other gradient balancing approaches. We show that GradDrop outperforms the state-of-the-art multiloss methods within traditional multitask and transfer learning settings, and we discuss how GradDrop reveals links between optimal multiloss training and gradient stochasticity.
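To make the masking rule concrete, below is a minimal NumPy sketch of one plausible reading of the procedure the abstract describes: compute an elementwise "positive sign purity" from the per-task gradients, then stochastically keep only positive-sign or only negative-sign gradients at each position. The function name `graddrop`, the purity formula, and the single shared uniform sample per element are illustrative assumptions, not the authors' reference implementation (in particular, the paper applies the procedure to gradient signals at a chosen activation layer, which this sketch abstracts away).

```python
import numpy as np

def graddrop(task_grads, rng=None):
    # task_grads: list of same-shape arrays, one gradient per loss/task.
    # (Hypothetical sketch -- not the paper's reference implementation.)
    rng = rng if rng is not None else np.random.default_rng(0)
    g = np.stack(task_grads)          # shape: (num_tasks, *activation_shape)
    eps = 1e-12                        # guard against division by zero
    # Elementwise positive sign purity: 1.0 if all gradient mass points in
    # the positive direction, 0.0 if all negative, 0.5 if fully conflicting.
    purity = 0.5 * (1.0 + g.sum(axis=0) / (np.abs(g).sum(axis=0) + eps))
    u = rng.uniform(size=purity.shape)   # one shared sample per element
    keep_pos = u < purity                # keep positive-sign grads here...
    mask = np.where(keep_pos[None], g > 0, g < 0)  # ...negative-sign elsewhere
    return (mask * g).sum(axis=0)        # masked gradients, summed over tasks

# Usage: two task gradients that agree on the first element and
# conflict on the others. Consistent elements pass through intact;
# conflicting elements keep only one sign, chosen probabilistically.
g1 = np.array([0.5, -0.2,  0.3])
g2 = np.array([0.4,  0.1, -0.3])
print(graddrop([g1, g2]))
```

Note the design choice this illustrates: where all tasks agree in sign, the purity saturates at 0 or 1 and the combined gradient is untouched, so GradDrop only randomizes updates where tasks genuinely conflict, which is consistent with the abstract's link between multiloss training and gradient stochasticity.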